If you want a quick rundown of the string types in Windows and MSVC++ (how they work, what their names mean, and how to program with them), I’ve put together the following snippet. Its comments explain the string types that MSVC++ exposes to developers and what their type names mean, and give a brief introduction to character encodings and their effect on this mysterious but wonderful game of Windows string types:
#include "stdafx.h"
#include <Windows.h> // LPSTR, LPWSTR, LPTSTR, TCHAR and TEXT.
#include <tchar.h>   // _tmain, _TCHAR and _T.
int _tmain(int argc, _TCHAR* argv[])
{
/* Quick Tutorial on Strings in Microsoft Visual C++
The "Use Unicode Character Set" and "Use Multi-Byte Character Set" options in MSVC++ give a project two flavours of string encoding. The Unicode option defines the UNICODE and _UNICODE macros; the Multibyte option defines _MBCS. Here are the two character types in MSVC++ that you should be concerned about:
1. char <-- each char is 8 bits (1 byte), according to MSDN. Multibyte strings are made of chars.
2. wchar_t <-- on Windows, each wchar_t is 16 bits (2 bytes), according to MSDN, and holds one UTF-16 code unit. Unicode strings are made of wchar_ts.
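Note that char and wchar_t are both always available, whatever the setting. What the setting changes is which of the two the generic "T" types below (TCHAR, LPTSTR, LPCTSTR) map to, and therefore how many bytes each character unit in those strings takes up.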
WARNING: Do NOT assume that every character in a Multibyte or Unicode string occupies exactly one char or one wchar_t! That depends on the encoding, and some characters need more than one unit. Multibyte strings are stored one byte per char, but a single character may take up two of those bytes, so any given byte is not necessarily a whole character. MSDN puts it this way: "A multibyte-character string may contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte." The same applies to Unicode strings: characters outside the Basic Multilingual Plane (many emoji, for example) are stored in UTF-16 as a surrogate pair of two wchar_t values.
WARNING: Do NOT assume that Unicode contains every character for every language. For more information, please see http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character.
Note: The ASCII Character Set is a subset of both Multibyte and Unicode Character Sets (in other words, both of these flavours encompass ASCII characters).
Note: You should always use Unicode for new development, according to MSDN. For more information, please see http://msdn.microsoft.com/en-us/library/ey142t48.aspx.
*/
// Strings that are Multibyte.
LPSTR a; // Regular Multibyte string (synonymous with char *).
LPCSTR b; // Constant Multibyte string (synonymous with const char *).
// Strings that are Unicode.
LPWSTR c; // Regular Unicode string (synonymous with wchar_t *).
LPCWSTR d; // Constant Unicode string (synonymous with const wchar_t *).
// Strings that take on either Multibyte or Unicode depending on project settings.
LPTSTR e; // Multibyte or Unicode string (can be either char * or wchar_t *).
LPCTSTR f; // Constant Multibyte or Unicode string (can be either const char * or const wchar_t *).
/* From above, it is safe to assume that the pattern is as follows:
LP: Specifies a long pointer type. The "long" is a leftover from 16-bit Windows; today it is simply a pointer (synonymous with adding a * after the character type).
W: Specifies that the type is of the Unicode Character Set.
C: Specifies that the type is constant.
T: Specifies that the type follows the project's character set: it is built on TCHAR, which is wchar_t when UNICODE is defined and char otherwise.
STR: Specifies that the type is a string type.
*/
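// String literal macros and prefixes:
f = _T("Example."); // Multibyte or Unicode literal depending on project settings (_T comes from <tchar.h> and follows _UNICODE).
f = TEXT("Example."); // Same result as _T in a normal project (TEXT comes from the Windows headers and follows UNICODE).
d = L"Example."; // The L prefix makes a Unicode (wide) literal.
b = "Example."; // No prefix makes a Multibyte (narrow) literal.
// String literals are constant, so the non-const types should point at writable arrays instead.
TCHAR tBuffer[] = _T("Example.");
wchar_t wBuffer[] = L"Example.";
char mbBuffer[] = "Example.";
e = tBuffer;
c = wBuffer;
a = mbBuffer;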
return 0;
}
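The multibyte WARNING in the comment block above is easier to see with concrete bytes. The following is a minimal sketch, assuming code page 932 (Shift-JIS, Japanese): it counts characters by treating a lead byte plus the byte after it as one character. The helper names isCp932LeadByte and countCp932Chars are hypothetical, and the lead-byte ranges 0x81-0x9F and 0xE0-0xFC are hard-coded for code page 932 only. In real Windows code you would ask the system with IsDBCSLeadByteEx or use the _mbs* functions from <mbstring.h> rather than hard-coding a table.

```cpp
#include <cstddef>

// Hypothetical helper (assumes code page 932 only): true if b is a lead byte.
// CP932 lead bytes fall in 0x81-0x9F and 0xE0-0xFC.
constexpr bool isCp932LeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: counts characters (not bytes) in a NUL-terminated
// CP932 string. A lead byte plus the following trail byte is one character.
constexpr std::size_t countCp932Chars(const char* s) {
    std::size_t count = 0;
    for (std::size_t i = 0; s[i] != '\0'; ++count) {
        if (isCp932LeadByte(static_cast<unsigned char>(s[i])) && s[i + 1] != '\0') {
            i += 2; // two-byte character: skip the lead byte and its trail byte
        } else {
            i += 1; // single-byte character (for example, plain ASCII)
        }
    }
    return count;
}
```

For example, "A\x82\xA0" is 3 bytes but only 2 characters ("A" followed by the hiragana "あ"). Note also that the trail byte of "表" (0x95 0x5C) is 0x5C, the ASCII backslash, so a byte-by-byte scan looking for '\\' would split that character in half.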
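The same warning applies to Unicode strings. On Windows a wchar_t string is UTF-16, so a character outside the Basic Multilingual Plane is stored as a surrogate pair: a high surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF). The sketch below counts code points rather than code units. It uses char16_t instead of wchar_t because char16_t is UTF-16 on every compiler, whereas wchar_t is 32 bits wide on Linux and macOS; on Windows the two have the same size and layout. countCodePoints is a hypothetical helper, and it counts an unpaired surrogate as one code point instead of rejecting it.

```cpp
#include <cstddef>

// Hypothetical helper: counts Unicode code points in a NUL-terminated UTF-16
// string. A high surrogate followed by a low surrogate is one code point
// stored in two 16-bit code units; anything else counts as one code point.
constexpr std::size_t countCodePoints(const char16_t* s) {
    std::size_t count = 0;
    for (std::size_t i = 0; s[i] != u'\0'; ++count) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool lowNext = s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += (high && lowNext) ? 2 : 1;
    }
    return count;
}
```

For example, u"A\U0001F600" holds 3 code units ('A' plus a surrogate pair for the emoji) but only 2 code points. Even code points are not the whole story: an accented letter may be written as a base letter followed by a combining mark, which is two code points for one visible character.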
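The T types and the _T/TEXT macros are ordinary preprocessor switches. The sketch below rebuilds that mechanism with hypothetical names (MY_UNICODE, MyTChar, MY_T, MyLPTSTR, MyLPCTSTR) so that it compiles on any C++ compiler; MY_UNICODE stands in for the UNICODE/_UNICODE macros that the "Use Unicode Character Set" option defines. It is not the actual contents of <tchar.h> or <Windows.h>, although those headers use the same two-step macro pattern (_T expands to __T, and TEXT to __TEXT) before pasting on the L prefix.

```cpp
// Hypothetical stand-ins for TCHAR, _T/TEXT, LPTSTR and LPCTSTR.
// MY_UNICODE plays the role of the UNICODE/_UNICODE macros.
#ifdef MY_UNICODE
typedef wchar_t MyTChar;        // like TCHAR in a Unicode build
#define MY_T_IMPL(s) L##s       // paste the L prefix onto the literal
#else
typedef char MyTChar;           // like TCHAR in a Multibyte build
#define MY_T_IMPL(s) s          // leave the literal narrow
#endif

// Two-step expansion so that MY_T(SOME_MACRO) expands SOME_MACRO first.
#define MY_T(s) MY_T_IMPL(s)

typedef MyTChar* MyLPTSTR;        // like LPTSTR
typedef const MyTChar* MyLPCTSTR; // like LPCTSTR

// Compiles unchanged in either build because the literal's width follows MyTChar.
constexpr MyLPCTSTR kGreeting = MY_T("Example.");
```

Compile it once as-is and once with MY_UNICODE defined (for example with -DMY_UNICODE or /DMY_UNICODE) and MyTChar switches between char and wchar_t, much the way TCHAR follows the project's Character Set setting.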
