Quick Tutorial on String Types in Windows and Microsoft Visual C++ (MSVC++)

If you want a quick run-down of the string types in Windows and MSVC++ (what the type names mean, how they work, and how to program with them), the following code snippet walks through the string types that MSVC++ exposes to developers. It also gives a brief introduction to character sets and their effect on this mysterious but wonderful game of Windows string types:

#include "stdafx.h"
#include <windows.h> // LPSTR, LPWSTR, LPTSTR, TEXT and friends.
#include <tchar.h> // _tmain, _TCHAR and _T.

int _tmain(int argc, _TCHAR* argv[])
{
	/* Quick Tutorial on Strings in Microsoft Visual C++

	   The Unicode Character Set and Multibyte Character Set project options in MSVC++ provide a project with two flavours of strings. They decide which character type, and therefore which encoding, the generic string types and macros in your project map to. Here are the two main character types in MSVC++ that you should be concerned about:

	   1. char <-- an 8-bit character type (8 bits = 1 byte) according to MSDN. Multibyte strings are built from char.
	   2. wchar_t <-- a 16-bit character type in MSVC++ (16 bits = 2 bytes) according to MSDN. Unicode strings are built from wchar_t and are encoded as UTF-16 on Windows.

	   From above, we can see that the size of each storage unit in our strings will change depending on our chosen character set.
	   
	   WARNING: Do NOT assume that every character in a Multibyte or Unicode string occupies exactly one char or one wchar_t! That is up to the encoding used. Sometimes, several units have to be combined to produce the single character that the user wants in their string. For example, a Multibyte string is stored as a sequence of bytes, but a given byte does not always correspond to one whole character, because some multibyte characters take up more than a single byte. MSDN says it may take TWO bytes to produce a single multibyte-encoded character: "A multibyte-character string may contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte." The same caution applies to Unicode strings: in UTF-16, some characters take two wchar_t units (a surrogate pair).

	   WARNING: Do NOT assume that Unicode contains every character for every language. For more information, please see http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character.

	   Note: The ASCII Character Set is a subset of both Multibyte and Unicode Character Sets (in other words, both of these flavours encompass ASCII characters).
	   Note: You should always use Unicode for new development, according to MSDN. For more information, please see http://msdn.microsoft.com/en-us/library/ey142t48.aspx.
	*/
	// Strings that are Multibyte.
	LPSTR a = NULL; // Regular Multibyte string (synonymous with char *).
	LPCSTR b = NULL; // Constant Multibyte string (synonymous with const char *).
	// Strings that are Unicode.
	LPWSTR c = NULL; // Regular Unicode string (synonymous with wchar_t *).
	LPCWSTR d = NULL; // Constant Unicode string (synonymous with const wchar_t *).
	// Strings that take on either Multibyte or Unicode depending on project settings.
	LPTSTR e = NULL; // Multibyte or Unicode string (either char * or wchar_t *).
	LPCTSTR f = NULL; // Constant Multibyte or Unicode string (either const char * or const wchar_t *).
	/* From above, the naming pattern is as follows:

	   LP: Specifies a long pointer type. The "long" is a holdover from 16-bit Windows; today it simply means a pointer (synonymous with following the underlying type with a *).
	   C: Specifies that the characters pointed to are constant.
	   W: Specifies that the type is of the Unicode Character Set (wide characters, wchar_t).
	   T: Specifies that the character type depends on project settings (char for Multibyte, wchar_t for Unicode).
	   STR: Specifies that the type is a string type.
	*/
	// String literal macros and prefixes (string literals are constant, so they are assigned to the constant types):
	f = _T("Example."); // Produces either a Multibyte or a Unicode literal depending on project settings.
	f = TEXT("Example."); // Produces either a Multibyte or a Unicode literal depending on project settings (same as _T).
	d = L"Example."; // The L prefix produces a Unicode literal.
	b = "Example."; // No prefix produces a Multibyte literal.
	return 0;
}
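
The two warnings in the comment above can be checked with a few lines of code. What follows is only a minimal sketch in portable standard C++, not Windows API code. It assumes UTF-8 as the example multibyte encoding (a Windows double-byte code page follows the same lead-byte and trail-byte idea, but with different byte ranges), and it uses char16_t in place of wchar_t, since wchar_t is 16 bits wide in MSVC++ but not everywhere else. The helper names count_utf8_characters and count_utf16_characters are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Counts characters (code points) in a UTF-8 string by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumption: the input is valid UTF-8.
std::size_t count_utf8_characters(const std::string& text)
{
	std::size_t count = 0;
	for (unsigned char unit : text)
	{
		if ((unit & 0xC0) != 0x80)
			++count;
	}
	return count;
}

// Counts characters (code points) in a UTF-16 string by skipping
// trail (low) surrogates, which lie in the range 0xDC00 to 0xDFFF.
// Assumption: the input is valid UTF-16.
std::size_t count_utf16_characters(const std::u16string& text)
{
	std::size_t count = 0;
	for (char16_t unit : text)
	{
		if (unit < 0xDC00 || unit > 0xDFFF)
			++count;
	}
	return count;
}
```

For example, the string "caf\xC3\xA9" (the UTF-8 bytes for "café") has a size() of 5 but holds 4 characters, and u"\U0001D11E" (a musical G clef, which lies outside the 16-bit range) has a size() of 2 but holds a single character.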

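To see how a T type such as LPTSTR can be either char * or wchar_t *, here is a rough sketch of the mechanism in portable C++. Every DEMO_ name is a hypothetical stand-in invented for this illustration; the real definitions live in the Windows headers and in tchar.h and are keyed on UNICODE and _UNICODE. Only the general pattern is the point: a conditional typedef plus a macro that prefixes the literal.

```cpp
#include <cstddef>

// Hypothetical stand-in for the project's character set option.
// Set it to 0 to get the Multibyte flavour instead.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;      // "Unicode" build: wide characters.
#define DEMO_TEXT_PASTE(s) L##s  // Prefix the literal with L.
#else
typedef char DEMO_TCHAR;         // "Multibyte" build: narrow characters.
#define DEMO_TEXT_PASTE(s) s     // Leave the literal as it is.
#endif
#define DEMO_TEXT(s) DEMO_TEXT_PASTE(s)

typedef DEMO_TCHAR* DEMO_LPTSTR;        // Plays the role of LPTSTR.
typedef const DEMO_TCHAR* DEMO_LPCTSTR; // Plays the role of LPCTSTR.

// Counts storage units up to the terminating zero, whichever flavour is active.
std::size_t demo_length(DEMO_LPCTSTR text)
{
	std::size_t length = 0;
	while (text[length] != 0)
		++length;
	return length;
}
```

With DEMO_UNICODE set to 1, DEMO_TEXT("Example.") expands to L"Example."; with it set to 0, the same source line compiles to a plain "Example.". Code written against the T names therefore builds in both flavours without changes.
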
Alexandru

I'm not entirely sure what makes me successful in general programming or development, but to any newcomers to this blood-sport, my best guess would be that success in programming comes from some strange combination of interest, persistence, patience, instincts, fearlessness of tinkering, and an ability to take advice because you should be humble. By instincts I mean, for example, that someone might tell you that something can't be done, or that it can't be done a certain way, but you just know that can't be true; or you look at a piece of code and know something doesn't seem right with it at first glance, but you can't quite put your finger on it until you think it through some more. It's okay to be wrong or to have a bad approach, realize it, and try to find a better one, and even better to be wrong and find a better approach to solve something than to have had a bad approach to begin with. I hope that whatever fragments of information I sprinkle across here help those who hit the same roadblocks.
