If you're like me, and you're coming from a C++ background, then it's easy to think of a character as the same thing as a byte. If you had a string, then you had an array of bytes. If you needed to save that string to a file, you could just open up a file and write each character as a byte (assuming you were working with the ASCII character set). However, this posed a problem once you were outside the realm of ASCII. Suddenly you had to deal with character sets and encodings and, gasp, Unicode.
When I started working with C#, I needed to change the way I thought of text, characters and files.
In C#:
- strings are objects
- strings are composed of characters
- characters are 16-bit Unicode (UTF-16) code units, not bytes
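To make the last point concrete, here is a small sketch that compares the number of characters in a string with the number of bytes it needs in two encodings. The class name CharVersusByte and the helper ByteCount are made up for this illustration, and the string is written with \u escapes so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

static class CharVersusByte
{
    // Hypothetical helper: how many bytes does this encoding need for the text?
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    static void Main()
    {
        // Two Japanese characters (U+65E5, U+672C), both outside ASCII.
        string text = "\u65E5\u672C";

        Console.WriteLine(sizeof(char));                      // 2: a char is 16 bits wide
        Console.WriteLine(text.Length);                       // 2 characters
        Console.WriteLine(ByteCount(text, Encoding.UTF8));    // 6 bytes
        Console.WriteLine(ByteCount(text, Encoding.Unicode)); // 4 bytes
    }
}
```

The same two characters take a different number of bytes depending on the encoding, which is why a string cannot simply be treated as a byte array.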
This actually posed a problem for me when I first tried to write a simple string to a file.
My first instinct was to open a System.IO.Stream object and try to write the string. However, its Write() method only takes a byte array. So how do I get a byte array from a string? String has a ToCharArray() method, but nothing for bytes. You could attempt something like:
char c = 'M';
byte b = Convert.ToByte(c);
This may work, but (a) it jumps through hoops to convert one character at a time, and (b) Convert.ToByte throws an OverflowException for any character above U+00FF, so it fails for most text outside ASCII.
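Here is a rough sketch of where that conversion breaks down. The wrapper TryToByte is a hypothetical helper written for this example; only Convert.ToByte itself comes from the framework.

```csharp
using System;

static class ConvertCharDemo
{
    // Hypothetical helper: returns true when Convert.ToByte accepts the character.
    public static bool TryToByte(char c, out byte b)
    {
        try
        {
            b = Convert.ToByte(c);
            return true;
        }
        catch (OverflowException)
        {
            // The character's value does not fit in a single byte.
            b = 0;
            return false;
        }
    }

    static void Main()
    {
        byte b;
        Console.WriteLine(TryToByte('M', out b));      // True, b is 77
        Console.WriteLine(TryToByte('\u79C1', out b)); // False, U+79C1 does not fit in a byte
    }
}
```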
Then I found the System.Text.Encoding class. It specializes in conversions between bytes and characters. Since I knew my characters were in ASCII, I could now use:
string s = "This is my string";
byte[] buffer = Encoding.ASCII.GetBytes(s);
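One thing to watch for: with its default settings, Encoding.ASCII does not fail on characters it cannot represent, it quietly replaces them with a question mark. The sketch below shows this by encoding a string and decoding it again; RoundTripAscii is a name invented for the example.

```csharp
using System;
using System.Text;

static class AsciiLossDemo
{
    // Hypothetical helper: encode to ASCII bytes and decode back,
    // so any character that was lost becomes visible.
    public static string RoundTripAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(RoundTripAscii("This is my string")); // This is my string
        Console.WriteLine(RoundTripAscii("caf\u00E9"));         // caf?
    }
}
```

So ASCII is only a safe choice when you really do know that every character in the string is ASCII.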
Now I can write my bytes to the file. Note that I am "using" the System.IO and System.Text namespaces at the top of my C# file.
string s = "This is my string";
byte[] buffer = Encoding.ASCII.GetBytes(s);
Stream stream = new FileStream("Stream.txt",
FileMode.Create);
stream.Write(buffer, 0, buffer.Length);
stream.Close();
That's a lot of work just to write characters to a file. In addition, the code treats the file as raw bytes rather than as text, so every conversion is up to you.
There is an easier way: System.IO.TextWriter.
This class and its StreamWriter subclass handle writing text data to a file. You still need to concern yourself with the encoding, but you can let the writer do the conversion for you. To replace the code above, you write:
string s = "This is my string";
TextWriter writer = new StreamWriter("TextWriter.txt",
    false, Encoding.ASCII);
writer.WriteLine(s);
writer.Close();
This is much less work for the developer.
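If you want the writer closed even when something goes wrong halfway through, a using block is one common way to do it. This is only a sketch: the helper WriteLineTo is a made-up name, and the file goes to the temp folder just to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Text;

static class WriterDemo
{
    // Hypothetical helper: writes one line of text using the given encoding.
    public static void WriteLineTo(string path, string text, Encoding encoding)
    {
        // 'using' flushes and closes the writer even if WriteLine throws.
        using (TextWriter writer = new StreamWriter(path, false, encoding))
        {
            writer.WriteLine(text);
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "TextWriter.txt");
        WriteLineTo(path, "This is my string", Encoding.ASCII);

        // Read the file back to confirm what was written.
        Console.Write(File.ReadAllText(path, Encoding.ASCII));
    }
}
```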
If you're working with multiple languages, you can use the UTF7, UTF8, or Unicode (UTF-16) encodings. These come standard with the .NET Framework. In addition, you can request an encoding for a specific code page. So if you're working with Japanese text, you can write your file as:
string s = "My Japanese string: 私の日本のひも";
TextWriter writer = new StreamWriter("TextWriter.txt",
false, Encoding.GetEncoding(932));
writer.WriteLine(s);
writer.Close();
(I hope you see Japanese characters in the above code snippet).
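A note if you try this on .NET Core or .NET 5 and later rather than the .NET Framework: there, code page encodings such as 932 are not available until you register CodePagesEncodingProvider. The sketch below assumes that newer runtime; GetShiftJis is a hypothetical helper, and the string is the same Japanese text as above written with \u escapes.

```csharp
using System;
using System.Text;

static class ShiftJisDemo
{
    // Hypothetical helper. Assumption: running on .NET Core / .NET 5+,
    // where code page encodings must be registered before first use.
    public static Encoding GetShiftJis()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(932);
    }

    static void Main()
    {
        Encoding shiftJis = GetShiftJis();

        // Seven Japanese characters, the same text as in the snippet above.
        string s = "\u79C1\u306E\u65E5\u672C\u306E\u3072\u3082";

        Console.WriteLine(s.Length);                      // 7 characters
        Console.WriteLine(shiftJis.GetByteCount(s));      // 14 bytes in code page 932
        Console.WriteLine(Encoding.UTF8.GetByteCount(s)); // 21 bytes in UTF-8
    }
}
```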
Reading is also very simple using System.IO.TextReader and System.IO.StreamReader. You can specify a code page to use when reading your text file, or you can let the reader take a "best guess": it looks for a byte order mark at the start of the file and falls back to UTF-8 if there is none.
TextReader reader = new StreamReader("TextWriter.txt");
s = reader.ReadLine();
reader.Close();
This is good enough if you wrote the file using ASCII, UTF8, or Unicode, but it cannot detect a specific code page encoding, so for that you'll need to specify the encoding yourself.
TextReader reader = new StreamReader("TextWriter.txt",
Encoding.GetEncoding(932));
s = reader.ReadLine();
reader.Close();
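To see the difference between the two cases, here is a sketch that writes the same text in two encodings and reads it back without telling the reader which one was used. The helper WriteThenReadWithDefaults and the file name Detect.txt are invented for the example, and Latin-1 (code page 28591) stands in for "some code page without a byte order mark".

```csharp
using System;
using System.IO;
using System.Text;

static class ReaderDetectionDemo
{
    // Hypothetical helper: writes one line with the given encoding,
    // then reads it back with a StreamReader that has to guess.
    public static string WriteThenReadWithDefaults(string path, string text, Encoding encoding)
    {
        using (TextWriter writer = new StreamWriter(path, false, encoding))
        {
            writer.WriteLine(text);
        }
        using (TextReader reader = new StreamReader(path))
        {
            return reader.ReadLine();
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "Detect.txt");
        string text = "caf\u00E9";

        // Encoding.Unicode writes a byte order mark, so the reader detects UTF-16.
        Console.WriteLine(WriteThenReadWithDefaults(path, text, Encoding.Unicode) == text); // True

        // Latin-1 has no byte order mark, so the reader assumes UTF-8 and mangles the last character.
        Console.WriteLine(WriteThenReadWithDefaults(path, text, Encoding.GetEncoding(28591)) == text); // False
    }
}
```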
Once you get used to the fact that byte != char, you'll feel much more comfortable working with files. It may seem a bit daunting at first. However, unlike in C++, the System.Text encodings and the readers and writers make it easy to be explicit about how you're working with strings. You don't need to convert Unicode to and from multi-byte character sets by hand anymore.