3.2 Unicode Storage Encoding Forms

UTF-8: UCS (Unicode) Transformation Format, 8-bit
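A 1 to 4 bytes per character scheme (the original design allowed up to 6 bytes; RFC 3629 restricts it to 4)
0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx (5-byte form, no longer valid)
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx (6-byte form, no longer valid)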
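The 1- to 4-byte patterns can be filled in by hand with shifts and masks. The following is a minimal sketch, not a production encoder; the function name utf8_encode is hypothetical, and it assumes the modern limit of U+10FFFF with surrogate code points rejected.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by filling the x bits of the UTF-8 patterns."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    if code_point < 0x80:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

For valid code points the result should agree with Python's built-in codec, for example utf8_encode(0x20AC) and "\u20ac".encode("utf-8").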
UTF-16
0079 007A D800 DC00  big endian
7900 7A00 00D8 00DC  little endian
FEFF 0079 007A D800 DC00  big endian with BOM
FFFE 7900 7A00 00D8 00DC  little endian with BOM
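The byte sequences in this example correspond to the three-character string "y", "z", U+10000, where U+10000 is stored as the surrogate pair D800 DC00. The sketch below reproduces them with Python's standard codecs; the helper names to_surrogates and hex_units are hypothetical and exist only for illustration.

```python
import codecs


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 | (offset >> 10)         # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)        # bottom 10 bits
    return high, low


def hex_units(data: bytes) -> str:
    """Format bytes as space-separated groups of two bytes, in stored order."""
    return " ".join(data[i:i + 2].hex().upper() for i in range(0, len(data), 2))


text = "yz\U00010000"
big = text.encode("utf-16-be")       # no BOM, big endian
little = text.encode("utf-16-le")    # no BOM, little endian

print(hex_units(big))
print(hex_units(little))
print(hex_units(codecs.BOM_UTF16_BE + big))
print(hex_units(codecs.BOM_UTF16_LE + little))
```

The BOM is the code point U+FEFF, so its bytes read FE FF in big-endian order and FF FE in little-endian order.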
UTF-32
A fixed-width form uniformly employing 32 bits per character; unlike UTF-16, it needs no surrogate pairs
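As a small illustration of the fixed width, the sketch below encodes the same string "y", "z", U+10000 as UTF-32 and checks that every character occupies exactly four bytes. Big-endian order without a BOM is an assumption made here for readability.

```python
import struct

text = "yz\U00010000"
data = text.encode("utf-32-be")   # big endian, no BOM

# Every code point takes exactly 4 bytes, including U+10000.
units = struct.unpack(f">{len(text)}I", data)

print(len(data))                          # total byte count
print(" ".join(f"{u:08X}" for u in units))
```

Each 32-bit unit is simply the code point value, so indexing the n-th character is a constant-time offset calculation.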
Universal Character Sets: UCS-2, UCS-4
A fixed 2 or 4 bytes per character, holding the character code point directly. UCS-2 cannot represent code points above U+FFFF.