Most programs treat data as strings of 8-bit bytes and make no
assumption about the character encoding. There are two exceptions:
- Labels are decoded as ISO-8859-1 when they are used to put strings
in PostScript images.
- The leven program optionally decodes dialect data (not the labels in
the datafiles) as UTF-8.
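The first exception matters when labels are stored in an encoding other
than ISO-8859-1. The following Python sketch is an illustration only, not
code from the package; the label text is a made-up example. It shows how
the same label looks after being decoded as ISO-8859-1, depending on how
it was stored:

```python
# Illustration only: this is not code from the package.
label = "Zürich"  # hypothetical label

stored_as_utf8 = label.encode("utf-8")         # "ü" becomes two bytes
stored_as_latin1 = label.encode("iso-8859-1")  # "ü" becomes one byte

# Decoding as ISO-8859-1 maps every single byte to exactly one character,
# so a two-byte UTF-8 sequence turns into two unrelated characters.
seen_from_utf8 = stored_as_utf8.decode("iso-8859-1")
seen_from_latin1 = stored_as_latin1.decode("iso-8859-1")

print(repr(seen_from_utf8))
print(repr(seen_from_latin1))
```

Under this assumption, a label containing non-ASCII characters only comes
out as intended in a PostScript image if it was stored as ISO-8859-1.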
You can put Unicode strings in datafiles encoded as lists of numbers,
preceded by a plus sign, but this is not easily human readable.
You can also put strings in datafiles as strings encoded in UTF-8. You
need to tell leven about this by including this line in each datafile
affected:
%utf8
This line affects the rest of the datafile, so be sure to put it at the top.
Labels in datafiles will always be interpreted as raw 8-bit strings.
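As a rough sketch of how the %utf8 line could be added to existing files,
the following Python helper checks that the data is valid UTF-8 and puts
the line at the top if it is not already there. The function name
ensure_utf8_marker is hypothetical and not part of the package, and the
sample content is a placeholder rather than real datafile syntax:

```python
def ensure_utf8_marker(data: bytes) -> bytes:
    """Return data with a '%utf8' line at the top, adding one if absent.

    Hypothetical helper, not part of the package.
    """
    # Raises UnicodeDecodeError if the data is not valid UTF-8.
    data.decode("utf-8")
    lines = data.splitlines()
    if lines and lines[0].strip() == b"%utf8":
        return data  # marker already on the first line
    return b"%utf8\n" + data


# Placeholder content, not real datafile syntax.
sample = "hûs\n".encode("utf-8")
marked = ensure_utf8_marker(sample)
print(marked)
```

Calling the helper a second time on its own output leaves the data
unchanged, so it is safe to run over a directory of files more than once.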
The features program processes all files as 8-bit data. This applies to
datafiles as well as to the feature definition file, so you can use any
character encoding you like, single-byte or multi-byte. The one
exception is UTF-7: it writes non-ASCII characters as runs of ordinary
ASCII characters, which cannot be told apart from literal text when the
data is processed byte by byte.
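The difference between UTF-7 and the 8-bit encodings can be seen with a
short Python sketch. This is an illustration only, using Python's standard
codecs rather than anything from the package, and the sample word is
arbitrary:

```python
# Illustration only: compare how three encodings represent one word.
text = "hûs"

encoded = {codec: text.encode(codec)
           for codec in ("iso-8859-1", "utf-8", "utf-7")}

for codec, raw in encoded.items():
    # True when every byte is plain 7-bit ASCII.
    ascii_only = all(byte < 0x80 for byte in raw)
    print(codec, raw, "ASCII only:", ascii_only)
```

In ISO-8859-1 and UTF-8 the non-ASCII character is stored in bytes with
the high bit set, so it can never collide with an ASCII character. In
UTF-7 the same character is written as a sequence that starts with a
plus sign and consists entirely of ASCII characters.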
For xstokens the same applies as for the features program.