The NATSPEC library is intended to smooth over national specifics in the use of programs. More precisely, it aims:
to resolve charset problems in most cases;
to provide various aids that facilitate the localization of programs.
Note that the library is not capable of solving national and interethnic conflicts.
Why it appeared
A Linux user who really works with the system runs into charset problems quite often, either with file contents or with file names. Users, except for those speaking Western European languages, are very often forced to specify the encoding of the contents or names of the files they deal with. This happens when:
mounting floppies, CDs and DVDs, flash drives and memory cards, hard disks brought from elsewhere, and other kinds of removable media;
browsing and mounting network resources available through Samba;
burning optical discs (creating the file system) with the mkisofs and growisofs commands, or with programs such as k3b, xcdroast and so forth;
re-encoding text with the corresponding feature of the mc (Midnight Commander) file manager.
Moreover, there is a huge number of programs whose use calls for specifying the charset, but which give no option to do so:
many FTP servers and FTP clients;
NFS servers, and the mounting of NFS resources;
almost all file systems used for OS installation;
the Rock Ridge file system on optical discs;
many multimedia players, which need to choose the proper audio track and/or subtitles for a video, or to parse ID3 tags written by dumb MP3 encoding software.
These problems are often left unsolved, or are solved privately by each user, or are fixed with patches that are not entirely correct.
It turns out that the system needs to provide, by some means, answers to the following questions:
which charset is used for file names in the system?
which charset is used for file contents?
One often hears the opinion that these questions have no general answers and in principle cannot have any, and thus that it is neither possible nor necessary to look for them.
To a great extent, the problem is resolved (provided we ignore compatibility entirely) by converting the whole system to the UTF-8 charset. Unfortunately, even in this case a number of problems still arise (see the UTF 8 Conversion Problems article).
But nobody has repealed the compatibility questions, and in any case the method of interaction with other operating systems (Windows, DOS, Mac), as well as with other Unix-like operating systems, must be settled and formulated. This is especially important when building a distribution.
After a (perhaps cursory) review of the sources of projects such as WINE, the Linux kernel, gettext, LyX, GLIBC, GLIB, mount, submount, cdrtools, zip, dia, beep and xmms, the requirements were formulated and a library was implemented. It provides answers to the charset questions and also offers a set of auxiliary functions needed by many projects.
What it is needed for
There is a variety of programs that work closely with charsets but, for objective reasons, each go their own way in charset support. Suffice it to mention the re-encoding support built into mkisofs, which uses NLS tables extracted from nobody knows which Linux kernel version.
This project was created to solve most questions related to text re-encoding outside of any particular program. It enhances portability and spares individual projects from being distracted by writing yet another set of crutches for general system problems.
The library is not a silver bullet or a cure-all. It is just a means to improve data interoperability between different systems and to make life easier for users and programmers during the current intermediate phase, when all progressive mankind has stepped one foot into the brave new UTF-8 world (no, this abbreviation has nothing to do with UFOs or coffins).
How it works
The LIBNATSPEC library defines such important concepts as:
the file system charset for the locale (file name encoding);
the user locale charset;
the charset and code page of other operating systems (WIN, DOS, MAC) for a given locale (this works correctly only if the locales supported by LIBNATSPEC and GLIBC match);
and provides an API for using them, as well as auxiliary functions that make it possible:
to complete mount parameters with a charset specification appropriate to the file system type;
to convert text strings from one charset to another, with transliteration when the text has to be displayed but the user's locale cannot represent all the symbols used.
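The idea of conversion with a transliteration fallback can be sketched roughly as follows. This is a Python illustration only, not LIBNATSPEC's C API: the function name to_charset and the tiny TRANSLIT table are hypothetical, and a real implementation would use a far more complete transliteration scheme.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (an assumption of
# this sketch, not data taken from LIBNATSPEC).
TRANSLIT = {"«": "<<", "»": ">>", "№": "No."}


def to_charset(text, charset):
    """Encode text into charset, transliterating what cannot be encoded."""
    out = []
    for ch in text:
        try:
            ch.encode(charset)
            out.append(ch)  # the target charset has this character
            continue
        except UnicodeEncodeError:
            pass
        repl = TRANSLIT.get(ch)
        if repl is None:
            # Fall back to stripping accents: decompose, drop combining marks.
            decomposed = unicodedata.normalize("NFKD", ch)
            repl = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
        try:
            repl.encode(charset)
        except UnicodeEncodeError:
            repl = "?"  # last resort: a visible placeholder
        out.append(repl)
    return "".join(out).encode(charset)


if __name__ == "__main__":
    print(to_charset("café №5", "ascii"))  # b'cafe No.5'
```

Characters the target charset already contains pass through unchanged, so a string that fits the user's locale is never altered.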
The design principle: all information obtained from the library should, where possible, depend on the current locale.
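What "the current locale" means for character handling can be sketched in a few lines. The precedence LC_ALL, then LC_CTYPE, then LANG is the standard POSIX rule; the function names below are hypothetical and the code is a Python illustration, not part of LIBNATSPEC.

```python
import os


def current_ctype_locale(env=None):
    """Return the locale name that governs character handling."""
    env = os.environ if env is None else env
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # nothing is set: the default POSIX locale


def charset_of(locale_name):
    """Extract the charset part of a name like ru_RU.KOI8-R, if present."""
    base = locale_name.split("@", 1)[0]  # drop a modifier such as @euro
    if "." in base:
        return base.split(".", 1)[1]
    return None  # no explicit charset: a table lookup would be needed


if __name__ == "__main__":
    name = current_ctype_locale()
    print(name, charset_of(name))
```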
LIBNATSPEC only suggests a default mount charset. NOBODY REMOVES the possibility of specifying a charset manually. The default value simply becomes the one matching the current locale rather than iso-8859-1.
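A minimal sketch of this "default only" behaviour, in Python rather than the library's C: the function name complete_mount_options is hypothetical, and the choice of file systems in the table is an assumption of the example (iocharset and nls are the usual Linux option names for these file systems).

```python
# Charset option name per file system type. Which file systems are listed
# here is an assumption of this sketch.
CHARSET_OPTION = {
    "vfat": "iocharset",
    "iso9660": "iocharset",
    "udf": "iocharset",
    "cifs": "iocharset",
    "ntfs": "nls",
}


def complete_mount_options(fstype, options, charset):
    """Append a default charset option unless the user already gave one."""
    opts = [o for o in options.split(",") if o]
    key = CHARSET_OPTION.get(fstype)
    if key is None:
        return ",".join(opts)  # file system takes no charset option
    if any(o.split("=", 1)[0] == key for o in opts):
        return ",".join(opts)  # a manual choice always wins
    opts.append(f"{key}={charset}")
    return ",".join(opts)


if __name__ == "__main__":
    print(complete_mount_options("vfat", "rw,noexec", "koi8-r"))
    # rw,noexec,iocharset=koi8-r
```

An explicitly given option is left untouched, so the locale-derived value acts purely as a default.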
The heuristic is based on a table constructed by the following algorithm: a static array is generated from the list of locales installed with glibc, from the main charset information stored in each locale, and from a program that obtains from WINE the correspondence between the charsets of different operating systems. The library then uses this array to determine the charset from the locale, and so on.
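How such a static table might be consulted can be sketched as follows. The two entries, the field names and the function charset_for are illustrative assumptions in Python; they are not the contents or layout of the array LIBNATSPEC actually generates.

```python
# Illustrative entries only: locale -> charset per operating system family.
CHARSET_TABLE = {
    "ru_RU": {"unix": "KOI8-R", "win": "CP1251", "dos": "CP866",
              "mac": "MacCyrillic"},
    "en_US": {"unix": "ISO-8859-1", "win": "CP1252", "dos": "CP437",
              "mac": "MacRoman"},
}


def charset_for(locale_name, os_type="unix"):
    """Look up the charset used by os_type for the given locale."""
    base = locale_name.split("@", 1)[0]        # drop a modifier like @euro
    lang, _, explicit = base.partition(".")    # split off ".CHARSET"
    if os_type == "unix" and explicit:
        return explicit                        # the locale names it itself
    entry = CHARSET_TABLE.get(lang)
    return entry.get(os_type) if entry else None


if __name__ == "__main__":
    print(charset_for("ru_RU.KOI8-R", "win"))  # CP1251
```

An explicit charset in the locale name only answers the question for the local system; for the other operating systems the table is the sole source.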
The library is written in C with maximum portability in mind. At present, linking requires the libc and libpopt libraries. It has been tested on different systems such as FreeBSD, SunOS (perhaps Solaris?), Mac OS X and a number of popular GNU/Linux distributions.
There are interfaces for other languages:
For use in scripts, a console program is provided that can query the parameters detected by the library. For example:
$ natspec -l would show the system locale
$ natspec -l would show the charset used by filenames.
$ natspec -i would output all available information
(Please try it out and tell us how accurately it detects the situation on your machine.)
(An example is missing here.)
For ALT Linux and other Red Hat descendants:
The system has a file, /etc/sysconfig/i18n, in which, among other things, the system locale is set by a LANG=locale line. For example, on my system this file contains: LANG=ru_RU.KOI8-R
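Reading the LANG setting from such a file can be sketched like this. It is a simplified Python illustration with a hypothetical function name read_lang; it handles plain NAME=value lines with optional quotes, not full shell syntax.

```python
def read_lang(i18n_text):
    """Return the LANG value from the text of an i18n-style file."""
    lang = None
    for raw in i18n_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything not NAME=value
        name, value = line.split("=", 1)
        if name.strip() == "LANG":
            # As in the shell, a later assignment overrides an earlier one.
            lang = value.strip().strip("\"'")
    return lang


if __name__ == "__main__":
    sample = '# system locale\nLANG=ru_RU.KOI8-R\n'
    print(read_lang(sample))  # ru_RU.KOI8-R
```

In real use the text would come from reading /etc/sysconfig/i18n, with the environment variables consulted first since they take precedence for a running process.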