Adding localization support to Terminix has been on my todo list for a while. Unfortunately, at this time D does not have an official i18n package, nor are there any libraries supporting GNU gettext, which has become a de facto standard for localization over the years. While there are some localization libraries available for D, I really wanted to stick with gettext given the huge ecosystem it has in terms of support tooling and websites.
While examining my options, I stumbled on the fact that the GTK GLib library supports gettext and that GtkD has already wrapped it in the glib.Internationalization class. After refreshing my knowledge of how gettext works, I was able to put together a basic working localization system in about an evening's worth of work. The remainder of the post describes the steps involved in getting it working.
I knew that Terminix would need to support localization, so from day one I had a module gx.i18n.l10n that contained a single function following the GNU gettext pattern as follows:
string _(string text) {
    return text;
}
Whenever something in Terminix would eventually need to be localized, I would simply wrap it in a call to this function. Obviously at this early stage the function does nothing beyond returning the original text. An example of its usage is as follows:
string msg = _("This is a localized string");
For those of you familiar with the usage of gettext in C or other languages, this will all seem familiar. For anyone else: in gettext, any text you want to localize is wrapped in this function and becomes the key for retrieving a localized version. If no localized version is available, i.e. for a language for which a translation does not yet exist, the key itself is displayed. In my opinion, this system is much superior to localization systems that require programmers to embed artificial keys in the code, and it is one reason why gettext is so popular.
The next step was to simply incorporate the glib.Internationalization class from GtkD, so I updated my _() function as follows:
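To make the key-fallback behaviour concrete, here is a minimal sketch that mimics it with a plain associative array. It is not how gettext is implemented; the `translate` function and the French string are made up purely for illustration.

```d
// Hypothetical stand-in for a gettext catalog: maps source text to a translation.
string translate(string[string] catalog, string key) {
    // `in` yields a pointer to the stored value, or null when the key is absent.
    if (auto translated = key in catalog) {
        return *translated;
    }
    // No translation available: fall back to the source text itself.
    return key;
}

unittest {
    // Illustrative catalog with a single, made-up French entry.
    auto fr = ["Exit the terminal": "Quitter le terminal"];
    assert(translate(fr, "Exit the terminal") == "Quitter le terminal");
    assert(translate(fr, "IBeam") == "IBeam");
}
```

The point of the design is that the source code stays readable and the application still works when a catalog is missing or incomplete.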
string _(string text) {
    return Internationalization.dgettext(_textdomain, text);
}
The textdomain parameter above is used by gettext to differentiate this application from others and will be discussed in more detail later. At this point, the application now supports localization but the real work begins as we need to prepare all of the localization materials that gettext requires. In gettext there are three types of files required for localization:
- Template. This file, with a .pot extension, is used as the template for the localization. For a given application there is typically only one template file.
- Portable Object (po). These files, with the extension .po, contain the localization for a specific locale, for example en.po or fr.po. These files are in the same format as the template file.
- Machine Object (mo). These are the binary files that are used at runtime and are created by the msgfmt utility. These files have a 1:1 mapping with a po file in that each mo file is created from a po file.
Here’s an extract showing what a template/po file looks like:
msgid "Match entire word only"
msgstr "Match entire word only"
msgid "IBeam"
msgstr "IBeam"
msgid "Run command as a login shell"
msgstr "Run command as a login shell"
msgid "Exit the terminal"
msgstr "Exit the terminal"
In the extract above, msgid is the key and msgstr is the actual localization; as the extract shows, in the template file the two are identical. Additionally, most applications do not need a localization file for the locale the developer is using, since that text is already embedded in the source code.
The challenge at this point was creating the template file. The gettext package has a utility called xgettext that can extract all of the localized strings from source code, but unfortunately D is not one of the languages it supports. I thought about creating a version of xgettext using the excellent libdparse; however, I opted for a quick and dirty method, as Terminix doesn’t have a large number of strings needing localization.
What I ended up doing was adding some code to my l10n module to capture all the localization requests and then write them out to a file when the application terminates. This has the advantage of being relatively easy to do, but the disadvantage that you have to exercise the app pretty completely to capture all the strings. For Terminix there were only a few areas I couldn’t exercise easily, and for those I simply updated the template file after the fact. Below is the code I used to generate the template file. Note the use of the version specification so that this code only gets included in a specific build configuration. I’ve removed some of the comments for the sake of conciseness; you can view the original code on GitHub.
module gx.i18n.l10n;

import glib.Internationalization;

version (Localize) {

    import std.experimental.logger;
    import std.file;
    import std.string;

    // Every string passed to _() during this run, keyed by the source text.
    string[string] messages;

    // Writes the captured strings to filename in the gettext template format.
    void saveFile(string filename) {
        string output;
        foreach (key, value; messages) {
            // Flag C-style format strings the same way xgettext does.
            if (key.indexOf("%") >= 0) {
                output ~= "#, c-format\n";
            }
            if (key.indexOf("\n") >= 0) {
                // Multi-line key: empty first string, then one quoted string per line.
                string lines;
                foreach (s; key.splitLines()) {
                    lines ~= "\"" ~ s ~ "\"\n";
                }
                output ~= ("msgid \"\"\n" ~ lines);
                output ~= ("msgstr \"\"\n" ~ lines ~ "\n");
            } else {
                output ~= "msgid \"" ~ key ~ "\"\n";
                output ~= "msgstr \"" ~ key ~ "\"\n\n";
            }
        }
        write(filename, output);
    }
}

void textdomain(string domain) {
    _textdomain = domain;
}

string _(string text) {
    version (Localize) {
        trace("Capturing key " ~ text);
        messages[text] = text;
    }
    return Internationalization.dgettext(_textdomain, text);
}

private:

string _textdomain;
When capturing text, there are a couple of special cases in gettext to be aware of. The first is that the xgettext utility puts a special comment in front of strings that use C-style formatting, i.e. %d or %s. I don’t think this comment is used, but I wanted to keep it. The second is that the key for multi-line strings is generated with each line separated. That’s why in the code above you see the check for newline and the splitLines call.
Once the template file is completed, we are ready to create our first localization. In my case, I created an English localization, as Terminix has some programmatic terms (shortcut identifiers) that were intended to be localized to human-friendly language rather than shown directly to the user. Creating the en.po file is just a matter of copying the terminix.pot file to en.po. While gettext has a utility for this, msginit, I just opted to copy it for simplicity.
Once the en.po localization is complete, it needs to be compiled into a mo file. On Linux, mo files are stored in /usr/share/locale/${LOCALE}/LC_MESSAGES, where ${LOCALE} is the standard language/country code. The mo files for the application are named after the textdomain for the application, and this is how gettext locates the right file for the application. For example, in Terminix’s case the full path to the English mo file would be /usr/share/locale/en/LC_MESSAGES/terminix.mo.
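One further detail worth keeping in mind when writing PO entries by hand: strings in a PO file use C-style escapes, so quotes, backslashes and newlines inside a key have to be escaped, and xgettext keeps the `\n` at the end of each split line. The following is a minimal sketch of that, not the code Terminix uses; `escapePo` and `poEntry` are hypothetical helper names.

```d
import std.array : appender;
import std.string : splitLines;
import std.typecons : Yes;

// Hypothetical helper: escape text for use inside a PO double-quoted string.
string escapePo(string s) {
    auto buf = appender!string();
    foreach (char c; s) {
        switch (c) {
            case '\\': buf.put(`\\`); break;
            case '"':  buf.put(`\"`); break;
            case '\n': buf.put(`\n`); break;
            case '\r': buf.put(`\r`); break;
            case '\t': buf.put(`\t`); break;
            default:   buf.put(c);
        }
    }
    return buf.data;
}

// Hypothetical helper: format one "msgid" or "msgstr" entry.
// Multi-line text starts with an empty string, followed by one quoted
// string per source line, each keeping its escaped line terminator.
string poEntry(string keyword, string text) {
    auto lines = text.splitLines(Yes.keepTerminator);
    if (lines.length <= 1) {
        return keyword ~ " \"" ~ escapePo(text) ~ "\"\n";
    }
    string result = keyword ~ " \"\"\n";
    foreach (line; lines) {
        result ~= "\"" ~ escapePo(line) ~ "\"\n";
    }
    return result;
}

unittest {
    assert(poEntry("msgid", "Hello") == "msgid \"Hello\"\n");
    assert(poEntry("msgid", "a\nb") == "msgid \"\"\n\"a\\n\"\n\"b\"\n");
}
```

Because adjacent quoted strings in a PO file are concatenated, keeping the escaped newline on each line is what lets the written key match the multi-line string used in the source.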
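The directory layout described above can be sketched as a small helper that builds the expected mo path from a locale and a textdomain. This is only an illustration assuming a POSIX system; `moPath` and its `prefix` parameter are hypothetical and not part of Terminix or gettext.

```d
import std.path : buildPath;

// Hypothetical helper: where gettext would look for a domain's catalog,
// given an install prefix such as "/usr".
string moPath(string prefix, string locale, string domain) {
    return buildPath(prefix, "share", "locale", locale,
                     "LC_MESSAGES", domain ~ ".mo");
}

unittest {
    // On POSIX systems buildPath joins the segments with '/'.
    version (Posix) {
        assert(moPath("/usr", "en", "terminix")
            == "/usr/share/locale/en/LC_MESSAGES/terminix.mo");
    }
}
```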
To compile the mo file, simply use the gettext msgfmt utility as follows:
sudo msgfmt en.po -o /usr/share/locale/en/LC_MESSAGES/terminix.mo
Obviously you would want to script the above process as part of creating an installation package; you can see how Terminix does this here.