Kitchen.i18n Module

I18N is an important piece of any modern program. Unfortunately, setting up i18n in your program is often a confusing process. The functions provided here aim to make the programming side of that a little easier.

Most projects will be able to do something like this when they start up:

# myprogram/__init__.py:

import os
import sys

from kitchen.i18n import easy_gettext_setup

_, N_ = easy_gettext_setup('myprogram', localedirs=(
        os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
        os.path.join(sys.prefix, 'lib', 'locale')
        ))

Then, in other files that have strings that need translating:

# myprogram/commands.py:

from myprogram import _, N_

def print_usage():
    print _(u"""available commands are:
    --help              Display help
    --version           Display version of this program
    --bake-me-a-cake    as fast as you can
        """)

def print_invitations(age):
    print _('Please come to my party.')
    print N_('I will be turning %(age)s year old',
        'I will be turning %(age)s years old', age) % {'age': age}

See the documentation of easy_gettext_setup() and get_translation_object() for more details.

See also

gettext
For details of how the Python gettext facilities work.
babel
The babel module, for in-depth information on gettext, message catalogs, and translating your app. babel provides some nice features for i18n on top of gettext.

Functions

easy_gettext_setup() should satisfy the needs of most users. get_translation_object() is designed to ease the way for anyone that needs more control.

kitchen.i18n.easy_gettext_setup(domain, localedirs=(), use_unicode=True)

Set up translation functions for an application

Parameters:
  • domain – Name of the message domain. This should be a unique name that can be used to lookup the message catalog for this app.
  • localedirs – Iterator of directories to look for message catalogs under. The first directory to exist is used regardless of whether messages for this domain are present. If none of the directories exist, fall back on sys.prefix + /share/locale. Default: no directories to search, so we just use the fallback.
  • use_unicode – If True, return the gettext functions for unicode strings; otherwise return the functions for byte str. Default is True.
Returns:

tuple of the gettext function and gettext function for plurals

Setting up gettext can be a little tricky because of lack of documentation. This function will set up gettext using the Class-based API for you. For the simple case, you can pass just the message domain, leave the other arguments at their defaults, and call it like this:

_, N_ = easy_gettext_setup('myprogram')

This gets you two functions, _() and N_(), that you can use to mark strings in your code for translation. _() marks strings that have no plural forms to worry about, whatever the value of any variable in them. N_() marks strings that need a different form depending on whether a variable in the string is singular or plural.
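
As a rough illustration of the difference between the two functions, the sketch below uses the standard library's gettext.NullTranslations as a stand-in for the object that easy_gettext_setup() sets up. That substitution is an assumption made so the example runs without kitchen or any message catalogs installed, and it is written for Python 3, where gettext() and ngettext() return text strings. The invitation() helper is hypothetical.

```python
import gettext

# Stand-in for the functions returned by easy_gettext_setup().
# NullTranslations performs no translation: it returns the message as-is
# and picks between singular and plural using the English rule (n == 1).
translations = gettext.NullTranslations()
_ = translations.gettext
N_ = translations.ngettext


def invitation(age):
    """Build a two-line invitation (hypothetical helper for illustration)."""
    greeting = _('Please come to my party.')
    # N_() takes the singular form, the plural form, and the number that
    # decides which of the two is used.
    detail = N_('I will be turning %(age)s year old',
                'I will be turning %(age)s years old', age) % {'age': age}
    return greeting + '\n' + detail
```

With a real message catalog loaded, the same calls would return the translated strings and apply the plural rule of the target language instead.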

See also

Kitchen.i18n Module
This module’s documentation has examples of using _() and N_()
get_translation_object()
For information on how to use localedirs to get the proper message catalogs both when in development and when installed to FHS-compliant directories on Linux.

Note

The gettext functions returned from this function should be superior to the ones returned from gettext. The traits that make them better are described in the DummyTranslations and NewGNUTranslations documentation.

Changed in version kitchen-0.2.4 (API kitchen.i18n 2.0.0): Changed easy_gettext_setup() to return the lgettext functions instead of gettext functions when use_unicode=False.

kitchen.i18n.get_translation_object(domain, localedirs=(), languages=None)

Get a translation object bound to the message catalogs

Parameters:
  • domain – Name of the message domain. This should be a unique name that can be used to lookup the message catalog for this app.
  • localedirs – Iterator of directories to look for message catalogs under. The first directory to exist is used regardless of whether messages for this domain are present. If none of the directories exist, fall back on sys.prefix + /share/locale. Default: no directories to search; just use the fallback.
Returns:

Translation object to get gettext methods from

If you need more flexibility than easy_gettext_setup(), use this function. It sets up a gettext Translation object and returns it to you. Then you can access any of the methods of the object that you need directly. For instance, if you specifically need to access lgettext():

translations = get_translation_object('foo')
translations.lgettext('My Message')

Setting up gettext in a portable manner can be a little tricky due to not having a common directory for translations across operating systems. get_translation_object() is able to handle that if you give it a list of directories to search for catalogs:

translations = get_translation_object('foo', localedirs=(
     os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
     os.path.join(sys.prefix, 'lib', 'locale')))

This will search several different directories (the paths below assume sys.prefix is /usr):

  1. A directory named locale in the same directory as the module that called get_translation_object()
  2. /usr/lib/locale
  3. /usr/share/locale (the fallback directory)

This allows gettext to work on Windows and in development (where the message catalogs are typically in the toplevel module directory) and also when installed under Linux (where the message catalogs are installed in /usr/share/locale). You (or the system packager) just need to install the message catalogs in /usr/share/locale and remove the locale directory from the module to make this work. For example:

In development:
    ~/foo   # Toplevel module directory
    ~/foo/__init__.py
    ~/foo/locale    # With message catalogs below here:
    ~/foo/locale/es/LC_MESSAGES/foo.mo

Installed on Linux:
    /usr/lib/python2.7/site-packages/foo
    /usr/lib/python2.7/site-packages/foo/__init__.py
    /usr/share/locale/  # With message catalogs below here:
    /usr/share/locale/es/LC_MESSAGES/foo.mo

Warning

The first directory that we can access will be used regardless of whether locale files for our domain and language are present in the directory. That means you have to consider the order in which you list directories in localedirs. Always list directories which you, the user, or the system packager can control the existence of before system directories that will exist whether or not the message catalogs are present in them.
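
The ordering rule in this warning can be sketched as follows. pick_localedir() is a hypothetical helper, not part of kitchen, and it approximates "can access" with os.path.isdir(); the real function may apply a different check.

```python
import os
import sys
import tempfile


def pick_localedir(localedirs=(), fallback=None):
    """Return the first directory in localedirs that exists.

    Hypothetical helper mirroring the documented rule: the first existing
    directory wins even if it holds no catalogs for our domain. If none
    exist, use the fallback (sys.prefix + /share/locale by default).
    """
    if fallback is None:
        fallback = os.path.join(sys.prefix, 'share', 'locale')
    for directory in localedirs:
        if os.path.isdir(directory):
            return directory
    return fallback


with tempfile.TemporaryDirectory() as empty_dir:
    missing_dir = os.path.join(empty_dir, 'does-not-exist')
    # The empty directory exists, so it is chosen even though it contains
    # no message catalogs at all.
    chosen_is_empty_dir = pick_localedir((missing_dir, empty_dir)) == empty_dir
    # Nothing in the list exists, so the fallback is returned.
    fallback_choice = pick_localedir((missing_dir,), fallback='/fallback')
```

This is why a system directory that always exists should come last: listed first, it would shadow every directory after it.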

Note

This function returns either DummyTranslations or NewGNUTranslations. These classes are superior to their gettext equivalents as described in their documentation.
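
For comparison, the standard library shows the same two-outcome pattern: gettext.translation() returns a catalog-backed object when it finds a catalog, and a non-translating NullTranslations object when it does not and fallback=True is passed. The sketch below exercises only the standard library on Python 3; treating it as analogous to how get_translation_object() chooses between its two classes is an assumption.

```python
import gettext
import tempfile

with tempfile.TemporaryDirectory() as empty_localedir:
    # No foo.mo exists under this directory, so with fallback=True the
    # standard library returns a NullTranslations object instead of raising.
    translations = gettext.translation(
        'foo', localedir=empty_localedir, languages=['es'], fallback=True)

# The fallback object hands messages back untranslated.
message = translations.gettext('My Message')
```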

Changed in version kitchen-1.1.0 (API kitchen.i18n 2.1.0): Allow sending the Changed easy_gettext_setup() to return the lgettext functions instead of gettext functions when use_unicode=False.

Translation Objects

The standard translation objects from the gettext module suffer from several problems:

  • They can throw UnicodeError
  • They can’t find translations for non-ASCII byte str messages
  • They may return either unicode string or byte str from the same function even though the functions say they will only return unicode or only return byte str.

DummyTranslations and NewGNUTranslations were written to fix these issues.

class kitchen.i18n.DummyTranslations(fp=None)

Safer version of gettext.NullTranslations

This Translations class doesn’t translate the strings and is intended to be used as a fallback when there were errors setting up a real Translations object. It’s safer than gettext.NullTranslations in its handling of byte str vs unicode strings.

Unlike NullTranslations, this Translation class will never throw a UnicodeError. The code that you have around a call to DummyTranslations might throw a UnicodeError but at least that will be in code you control and can fix. Also, unlike NullTranslations all of this Translation object’s methods guarantee to return byte str except for ugettext() and ungettext() which guarantee to return unicode strings.

When byte str are returned, the strings will be encoded according to this algorithm:

  1. If a fallback has been added, the fallback will be called first. You’ll need to consult the fallback to see whether it performs any encoding changes.
  2. If a byte str was given, the same byte str will be returned.
  3. If a unicode string was given and set_output_charset() has been called, then we encode the string using the output_charset.
  4. If a unicode string was given and this is gettext() or ngettext() and _charset was set, output in that charset.
  5. If a unicode string was given and this is gettext() or ngettext(), we encode it using UTF-8.
  6. If a unicode string was given and this is lgettext() or lngettext(), we encode using the value of locale.getpreferredencoding().
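
The order of precedence in steps 3 to 6 can be sketched as a small function. choose_output_encoding() and its parameter names are hypothetical and only restate the list above; they are not kitchen's implementation.

```python
import locale


def choose_output_encoding(method, output_charset=None, catalog_charset=None):
    """Pick the encoding for a unicode message returned as a byte str.

    Hypothetical helper covering steps 3 to 6 above; steps 1 and 2
    (fallbacks, and byte str input passed through unchanged) are left out.
    """
    if output_charset:
        # Step 3: set_output_charset() was called, so it takes priority.
        return output_charset
    if method in ('gettext', 'ngettext'):
        # Step 4: use the charset stored as _charset; step 5: else UTF-8.
        return catalog_charset or 'utf-8'
    if method in ('lgettext', 'lngettext'):
        # Step 6: use the locale's preferred encoding.
        return locale.getpreferredencoding()
    raise ValueError('not a byte str method: %r' % (method,))
```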

For ugettext() and ungettext(), we go through the same set of steps with the following differences:

  • We transform byte str into unicode strings for these methods.
  • The encoding used to decode the byte str is taken from input_charset if it’s set, otherwise we decode using UTF-8.

input_charset

An extension to the Python standard library gettext that specifies what charset a message is encoded in when decoding a message to unicode. This is used for two purposes:

  1. If the message string is a byte str, this is used to decode the string to a unicode string before looking it up in the message catalog.
  2. In ugettext() and ungettext() methods, if a byte str is given as the message and is untranslated this is used as the encoding when decoding to unicode. This is different from _charset which may be set when a message catalog is loaded because input_charset is used to describe an encoding used in a python source file while _charset describes the encoding used in the message catalog file.

Any characters that can’t be transformed from a byte str to a unicode string or vice versa will be replaced with a replacement character (i.e. u'�' in Unicode-based encodings, '?' in other ASCII-compatible encodings).
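
Python's built-in 'replace' error handler produces the same two replacement characters, so it gives a feel for the behavior described here. The snippet uses only str and bytes methods on Python 3; that the class relies on this exact handler internally is an assumption.

```python
# A byte that is not valid UTF-8 becomes U+FFFD when decoding.
decoded = b'caf\xe9'.decode('utf-8', 'replace')

# A character with no ASCII representation becomes '?' when encoding.
encoded = u'caf\xe9'.encode('ascii', 'replace')
```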

See also

gettext.NullTranslations
For information about what methods are available and what they do.

class kitchen.i18n.NewGNUTranslations(fp=None)

Safer version of gettext.GNUTranslations

gettext.GNUTranslations suffers from two problems that this class fixes.

  1. gettext.GNUTranslations can throw a UnicodeError in gettext.GNUTranslations.ugettext() if the message being translated has non-ASCII characters and there is no translation for it.
  2. gettext.GNUTranslations can return byte str from gettext.GNUTranslations.ugettext() and unicode strings from the other gettext() methods if the message being translated is the wrong type.

When byte str are returned, the strings will be encoded according to this algorithm:

  1. If a fallback has been added, the fallback will be called first. You’ll need to consult the fallback to see whether it performs any encoding changes.
  2. If a byte str was given, the same byte str will be returned.
  3. If a unicode string was given and set_output_charset() has been called, then we encode the string using the output_charset.
  4. If a unicode string was given and this is gettext() or ngettext() and a charset was detected when parsing the message catalog, output in that charset.
  5. If a unicode string was given and this is gettext() or ngettext(), we encode it using UTF-8.
  6. If a unicode string was given and this is lgettext() or lngettext(), we encode using the value of locale.getpreferredencoding().

For ugettext() and ungettext(), we go through the same set of steps with the following differences:

  • We transform byte str into unicode strings for these methods.
  • The encoding used to decode the byte str is taken from input_charset if it’s set, otherwise we decode using UTF-8.

input_charset

An extension to the Python standard library gettext that specifies what charset a message is encoded in when decoding a message to unicode. This is used for two purposes:

  1. If the message string is a byte str, this is used to decode the string to a unicode string before looking it up in the message catalog.
  2. In ugettext() and ungettext() methods, if a byte str is given as the message and is untranslated, this is used as the encoding when decoding to unicode. This is different from the _charset parameter that may be set when a message catalog is loaded because input_charset is used to describe an encoding used in a python source file while _charset describes the encoding used in the message catalog file.

Any characters that can’t be transformed from a byte str to a unicode string or vice versa will be replaced with a replacement character (i.e. u'�' in Unicode-based encodings, '?' in other ASCII-compatible encodings).
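
The role of input_charset when a byte str message has to become a unicode string can be sketched like this. decode_message() is a hypothetical helper written for Python 3 (where byte str is bytes); it restates the documented rule and is not the class's actual code.

```python
def decode_message(message, input_charset=None):
    """Return message as text, decoding byte str input if necessary.

    Hypothetical helper: input_charset names the encoding of byte str
    messages as written in the source file; UTF-8 is assumed when unset.
    Undecodable bytes are replaced rather than raising UnicodeError.
    """
    if isinstance(message, bytes):
        return message.decode(input_charset or 'utf-8', 'replace')
    return message
```

A message stored as latin-1 bytes decodes cleanly only when input_charset says so; under the UTF-8 default the stray byte would turn into the replacement character.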

See also

gettext.GNUTranslations.gettext
For information about what methods this class has and what they do.
