Monday, November 28, 2011

Python gettext module: setting up internationalization and localization for a Python module

http://www.pixelbeat.org/programming/i18n.html


First of all, i18n is shorthand for internationalisation (i, 18 letters, n).
The same reasoning is behind l10n (localisation).

The standard translation support on Linux is "gettext".
It consists of a translations database stored in
the filesystem, utilities to manage the database
and an API (which comes with glibc) to access it.

Database:

    The translations database is stored in separate files like:

        $dirname/$locale/$category/$domain.mo

    Example values for these variables:

        dirname=/usr/share/locale    #This is the usual location
        locale=en_IE                 #language_COUNTRY
        category=LC_MESSAGES         #strings in your app
        domain=fslint                #your app
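
    As a rough illustration of the layout above, the path can be assembled
    from its four parts. The helper name mo_path is hypothetical and only
    mirrors the $dirname/$locale/$category/$domain.mo pattern; it does not
    check that the file exists.

```python
import posixpath


def mo_path(dirname, locale_name, category, domain):
    """Build $dirname/$locale/$category/$domain.mo (hypothetical helper)."""
    return posixpath.join(dirname, locale_name, category, domain + ".mo")


# Example values taken from the text above
print(mo_path("/usr/share/locale", "en_IE", "LC_MESSAGES", "fslint"))
```

    Python's own gettext.find() performs a similar lookup on the real
    filesystem and also tries less specific locales (for example en after en_IE).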

API (to set variables above in your program):

    C:

        #include <locale.h>
        #include <libintl.h> /* declares bindtextdomain, textdomain and gettext */
        bindtextdomain("fslint","/usr/share/locale");
        setlocale(LC_ALL,""); /* set all locale categories to value in LC_ALL or LANG environment variables */
        /* note gettext uses LC_MESSAGES category */
        textdomain("fslint");

    Python:

        import gettext, locale
        gettext.bindtextdomain("fslint", "/usr/share/locale") #sys default used if localedir=None
        locale.setlocale(locale.LC_ALL,'')
        gettext.textdomain("fslint")

        #Note if you initially do the following, it is much
        #faster as lookup for mo file not done for each translation
        #(the C version automatically caches the translations so it's not needed there).
        gettext.install("fslint",localedir=None,unicode=1) #localedir=None means the system default locale directory; unicode=1 is Python 2 only

        #Note also before python 2.3 you need the following if
        #you need translations from non python code (glibc,libglade etc.)
        gtk.glade.bindtextdomain("fslint",textdomain) #there are other access points to this function
        #Since python 2.3 one still needs to call the following
        #as the gettext equivalent doesn't do it in case the message
        #catalogs are in different formats for libc and the python app
        locale.bindtextdomain("fslint",textdomain)

        #Note python parses the translations itself, instead of letting
        #glibc do it. This is for platform independence I suppose, but
        #it does allow you to use python to display existing message catalogs:
        $ LANG=es python
        >>> import gettext
        >>> gettext.install("libc")
        >>> for item in gettext._translations['/usr/share/locale/es/LC_MESSAGES/libc.mo']._catalog.keys():
        ...     print item, ":",  gettext._translations['/usr/share/locale/es/LC_MESSAGES/libc.mo']._catalog[item]

    To actually call the gettext translation functions,
    just replace your strings "string" with gettext("string").
    The following shortcuts are usually used:

    Python:
        _ = gettext.gettext #Skip this if you used gettext.install above (it is less efficient)
        print _("translated string")

    C:
        #define _(x) gettext(x)
        printf(_("translated string"));
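
    One detail worth sketching when marking strings: values are usually
    substituted after the lookup, so the translator sees the whole sentence
    with its placeholder. The function name report and the message text are
    made up for illustration, and no catalog is bound here, so the string
    comes back untranslated.

```python
import gettext

# No catalog is bound in this sketch, so _ returns its argument unchanged
_ = gettext.gettext


def report(count):
    """Translate the full sentence first, then fill in the number."""
    return _("%d duplicate files found") % count


print(report(3))
```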

Utilities:

    The next thing to do is extract the marked strings from your
    source files for translation and insertion into the database. Python used to
    have its own utility (pygettext.py) to do this, but the best way
    now is to use the standard xgettext utility which now supports python.
    The output from this stage is a pot file.

    The last thing left to do is the actual translation.
    Translators create a "po" file from the pot file above,
    by just entering the text for the source strings in the pot file.
    Then the developer compiles these to binary mo files for
    use by the application. msgfmt and msgmerge are the main
    utilities for manipulating po, pot and mo files.

    The quickest way to learn about the external utilities
    (xgettext, msgmerge, msgfmt) is to look at existing examples,
    which are usually in po/Makefile in various projects, including: FSlint
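
    The extract, merge and compile steps described above can be sketched as
    a list of command lines. This only builds the argument lists and does not
    run anything. The helper name catalog_commands, the po and locale
    directory names and the source file fslint.py are assumptions for
    illustration, not taken from any real Makefile. msgmerge expects the po
    file to exist already, so a first translation would start from a copy of
    the pot file.

```python
def catalog_commands(domain, lang, sources, po_dir="po", localedir="locale"):
    """Return the xgettext, msgmerge and msgfmt command lines (not executed)."""
    pot = f"{po_dir}/{domain}.pot"
    po = f"{po_dir}/{lang}.po"
    # Same layout as the database section: $dirname/$locale/$category/$domain.mo
    mo = f"{localedir}/{lang}/LC_MESSAGES/{domain}.mo"
    return [
        # 1. extract strings marked with _() from the sources into a pot file
        ["xgettext", "--language=Python", "--keyword=_", "-o", pot, *sources],
        # 2. bring an existing po file up to date with the new pot file
        ["msgmerge", "--update", po, pot],
        # 3. compile the po file into a binary mo file (target dir must exist)
        ["msgfmt", "-o", mo, po],
    ]


for argv in catalog_commands("fslint", "es", ["fslint.py"]):
    print(" ".join(argv))
```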

Charsets:

   Translators can represent your strings in various ways.
   For example, the Euro symbol (€) can be encoded as:

         A4 in iso-8859-15
       20AC in unicode (code point U+20AC)
     E282AC in utf-8

   All in all, utf-8 is the best one to use if you can,
   as it involves the least conversion and is very
   efficient for primarily ascii text.
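
   The three byte values listed above can be checked with a short
   Python 3 sketch. The variable name euro is just for illustration.

```python
euro = "\u20ac"  # the Euro symbol as a unicode string

latin9 = euro.encode("iso-8859-15").hex().upper()
codepoint = f"{ord(euro):04X}"
utf8 = euro.encode("utf-8").hex().upper()

print(latin9)     # A4
print(codepoint)  # 20AC
print(utf8)       # E282AC
```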

   Note gtk2 only takes utf-8. Note also that pygtk will
   auto convert from unicode to utf-8. Python will
   convert translations to unicode if you specify
   unicode=1 to gettext.install(). So, for example,
   if you got translations in each of the 3 encodings
   above, the charset translation process for pygtk
   would be:

   iso-8859-15 \
   unicode      - unicode - utf-8
   utf-8       /
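
   The diagram above amounts to "decode to unicode first, then encode
   once to utf-8". A minimal Python 3 sketch of that idea follows. The
   helper name to_utf8 is hypothetical, and this is not how pygtk
   performs the conversion internally.

```python
def to_utf8(value, charset=None):
    """Normalise a translation to utf-8 bytes via a unicode string.

    `value` is either bytes in `charset` or an already decoded str.
    """
    if isinstance(value, bytes):
        value = value.decode(charset)  # source charset -> unicode
    return value.encode("utf-8")       # unicode -> utf-8


# The Euro symbol arriving in each of the three forms from the text
print(to_utf8(b"\xa4", "iso-8859-15"))
print(to_utf8("\u20ac"))
print(to_utf8(b"\xe2\x82\xac", "utf-8"))
```

   All three calls produce the same utf-8 bytes, which is the point of
   routing everything through unicode.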

Misc:

   It's not just strings that need to be translated
   in an application. For example, number and date
   representations differ between locales. To handle these
   you need to use locale-aware variants of the standard
   functions for representing numbers to users:

   C:
       #include <locale.h>
       #include <stdio.h>
       setlocale(LC_ALL, "");
       printf("%'d", 1234); /* notice the ' */

   Python:
       import locale
       locale.setlocale(locale.LC_ALL, "")
       locale.format("%d", 1234, 1) #this is a little limited as of 2.2.3
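
   In Python 3 the locale.format() call above is no longer available
   in recent releases, and locale.format_string() is the closest
   equivalent. A small sketch follows. The helper name format_count is
   hypothetical, and the fallback to the "C" locale is an assumption
   added so the code runs even when the environment names a locale
   that is not installed.

```python
import locale


def format_count(n):
    """Format an integer using the current locale's digit grouping."""
    return locale.format_string("%d", n, grouping=True)


try:
    locale.setlocale(locale.LC_ALL, "")  # take the locale from the environment
except locale.Error:
    locale.setlocale(locale.LC_ALL, "C")  # assumed fallback: no grouping

print(format_count(1234))
```

   The printed form depends on the active locale, for example with or
   without a thousands separator.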

More info:

   info gettext
