Pepipopum – automatically translate PO files with Google Translate

Edit: Since I wrote this in 2009, Google have withdrawn free access to the translation API. I’ll leave this post up for anyone using the paid version though…

If you’ve ever worked on localizing an application or website, you may be familiar with the .po files used with GNU gettext and compatible tools.

I’ve written a script which can take a .po file and translate any untranslated strings with Google Translate. This may not be a ‘release quality’ translation, but it does speed up the job of a real translator, who can simply proofread and correct the machine-translated entries.

See it in action here: http://pepipopum.dixo.net

I’ve released the source under the Affero GPL too, so you can tweak or host it yourself. The version hosted above does have a one second delay between translations, so if you want to go faster you’re encouraged to do exactly that!

Hope someone else finds it useful.

39 thoughts on “Pepipopum – automatically translate PO files with Google Translate”

  1. lordelph Post author

    I’ve made a few fixes to correct some mangling of placeholders like %1 in translated strings, and also made the direct output appear as it is generated, rather than waiting until the entire translation is complete.

  2. lordelph Post author

    I’ve contacted JMbamba and have his po file, so will see what I can do to improve things.

    @Samir, do you mean as a comment, or in the string itself? Also, you can make it faster by installing it yourself – the version I’m hosting has an artificial delay.

  3. Samir Ribic

    Simply, in the output text, for every line you translate (so not for the lines which originally have some content), put one
    #, fuzzy
    statement before the English line. You could turn this option on or off with a check box.

    I know that the delay is artificial.
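
    For anyone trying this on a self-hosted copy, here is a minimal sketch of the idea in PHP. It is not Pepipopum's actual code: the function formatEntry and its $machineTranslated parameter are hypothetical names made up for illustration, and it assumes a single-line entry with no existing flag comment (an entry that already carries a flag line such as "#, c-format" would need the flags merged onto one line).

```php
<?php
// Hypothetical helper (not part of Pepipopum): renders one single-line PO
// entry, adding the gettext "fuzzy" flag when the msgstr came from a machine.
function formatEntry(string $msgid, string $msgstr, bool $machineTranslated): string
{
    $escape = function (string $s): string {
        // Escape backslash, double quote and common control characters
        return addcslashes($s, "\\\"\n\r\t");
    };

    $out = '';
    if ($machineTranslated) {
        // gettext tools treat entries flagged "#, fuzzy" as needing review
        $out .= "#, fuzzy\n";
    }
    $out .= 'msgid "' . $escape($msgid) . "\"\n";
    $out .= 'msgstr "' . $escape($msgstr) . "\"\n";
    return $out;
}

// A machine-translated entry gets the flag; a human-translated one would not.
echo formatEntry('Save file', 'Datei speichern', true);
```

    Entries that already had a translation in the uploaded file would be written with the flag argument set to false, so a reviewer only has to look at the flagged ones.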

  4. pite

    Hi!
    I was wondering if maybe your script is available from anywhere else since both links seem to be down.

    Thanks!

  5. lordelph Post author

    Sorry about the service being offline – I had it running on a spare server we’ve since taken offline. It will be back up shortly!

  6. Jeff

    Hi!

    This is a great program that you made! The only thing I am missing is a way to define the input language. My project is originally in German, so I will have to modify the source…

  7. mike

    Is this supposed to be working? I submitted a file and it claimed it was translating it, but none of the lines were translated in the download or the online output. Very strange. The file I submitted had been created by django-admin.py makemessages.

  8. lordelph Post author

    Oops, there was a problem with curl on that server. It works now.

    The hosted version is really just for demonstration; if you find it useful, I encourage you to host it yourself.

  9. Paul

    Thanks so much for this script! I’ve used it here on your site a bunch of times; however, when I tried to set it up on WAMP it would not run. It thinks and thinks, then shows the Download file link, but clicking the link says the file may have expired. I’ve enabled php_curl but still no dice. Any ideas?

  10. sdub

    It would be nice if it were possible to translate an already translated po-file into another (similar) language; that would also need the destination strings to be translated.

  11. Rick Richardson

    #: gnome-manual-duplex.glade:381 gnome-manual-duplex.glade:446
    msgid ""
    "HP LJ 1005/1018/1020: reverse pages\n"
    "HP LJ P1005/P1006/P1505: reverse pages\n"
    "HP LJ Pro P1102, P1566: reverse pages\n"
    "HP CLJ 1600/2600/CP1215: reverse pages\n"
    "Minolta/QMS 2300 DL: reverse pages\n"
    "Others: depends\n"
    msgstr "HP LJ 1005/1018/1020: páginas de inverter HP LJ P1005/P1006/P1505: páginas de inverter HP LJ Pro P1102, P1566: páginas de inverter 1600/2600/CP1215 HP CLJ: páginas de inverter Minolta / QMS 2300 DL: páginas Outros inverso: depende"

    Should be (Portuguese) 6 lines instead of 1 line:

    msgstr ""
    "HP LJ 1005/1018/1020: páginas de inverter\n"
    "HP LJ P1005/P1006/P1505: páginas de inverter\n"
    "HP LJ Pro P1102, P1566: páginas de inverter\n"
    "1600/2600/CP1215 HP CLJ: páginas de inverter\n"
    "Minolta / QMS 2300 DL: páginas de inverter\n"
    "Outros: depende"
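
    One way to keep the line structure, sketched in PHP under some assumptions: translate each line of the msgid separately and put the trailing "\n" back afterwards. The function translateParts and its $translate callable are hypothetical names, not part of Pepipopum, and strtoupper is only a stand-in for a real translation call.

```php
<?php
// Hypothetical helper (not Pepipopum's code): translates each msgid line on
// its own so the trailing "\n" of every line survives the round trip.
// $translate is any callable taking a string and returning its translation.
function translateParts(array $msgidParts, callable $translate): array
{
    $msgstrParts = [];
    foreach ($msgidParts as $part) {
        if ($part === '') {
            // The empty first line of a multi-line entry stays empty
            $msgstrParts[] = '';
            continue;
        }
        $hasNewline = substr($part, -1) === "\n";
        $text = $hasNewline ? substr($part, 0, -1) : $part;
        $msgstrParts[] = $translate($text) . ($hasNewline ? "\n" : '');
    }
    return $msgstrParts;
}

// Stand-in "translator" for demonstration only: it just upper-cases the text.
$parts = translateParts(['', "Others: depends\n", 'last line'], 'strtoupper');
foreach ($parts as $i => $part) {
    // First part follows the msgstr keyword, the rest are continuation lines
    echo ($i === 0 ? 'msgstr ' : '') . '"' . addcslashes($part, "\\\"\n\r\t") . "\"\n";
}
```

    The trade-off is that a sentence which wraps across several lines would be translated in fragments, so this suits list-like strings such as the one above better than flowing text.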

  12. Göran Uddeborg

    I found this today, and do find it interesting. I’ve been translating for a while, and this seems like a tool that could help.

    I wish to encourage you to implement the “fuzzy” feature, as suggested above. Automatic translations are not good enough to be used without being proof-read and fixed first. They need to be marked as only approximate, and that is exactly what the fuzzy marker in po files is for.

    Otherwise, I will not know which ones to review. And if it is half of each, I’m not that much helped by the auto-translation any more.

  13. Dreas

    Hi!

    Cool service. However, it’s not usable for us unless the Google-translated strings are marked “fuzzy”, as suggested by Samir Ribic already. The reason is that the strings need to be reviewed by a human to check that they are accurate. If you already have a partially completed file, there is no way to distinguish between already verified lines and newly Google-translated lines.

    Greets,

    Dreas

  14. Miklos

    I know this is a GREAT service, but currently the pepipopum.dixo.net site is unavailable. Please correct asap …

  15. Yisrael Dov

    Nice script

    2 requests.

    Ignore place holder strings like %s etc

    Mark as fuzzy (as mentioned above)
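
    For the first request, a rough PHP sketch of one possible approach: swap each placeholder for an opaque token before the text is sent for translation, then swap it back afterwards. The function names protectPlaceholders and restorePlaceholders and the __PHn__ token format are invented for illustration, and whether a given translation service leaves such tokens untouched is an assumption that would need testing.

```php
<?php
// Hypothetical helpers (not Pepipopum's code). They swap printf-style
// placeholders for opaque tokens before translation and restore them after.
function protectPlaceholders(string $text): array
{
    $found = [];
    $masked = preg_replace_callback(
        '/%(?:\d+\$)?[sdfu]|%\d+/',   // e.g. %s, %d, %1$s, %1
        function (array $m) use (&$found): string {
            $found[] = $m[0];
            return '__PH' . (count($found) - 1) . '__';
        },
        $text
    );
    return [$masked, $found];
}

function restorePlaceholders(string $text, array $found): string
{
    foreach ($found as $i => $placeholder) {
        // Tokens end in "__", so __PH1__ can never match inside __PH10__
        $text = str_replace('__PH' . $i . '__', $placeholder, $text);
    }
    return $text;
}

[$masked, $found] = protectPlaceholders('Copied %1$d files to %s');
// $masked would be sent to the translation service instead of the raw string
echo restorePlaceholders($masked, $found), "\n";
```

    The pattern only covers a few common placeholder forms; a project using other conversions would need to extend it.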

  16. lisa

    I downloaded the script to try it on my own machine, but cannot get it to work. Any suggestions?

  17. Ale

    Hi, great job! But is it still going to work after Google shuts down the Google Translate API on Dec 1st 2011?

  18. Robert

    Hi, it seems that Google does not provide the public API for translations anymore. Now there is a paid version 2.

  19. scott

    Hi Paul,

    Great program, but I am worried, since Google has made it V2 (paid).

    Could you fix it for V2, so those who have a key can still use it?

    That way we can save a great piece of work.

  20. A1on

    Has anyone here found a solution for translating the .po file automatically now that Google wants to charge for using their service? I have already set up an account with Google API v2 and am willing to pay for it.

    What Scott suggested sounds great

  21. scott

    Hello A1on,

    Now this solution is working.

    You can contact a person on xaurav at gmail com.

    He has fixed it and can provide this solution.

  22. Jsmes

    Can you make a download link so I can use this software? Not everyone understands programming languages.

  23. Ian Dunn

    If this no longer works because Google shut down their free API, could you please make a note of that on the page? I just spent half an hour working on this only to realize it’s hopeless.

  24. Michael Hjulskov

    Hi
    Just want to share a new script that I made
    It uses Google Translate
    Here is what I have done
    Hope it helps somebody :o)


    <?php
    /**
    * Pepipopum - Automatic PO Translation via Google Translate
    * Copyright (C)2009 Paul Dixon (lordelph@gmail.com)
    * $Id: index.php 21080 2009-10-25 10:04:24Z paul $
    *
    * This program is free software: you can redistribute it and/or modify
    * it under the terms of the GNU Affero General Public License as
    * published by the Free Software Foundation, either version 3 of the
    * License, or (at your option) any later version.
    *
    * This program is distributed in the hope that it will be useful,
    * but WITHOUT ANY WARRANTY; without even the implied warranty of
    * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    * GNU Affero General Public License for more details.
    *
    * You should have received a copy of the GNU Affero General Public License
    * along with this program. If not, see <http://www.gnu.org/licenses/>.
    *
    *
    * REQUIREMENTS:
    *
    * Requires curl to perform the Google Translate API call but could
    * easily be adapted to use something else or make the HTTP call
    * natively.
    */
    /**
    * Define delay between Google API calls (can be fractional for sub-second delays)
    *
    * This reduces load on the server and plays nice with Google. If you want a faster
    * experience, simply host Pepipopum on your own server and lower this value.
    */
    define('PEPIPOPUM_DELAY', 1);

    // Don't forget to register for the Google API -> https://code.google.com/apis/console/
    // then visit "BILLING" and pay $10 or more,
    // visit "API Access" and find your Google Translate API key (Key for browser apps (with referers)),
    // edit/add a referer such as "*.currentdomain.dk/*"
    // and put your key here:
    $mykey="PUT_KEY_HERE";

    // the name of this file (put it in the root)
    $this_filename = "translateNow.php";

    /**
    * POProcessor provides a simple PO file parser
    *
    * Can parse a PO file and calls processEntry for each entry in it
    * Can derive from this class to perform any transformation you
    * like
    */
    class POProcessor
    {
    public $max_entries=0; //for testing you can limit the number of entries processed
    private $start=0; //timestamp when we started
    protected $progressCallback=null; //optional callback, see setProgressCallback
    public function __construct()
    {
    }
    /**
    * Set callback function which is passed the completion
    * percentage and remaining time of the parsing operation. This callback
    * will be called up to 100 times, depending on the
    * size of the file.
    *
    * Callback is a function name, or an array of ($object,$methodname)
    * as is common for PHP style callbacks
    */
    public function setProgressCallback($callback)
    {
    $this->progressCallback=$callback;
    }
    /**
    * Parses input file and calls processEntry for each recognized entry
    * and output for all other lines
    *
    * To track progress, see setProgressCallback
    */
    public function process($inFile)
    {
    set_time_limit(86400);
    $this->start=time();
    $msgid=array();
    $msgstr=array();
    $count=0;
    $size=filesize($inFile);
    $percent=-1;
    $state=0;
    $in=fopen($inFile, 'r');
    while (!feof($in))
    {
    $line=trim(fgets($in));
    $pos=ftell($in);
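    $percent_now=($size>0) ? round(($pos*100)/$size) : 100; //guard against empty files
    if ($percent_now!=$percent)
    {
    $percent=$percent_now;
    $remain='';
    $elapsed=time()-$this->start;
    if ($elapsed>=5 && $percent>0) //avoid dividing by zero while still at 0%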
    {
    $total = $elapsed/($percent/100);
    $remain=$total-$elapsed;
    }
    $this->showProgress($percent,$remain);
    }
    $match=array();
    switch ($state)
    {
    case 0://waiting for msgid
    if (preg_match('/^msgid "(.*)"$/', $line,$match))
    {
    $clean=stripcslashes($match[1]);
    $msgid=array($clean);
    $state=1;
    }
    break;
    case 1: //reading msgid, waiting for msgstr
    if (preg_match('/^msgstr "(.*)"$/', $line,$match))
    {
    $clean=stripcslashes($match[1]);
    $msgstr=array($clean);
    $state=2;
    }
    elseif (preg_match('/^"(.*)"$/', $line,$match))
    {
    $msgid[]=stripcslashes($match[1]);
    }
    break;
    case 2: //reading msgstr, waiting for blank
    if (preg_match('/^"(.*)"$/', $line,$match))
    {
    $msgstr[]=stripcslashes($match[1]);
    }
    elseif (empty($line))
    {
    //we have a complete entry
    $this->processEntry($msgid, $msgstr);
    $count++;
    if ($this->max_entries && ($count>$this->max_entries))
    {
    break 2;
    }
    $state=0;
    }
    break;
    }
    //comment or blank line?
    if (empty($line) || preg_match('/^#/',$line))
    {
    $this->output($line."\n");
    }
    }
    fclose($in);
    }
    /**
    * Called whenever the parser recognizes a msgid/msgstr pair in the
    * po file. It is passed an array of strings for the msgid and msgstr
    * which correspond to multiple lines in the input file, allowing you
    * to preserve this if desired.
    *
    * Default implementation simply outputs the msgid and msgstr without
    * any further processing
    */
    protected function processEntry($msgid, $msgstr)
    {
    $this->output("msgid ");
    foreach($msgid as $part)
    {
    $part=addcslashes($part,"\r\n\"");
    $this->output("\"{$part}\"\n");
    }
    $this->output("msgstr ");
    foreach($msgstr as $part)
    {
    $part=addcslashes($part,"\r\n\"");
    $this->output("\"{$part}\"\n");
    }
    }
    /**
    * Internal method to call the progress callback if set
    */
    protected function showProgress($percentComplete, $remainingTime)
    {
    if (is_array($this->progressCallback))
    {
    $obj=$this->progressCallback[0];
    $method=$this->progressCallback[1];
    $obj->$method($percentComplete,$remainingTime);
    }
    elseif (is_string($this->progressCallback))
    {
    $func=$this->progressCallback;
    $func($percentComplete,$remainingTime);
    }
    }
    /**
    * Called to emit parsed lines of the file - override this
    * to provide customised output
    */
    protected function output($str)
    {
    global $output;
    $output.=$str;
    }
    }
    /**
    * Derivation of POProcessor which passes untranslated entries through the
    * Google Translate API and writes the transformed PO to another file
    *
    */
    class POTranslator extends POProcessor
    {
    /**
    * Google API requires a referrer - constructor will build a suitable default
    */
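    public $referrer;
    protected $srcLanguage; //source language code, set by translate()
    protected $targetLanguage; //target language code, set by translate()
    protected $fOut; //output file handle, set by translate()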
    /**
    * How many seconds should we wait between Google API calls to be nice
    * to google and the server running Pepipopum? Can use a floating point
    * value for sub-second delays
    */
    public $delay=PEPIPOPUM_DELAY;
    public function __construct()
    {
    parent::__construct();
    //Google API needs to be passed a referrer
    $this->referrer="http://{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
    }
    /**
    * Translates a PO file storing output in desired location
    */
    public function translate($inFile, $outFile, $srcLanguage, $targetLanguage)
    {
    $ok=true;
    $this->srcLanguage=$srcLanguage;
    $this->targetLanguage=$targetLanguage;
    $this->fOut=fopen($outFile, 'w');
    if ($this->fOut)
    {
    $this->process($inFile);
    fclose($this->fOut);
    }
    else
    {
    trigger_error("POTranslator::translate unable to open $outFile for writing", E_USER_ERROR);
    $ok=false;
    }
    return $ok;
    }
    /**
    * Overridden output method writes to output file
    */
    protected function output($str)
    {
    if ($this->fOut)
    {
    fwrite($this->fOut, $str);
    flush();
    }
    }
    /**
    * Overridden processEntry method performs the Google Translate API call
    */
    protected function processEntry($msgid, $msgstr)
    {
    global $mykey; //API key defined at the top of this file
    $input=implode('', $msgid);
    $output=implode('', $msgstr);
    if (!empty($input) && empty($output))
    {
    $q=urlencode($input);
    //use the languages passed to translate() rather than hard-coding en and da
    $url="https://www.googleapis.com/language/translate/v2?key=".urlencode($mykey)."&q=".$q."&source=".urlencode($this->srcLanguage)."&target=".urlencode($this->targetLanguage);

    $cmd="curl -e ".escapeshellarg($this->referrer).' '.escapeshellarg($url);
    $result=`$cmd`;

    $data=json_decode($result);
    //echo $data->data->translations[0]->translatedText;
    if (is_object($data) && is_object($data->data->translations[0]) && isset($data->data->translations[0]->translatedText))
    {
    $output=$data->data->translations[0]->translatedText;
    //Google translate mangles placeholders, lets restore them
    $output=preg_replace('/%\ss/', '%s', $output);
    $output=preg_replace('/% (\d+) \$ s/', ' %$1\$s', $output);
    $output=preg_replace('/^ %/', '%', $output);
    //have seen %1 get flipped to 1%
    if (preg_match('/%\d/', $input) && preg_match('/\d%/', $output))
    {
    $output=preg_replace('/(\d)%/', '%$1', $output);
    }
    //we also get entities for some chars
    $output=html_entity_decode($output, ENT_QUOTES, 'UTF-8'); //ENT_QUOTES also decodes quote entities such as &#39;
    $msgstr=array($output);
    }
    //play nice with google
    //usleep($this->delay * 1000000);
    }
    //output entry
    parent::processEntry($msgid, $msgstr);
    }
    }
    //simple progress callback which emits some JS to update the
    //page with a progress count
    function showProgress($percent,$remainingTime)
    {
    $time='';
    if (!empty($remainingTime))
    {
    if ($remainingTime<120)
    {
    $time=sprintf("(%d seconds remaining)",$remainingTime);
    }
    elseif ($remainingTime<60*120)
    {
    $time=sprintf("(%d minutes remaining)",round($remainingTime/60));
    }
    else
    {
    $time=sprintf("(%d hours remaining)",round($remainingTime/3600));
    }
    }
    echo '<script type="text/javascript">';
    echo "document.getElementById('info').innerHTML='$percent% complete $time';";
    echo "</script>\n";
    flush();
    }
    function processForm()
    {
    set_time_limit(86400);
    $translator=new POTranslator();
    if ($_POST['output']=='html')
    {
    //we output to a temporary file to allow later download
    echo 'Processing PO file...';
    echo '<div id="info"></div>'; //element updated by the showProgress callback
    $translator->setProgressCallback('showProgress');
    $outfile = tempnam(sys_get_temp_dir(), 'pepipopum');
    }
    else
    {
    //output directly
    header("Content-Type:text/plain");
    $outfile="php://output";
    }
    $translator->translate($_FILES['pofile']['tmp_name'], $outfile, 'en', $_POST['language']);
    if ($_POST['output']=='html')
    {
    //show download link
    $leaf=basename($outfile);
    $name=$_FILES['pofile']['name'];
    echo "Completed - <a href=\"?download=".urlencode($leaf)."&amp;name=".urlencode($name)."\">download your updated po file</a>";
    }
    else
    {
    //we're done
    exit;
    }
    }
    if (isset($_GET['viewsource']))
    {
    highlight_file($_SERVER['SCRIPT_FILENAME']);
    exit;
    }
    if (isset($_GET['download']) && isset($_GET['name']))
    {
    //check download file is valid
    $file=sys_get_temp_dir().DIRECTORY_SEPARATOR.$_GET['download'];
    $ok=preg_match('/^pepipopum[A-Za-z0-9]+$/', $_GET['download']);
    $ok=$ok && file_exists($file);
    //sanitize name
    $name=preg_replace('/[^a-z0-9\._]/i', '', $_GET['name']);
    if ($ok)
    {
    header("Content-Type:text/plain");
    header("Content-Length:".filesize($file));
    header("Content-Disposition: attachment; filename=\"{$name}\"");
    readfile($file);
    }
    else
    {
    //fail
    header("HTTP/1.0 404 Not Found");
    echo "The requested pepipopum output file is not available - it may have expired. Click here to generate a new one.";
    }
    exit;
    }
    if (isset($_POST['output']) && ($_POST['output']=='pofile'))
    {
    processForm();
    }
    ?>

    Pepipopum - Translate PO file with Google Translate

    body
    {
    background:#eeeeee;
    margin: 0;
    padding: 0;
    text-align: center;
    font-family:Verdana,Arial,Helvetica
    }
    #main
    {
    padding: 3em;
    margin: 1em auto 1em auto;
    width: 50em;
    border:1px solid #dddddd;
    text-align: left;
    background:white;
    }
    #footer
    {
    text-align:right;
    font-size:8pt;
    color:#888888;
    border-top:1px solid #888888;
    }
    h1
    {
    margin-top:0;
    }
    form
    {
    background:#dddddd;
    padding:2em;
    margin:0 2em 0 2em;
    -moz-border-radius: 1em;
    -webkit-border-radius: 1em;
    border-radius: 1em;
    font-size:0.8em;
    }
    fieldset
    {
    background:#cccccc;
    border:1px solid #aaaaaa;
    margin-bottom:1em;
    padding:1em;
    position:relative;
    -moz-border-radius: 0.5em;
    -webkit-border-radius: 0.5em;
    border-radius: 0.5em;
    }
    legend
    {
    background:#aaaaaa;
    border:0;
    padding:0 1em 0 1em;
    margin-left:1em;
    color:#ffffff;
    position: absolute;
    top: -.5em;
    left: .2em;
    -moz-border-radius: 0.5em;
    -webkit-border-radius: 0.5em;
    border-radius: 0.5em;
    }

    Pepipopum - Translate PO files with Google Translate
    PO files originate from the GNU gettext
    tools and can be generated by a wide variety of other localization tools.
    Pepipopum allows you to upload a PO file containing English language strings
    in the msgid,
    and it uses the Google Translate API
    to construct a PO file containing translated equivalents in each corresponding msgstr.
    If the PO file already contains a translation for a given msgid, it will not
    be translated. This
    allows you to upload a proof-read PO and just get translations for any new elements.
    <form enctype="multipart/form-data" action="" method="post">

    Input

    PO File

    Output options

    Target Language

    Afrikaans
    Albanian
    Arabic
    Belarusian
    Bulgarian
    Catalan
    Chinese (Simplified)
    Chinese (Traditional)
    Croatian
    Czech
    Danish
    Dutch
    English
    Estonian
    Filipino
    Finnish
    French
    Galician
    German
    Greek
    Hebrew
    Hindi
    Hungarian
    Icelandic
    Indonesian
    Irish
    Italian
    Japanese
    Korean
    Latvian
    Lithuanian
    Macedonian
    Malay
    Maltese
    Norwegian
    Persian
    Polish
    Portuguese
    Romanian
    Russian
    Serbian
    Slovak
    Slovenian
    Spanish
    Swahili
    Swedish
    Thai
    Turkish
    Ukrainian
    Vietnamese
    Welsh
    Yiddish

    Output PO File

    Output progress meter and then provide a download link

    You can automate translation by using a tool like cURL to post a PO file and obtain
    a translated result. For example:

    curl -F pofile=@input-po-filename \
    -F language=target-language-code \
    -F output=pofile \
    http://pepipopum.dixo.net \
    --output output-po-filename

    The PHP5 source code to this software is available under an
    Affero GPL licence. Please
    note that this installation of Pepipopum introduces a second
    delay between each Google API call to reduce load on this server and to play nice with
    Google. If you want to go faster, you're encouraged to host your own installation.
    Why is it called "Pepipopum"? I just invented a word which had
    'po' in it and was relatively rare on Google! Pronounce it pee-pie-poe-pum.
    Comments and suggestions are welcome.

    (c)2009 Paul Dixon

Comments are closed.