The State of Open Translation Tools
The field of translation is in a state of transition, and software tools to support language translation are evolving with corresponding rapidity. Increasingly available online resources are quickly expanding the possible and the practical when it comes to translating content, and processes and business models which have remained relatively staid for decades are being rethought.
Even in the so-called “broadband” era where substantial parts of the globe enjoy ubiquitous high-speed access and where translation is thus more important than ever, most translators and translation firms have employed rather rudimentary technology processes in their translation workflow. Translators generally copy and paste text between word processor documents and transmit translated documents as email attachments that lack all but the most basic version control or metadata. However, new online tools and innovative new workflow models are turning the translation field on its head.
Open Translation Tools Today
The state of Open Translation tool offerings reflects the same flux. Real-time access to a global network of translation services and talent is a resource that the translation industry is only now starting to leverage, and upstart multilingual projects on the internet are pushing the state of the art by treating translation as an exercise in distributed problem solving.
In addition, most Open Translation tools have recently begun to incorporate workflow, user role tracking, permissions and detailed state information for each translation project. From RSS-enabled platforms like Worldwide Lexicon, which automate translation requests and submissions, to crowd-sourced tools like dotSUB, which, although not open source, employs an open approach to data and translation for subtitling digital videos, Open Translation tools are demonstrating their ability to not only track but also outpace closed and proprietary offerings.
Open Translation tools, then, fall into a range of categories:
- PO and XLIFF localization editors: This encompasses offline, online and distributed localization tools that read and write data in PO, XLIFF and related formats. These serve as the essential tools for many translators and localizers. Examples of these tools include Pootle, Poedit, gtranslator, Transolution, and Lokalize.
- Translation workflow: These tools manage roles, tasks and other project information, and often interoperate with other translation tools and version control systems. Workflow is a critical area for the growth of Open Translation, and there exists a range of un-met needs in terms of workflow support. Examples of these tools include Transifex, Translate Toolkit, Pootle, Launchpad Translations and Worldwide Lexicon.
- Subtitling: As video becomes a more pervasive web offering, tools for adding translated subtitles to videos are increasingly in demand. Examples of such tools include GNOME Subtitles and dotSUB.
- Machine translation: These tools, which at present are primarily hosted as web sites like translate.google.com and BabelFish, perform algorithmic translation of text from one language to another. Examples of these tools include Apertium and Moses.
- Translation Memory: These Computer Aided Translation (CAT) tools store small discrete language fragments, passages, and terms in order to assist human translators as they perform their work. Examples of these tools include Qt Linguist and OmegaT.
- Dictionary and Glossary: As their names imply, these CAT tools store definitions for terms in a given language, and support translators as they map from one language to another. Examples of these tools include CollaboDict and Transolution.
- Wiki translation: These modules and extensions enhance and augment existing wiki platforms with tools for performing and managing translation of wiki content. Examples of these tools include Cross-Lingual Wiki Engine and translatewiki.net.
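To make the translation memory category above more concrete, the following is a minimal sketch of how a stored segment might be suggested for a new, similar sentence. It is not taken from any of the tools named in the list; the `MEMORY` contents, the `lookup` function and the 0.75 similarity threshold are all assumptions chosen for illustration, and real CAT tools use more sophisticated matching.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> target segment.
MEMORY = {
    "Save the file": "Enregistrer le fichier",
    "Open the file": "Ouvrir le fichier",
    "Close the window": "Fermer la fenêtre",
}

def lookup(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored segment, or None.

    The threshold is an illustrative assumption, not a standard value.
    """
    best = None
    for source, target in memory.items():
        # Character-level similarity between the new segment and a stored one.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best

match = lookup("Save the files", MEMORY)
if match:
    print(f"Suggestion: {match[1]} (similarity {match[2]:.2f})")
```

A translator would treat such a suggestion as a starting point to edit, not as a finished translation.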
As with almost any collection of software tools, these categories blur and overlap on a tool-by-tool basis; the categories are somewhat arbitrary and many tools fall into more than one.
A detailed listing of Open Translation tools is available at http://socialsourcecommons.org/toolbox/show/110. We encourage readers to add missing tools to that list.
Related tools and resources
There are a number of related tool categories and resources which are worth mentioning in the context of Open Translation:
- Code libraries and packages: While the focus of this book is on tools for end users in various translation workflows, code libraries are an essential and core element of the Open Translation ecology. Most ubiquitous among the libraries is gettext, the API used by a wide range of localization and translation tools to read and write PO files and other translation-related data.
- Content Management Systems (CMS): FLOSS CMS platforms offer a range of multilingual capabilities. While no current CMS readily supports a true multilingual web site (that is, either a single site available in multiple languages, or alternately a site on which separate pages can contain text in multiple scripts), many CMS platforms offer good support for translating site content. These include Drupal, Plone, Joomla!, Twiki, and FLOSS Manuals.
- Operating systems: End-user support for multi-lingual operating systems is very much the exception; users of Windows, Macintosh, and most Linux distributions install for a given locale, and must often reboot to properly run in a different locale. A noteworthy variant in this regard is Linguas OS, a distribution of the GNU/Linux operating system adapted for professional translators and those working in software localization.
- Guides and online resources: While too numerous to enumerate here, a number of guides and online resources are available to those working in Open Translation. Several of the most noteworthy include the UNDP Localization Primer, LISA publications, which provide best practices and primers from the Localization Industry Standards Association, and the wiki at translate.sourceforge.net. A resource specific to GNOME is Damned Lies, which is a hub for translation workflow for the GNOME project.
Open Translation Feature Gaps
Open Translation is an emergent field, and a primary point of discussion concerns the areas in which Open Translation tools are lacking. While a range of gaps exist, there are two primary functionality holes that arguably overshadow the rest:
- Workflow support: Though a number of Open Translation tools provide limited support for translation workflow processes, there is currently no tool or platform with rich and general support for managing and tracking a broad range of translation tasks and workflows. The internet has made possible a plethora of different collaborative models to support translation processes. But open source tools to manage those processes, tracking assets and state, role and assignments, progress and issues, are few. While tools like Transifex provide support for specific workflows in specific communities, generalized translation workflow tools are still few in number. An ideal Open Translation tool would understand the range of roles played in translation projects, and provide appropriate features and views for users in each role. As of this writing, most Open Translation tools at best provide workflow support for the single type of user which that tool targets.
- Distributed translation with memory aggregation: As translation and localization evolve to more online-centric models, there is still a dearth of tools which leverage the distributed nature of the internet and offer remote translators the ability to contribute translations to sites of their choosing which request the same. As of this writing, Worldwide Lexicon is the most advanced platform in this regard, providing the ability for blogs and other open content sites to integrate distributed translation features into their interfaces. In addition, there needs to be a richer and more pervasive capture model for content translated through such distributed models, in order to aggregate comprehensive translation memories in a range of language pairs.
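The memory aggregation idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tuple layout of a contribution, the `aggregate` and `preferred` functions, and the "most frequently submitted wins" rule are all hypothetical, and are not how Worldwide Lexicon or any other named platform is known to work.

```python
from collections import defaultdict

def aggregate(contributions):
    """Group (src_lang, tgt_lang, source, target) tuples by language pair,
    counting how often each target was submitted for a source segment."""
    memories = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for src_lang, tgt_lang, source, target in contributions:
        memories[(src_lang, tgt_lang)][source][target] += 1
    return memories

def preferred(memories, pair, source):
    """Return the most frequently submitted target for a segment, or None."""
    candidates = memories.get(pair, {}).get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Hypothetical contributions captured from several sites.
submissions = [
    ("en", "es", "Read more", "Leer más"),
    ("en", "es", "Read more", "Leer más"),
    ("en", "es", "Read more", "Lea más"),
    ("en", "de", "Read more", "Weiterlesen"),
]
memories = aggregate(submissions)
print(preferred(memories, ("en", "es"), "Read more"))
```

A production system would likely weigh submissions by reviewer approval or translator reputation instead of raw frequency.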
Other Open Translation technology gaps include:
Lack of integration and interoperability between tools means both frustration for users and feature duplication by developers. Different communities have their own toolsets, but it is difficult for a translation project to make coherent use of a complete tool set. Among the interoperability issues which require further attention in the Open Translation tools ecology:
- Common programming interfaces for tools to connect, share data and requests, and collect translation memories and other valuable data.
- Plugins for content management systems to export content into PO-files, so that content can be translated by the wealth of tools that offer PO support.
- Better integration between different projects, including shared glossaries, common user interfaces and subsystems, and rich file import/export.
- Generic code libraries for common feature requirements. "gettext" stands out as one of the most ubiquitous programming interfaces in the Open Translation arena, but many more interfaces and services could be defined and adopted to maximize interoperability of both code and data.
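To illustrate the kind of PO export plugin suggested in the list above, here is a minimal sketch that renders content strings as a PO template. The `(reference, text)` item layout and the `node/1:title` style references are hypothetical stand-ins for whatever a given content management system exposes; only the `msgid`/`msgstr` layout, the `#:` reference comment and the string escaping follow the PO file format itself.

```python
def escape(text):
    """Escape a string for use inside a PO double-quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))

def to_po(items):
    """Render (reference, source_text) pairs as a PO template with empty msgstr."""
    lines = [
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    seen = set()
    for reference, text in items:
        if text in seen:  # a catalog should not repeat the same msgid
            continue
        seen.add(text)
        lines += [f"#: {reference}", f'msgid "{escape(text)}"', 'msgstr ""', ""]
    return "\n".join(lines)

# Hypothetical content pulled from a CMS.
content = [
    ("node/1:title", "Welcome"),
    ("node/1:body", 'Click "Start" to begin.'),
    ("node/2:title", "Welcome"),
]
print(to_po(content))
```

The resulting text can be opened in any of the PO editors listed earlier, which fill in the empty `msgstr` entries.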
Tools for content review are lacking; features for quality review should be focused on distributed process and community-based translation. As such reviews can be a delicate matter, the ideal communication model when there are quality problems is to contact the translator, but timing can be an issue. In systems with live posts and rapid translation turnaround, quick review is important and it may not be possible to reconnect with the content translator in a timely fashion.
A Future Vision of Open Translation Tools
One of the goals of this book is to drive discussion and creation of better Open Translation tools. This section describes an idealized feature set for the Open Translation tool space, specifying functionality for a tool which does not yet exist, but which would meet the broadest range of text translation needs in terms of features, supported workflows, and business models.
It is important to note that this is a purely theoretical exercise; it is generally agreed that large monolithic tools are not the right course for the future, and that a small, distributed set of tools that work well together is the recommended path for better supporting Open Translation efforts.
That said, the described feature set is both expansive and impressive in its ambition to meet a wealth of Open Translation needs. The following sections describe those desired features, grouped into three sets: core features, workflow support, and additional features.
While most of these capabilities are available in various proprietary and open source tools, there is not currently a FLOSS tool or tool set that comes close to offering the features enumerated below.
The following should be considered requisite for any idealized functionality. These are primarily features associated with the translation of a single text source; higher-level features are described in subsequent sections.
The following should all be available in the user interface for the tool:
- Original text display would show the source text, using color and iconography to denote progress, commentary and other relevant metadata.
- Output/preview display would render the translated text, maintaining layout from the original and supporting detailed linkage between the source and translated versions of the text.
- A commenting/annotation feature would allow users to select and annotate text in both the source and translated text in order to add comments and other useful annotations to the core data.
- Machine translation support would enable users to generate a machine-translated version for all or selected parts of the source text, in order to obtain a first-pass rendering of the target translation.
- Terminology/glossary translation would provide support for translating specialized terms from translation memory.
- Dictionary widget would provide definitions for terms in both the source and target languages.
Other desirable core features include:
- Pervasive Unicode support for all input and output text, with rich conversion support in both directions. Unicode is a “superset” character standard, designed to represent the characters of virtually every written language and legacy character set. Many existing tools are not Unicode-aware, creating limitations and interoperability problems.
- Ability to view alternate source text, in situations where the source has already been translated to another target language. In these situations, the tool would enable translators to view and utilize prior translations as secondary “source” for clarifying meaning and keeping translations consistent.
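The conversion support described in the Unicode item above might look roughly like the following sketch, which uses only Python's standard library. The function names `to_unicode` and `from_unicode` are assumptions for the example, and the choice to normalize to NFC is one reasonable convention, not a requirement stated by this book.

```python
import unicodedata

def to_unicode(raw, encoding):
    """Decode bytes from a known legacy encoding and normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode(encoding))

def from_unicode(text, encoding):
    """Encode text to a target encoding.

    Raises UnicodeEncodeError if a character cannot be represented,
    which is preferable to silently corrupting a translation.
    """
    return text.encode(encoding)

# A Latin-1 byte string, as an older tool might have produced it.
legacy = b"caf\xe9"
text = to_unicode(legacy, "latin-1")
print(text)
print(from_unicode(text, "utf-8"))
```

Normalizing on import means that visually identical strings compare as equal later, which matters for glossary and translation memory lookups.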
The following features would address support for the actual processes, or workflow, of text translation.
- Progress and state management: The core workflow features would enable definition of milestones, assignment of tasks, and entry of time estimates for pending work. For both individual documents and collections of documents, the tool would provide the ability to track translation, editing, and proofreading status. The tool would also support progress estimation in both objective terms (“document translation is 80% complete”) and subjective ones (“this is high quality translation”).
- Role-based user features: The ideal tool would expose different feature sets for different types of users in the translation process:
  - Project managers would have a dashboard of all translation activity and status, with the ability to "drill down" for additional detail.
  - Translators would view their pending translation documents and tasks, in concert with tools to progress on those tasks.
  - Editors would view the queue of documents and document segments awaiting review, as well as the status of documents in editorial process.
  - Proofreaders and reviewers would view the queue of documents and document segments awaiting proofreading, as well as the status of documents in proofreading process.
  - Original authors would be able to track the translation status of documents they had created and made available for translation.
  - End users would be able to track the availability of translations they had requested.
- Status change notification: The platform would enable all stakeholders to be notified of changes in status to any document in the system, as well as the arrival of new documents into the system. Notification could be done via email or RSS (Really Simple Syndication).
- Accounting: The tool would be able to track hours and completed tasks for each project member, allowing managers to both assess productivity and track compensation.
- Collaborative document mark-up: Users could make annotations (e.g., "I had a problem with this phrase") at any level of detail or scope, and invite others to give feedback. Such markup could also be tied to shared online discussions such as chat rooms or instant messaging.
- Review process: As each translation was ready for review, the tool would support assignment of review tasks, and track both editorial and proofreading reviews. An additional component would provide support for peer review, where fellow translators could assess the work and comment on semantics, nuance, and other subtleties.
- Reputation management: Hand-in-hand with a review process would be reputation tracking for each user of the system, especially translators. Such a subsystem would track the quality of each user's work, in both objective terms (100% of assigned tasks completed) and subjective terms (editors, proofreaders and peers could evaluate translators on various criteria). Such a system would ideally enable translation managers to select the most suitable translators and other personnel for specific translation tasks.
- Import and export of source documents: The tool would be able to handle the broadest range of document formats and encodings, allowing easy import of source texts from Open Office and other suites, HTML, PDF, raw text and other editing tools. Translated texts could be exported in all of the same formats.
- Segmentation of larger texts: Large documents often need to be broken down into smaller units in order to be delegated to different translators or parceled out in manageable units. Segmentation support would allow breaking large documents into such units, provide tracking of each segment's status and task ownership, and enable eventual re-assembly of the translated segments into a final unified document. Additional functionality would allow prioritizing the segments, so that important sections were done first, and less important sections could be deferred and potentially delegated to less experienced translators.
- Version tracking: Translated documents go through a number of versions, both in translation as well as during subsequent editing and proofreading. The tool would archive all versions of each document using a subsystem such as Subversion, and then provide the ability to compare any two versions to see differences and changes.
- Cross-lingual change tracking: While version tracking would maintain history for individual documents, cross-lingual change tracking would enable project managers and translators to be notified when a source document was changed, in order that other dependent language versions of the document could be flagged for pending updates. Such a feature would enable multi-language sets for a particular document to remain synchronized.
- License tracking: An ideal tool would be able to track licensing for imported documents and ensure that appropriate licensing was assigned to any translated works. The system would also support human overrides, to reflect the broad range of intellectual property agreements under which translations can happen.
- Offline use: While internet-based features would be critical to the realization of any “dream tool”, just as essential would be the ability to enjoy rich offline functionality. The tool would need to launch and operate when no connection was available, supporting translation and editorial tasks, and storing edits and progress updates for synchronization the next time the user connected.
- Unified translation memory: This feature would provide local translation memory combined with access to external translation memories. There are a range of memories available, but it would be useful to have centralized repository capabilities. Similar functionality could be provided for glossaries.
- Multi-lingual comparison: For documents translated into multiple target languages, this would allow translators to review how translation was done for related languages. For example, when translating to Serbo-Croatian, a translator could be aware of other South Slavic language translations, and could see the work other translators had done in those similar languages.
- Pledge bank: Funding the translation of open content is often problematic, because it is not usually institutionally driven. Pledge bank functionality would allow translators to post estimated costs for translating particular documents, and allow parties interested in seeing the document translated to pledge monies they would contribute if the document was actually translated. The document would only be translated and pledges collected once the pledge total reached the projected translation cost.
- Translation of SVG graphics: Scalable Vector Graphics (SVG) are images where the data stored includes any text contained in the graphic. A dream translation tool would support translation of the text within SVG files, in order to offer a more complete translation solution.
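Because SVG is an XML format, the SVG translation feature described above can be approximated with a standard XML parser. The sketch below is a minimal illustration using Python's standard library; the `translate_svg` function and the lookup-table approach are assumptions for the example, and it handles only plain text inside `text` and `tspan` elements, not every way text can appear in an SVG file.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the output free of ns0: prefixes

def translate_svg(svg_source, translations):
    """Return SVG markup with <text>/<tspan> strings replaced via a lookup table."""
    root = ET.fromstring(svg_source)
    for tag in ("text", "tspan"):
        for element in root.iter(f"{{{SVG_NS}}}{tag}"):
            original = (element.text or "").strip()
            if original in translations:
                element.text = translations[original]
    return ET.tostring(root, encoding="unicode")

# A tiny hypothetical graphic with two labels.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="10" y="20">Hello</text>'
    '<text x="10" y="40">Keep</text>'
    "</svg>"
)
print(translate_svg(svg, {"Hello": "Bonjour"}))
```

Strings without an entry in the lookup table are left untouched, so a partially translated graphic remains valid and viewable.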