Plain text files, in most cases with a .txt extension, contain exclusively textual information. There is no clearly defined way to tell the computer which language they contain. In (very) simple terms, this means the computer will by default assume the text is written in the same language the computer itself uses.
If you are Russian, it is very likely that your computer works in Russian too: the menus are in Russian, the files you open are in Russian, and so on. In most cases, the computer makes the right assumption about the contents of files in general: they all contain Russian and nothing that Russian characters could not display.
Now, if you are a Russian translator who translates from Japanese, the Japanese files you receive, if they are plain text files, will most probably be treated by the computer as files containing Russian, because nothing in the file itself tells the computer which language it is written in. The Japanese file contents could be:
OmegaTとは、コンピュータを利用した翻訳ツールです。
Because it expects the contents to be Russian, your text editor could very well display it like this:
OmegaTВ∆ВЌБAГRГУГsГЕБ[Г^ВрЧШЧpµšЦ|ЦуГcБ[ГЛВ≈ВЈБB
However, this is far from Russian: it is Japanese characters wrongly displayed as Russian characters.
Like any other application, OmegaT is subject to this problem too. By default, it can only assume that plain text files can be displayed using the system defaults. That works well when, for instance, the computer works in French and the files are in English, or when the computer is German and you deal with Italian files.
Why would that work with English and French but not with Russian and Japanese? Because English and French share a common character set, namely Latin-1 or some variation of it. Until recently, Russian and Japanese did not share any character set. Most current Russian character sets do not cover Japanese and vice versa. The result is what you have seen above.
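The difference in coverage can be checked directly. The following is a minimal Python sketch, not part of OmegaT: it tries to encode short sample strings with two legacy encodings, using Python's standard codec names latin-1 and koi8_r (KOI8-R being one common Russian encoding, chosen here as an assumption). The sample strings and the helper name can_encode are illustrative only.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample strings, one per language discussed in the text.
samples = {
    "English": "translation tool",
    "French": "outil de traduction assistée",
    "Russian": "инструмент перевода",
    "Japanese": "翻訳ツール",
}

for language, text in samples.items():
    for encoding in ("latin-1", "koi8_r"):
        verdict = "yes" if can_encode(text, encoding) else "no"
        print(f"{language:9} fits in {encoding:8}: {verdict}")
```

English and French both fit in Latin-1, Russian fits in the Cyrillic encoding, and the Japanese sample fits in neither of the two.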
The Japanese client works with a Japanese computer and creates text files that contain Japanese. The character set selected by the client computer will depend on the operating system and on other settings, but it is highly unlikely that the chosen (Japanese) character set will be correctly interpreted by the Russian computer.
How the textual information in the specified character set is physically transmitted (i.e. which numeric codes the computer uses to interpret and display the text) depends on the encoding. When the computer reads the file, it "decodes" the information according to the encoding and displays it according to the character set. Roughly speaking, one encoding corresponds to one character set.
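The decoding step can be reproduced in a few lines. This is a minimal Python sketch under stated assumptions: the Japanese sender is assumed to have saved the file as Shift_JIS, and the Russian computer is assumed to default to the Mac Cyrillic encoding (Python codec names shift_jis and mac_cyrillic). Neither encoding is named in the text above, and other combinations garble the text in a similar way.

```python
original = "OmegaTとは、コンピュータを利用した翻訳ツールです。"

# What the Japanese computer writes to disk (assumed encoding: Shift_JIS).
raw_bytes = original.encode("shift_jis")

# What a computer expecting Cyrillic text makes of the same bytes
# (assumed default encoding: Mac Cyrillic).
garbled = raw_bytes.decode("mac_cyrillic")
print(garbled)

# The bytes themselves are intact: decoding them with the right
# encoding gives back the Japanese sentence.
recovered = raw_bytes.decode("shift_jis")
print(recovered == original)
```

The word "OmegaT" survives because both encodings agree on the ASCII range; only the bytes above that range are interpreted differently.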
There are basically three ways to address this problem in OmegaT. They all involve the application of file filters in the Options menu.
Specify the encoding for your plain text files, i.e. files with a .txt extension: in the Text files section of the file filters dialog, change the Source File Encoding from <auto> to the encoding that corresponds to your source .txt file.
Change the extension of your plain text source files (from .txt to .jp for Japanese plain texts, for instance): in the Text files section of the file filters dialog, add a new Source Filename Pattern (for example *.jp) and select the appropriate parameters for the source and target encoding.
Convert your source files to UTF-8 and change their extension from .txt to .utf8. OmegaT will automatically interpret the file as a UTF-8 file.
OmegaT has by default the following short list available to make it easier for you to deal with some plain text files:
.txt files are automatically (<auto>) interpreted by OmegaT as being encoded in the computer's default encoding.
.txt1 files are files in ISO-8859-1, which covers most Western European languages.
.txt2 files are files in ISO-8859-2, which covers most Central and Eastern European languages.
.utf8 files are interpreted by OmegaT as being encoded in UTF-8 (an encoding that covers almost all languages in the world).
You can check this yourself by selecting the item File Filters in the Options menu. For example, when you have a Czech text file (very probably written in the ISO-8859-2 encoding), you just need to change the extension from .txt to .txt2 and OmegaT will interpret its contents correctly. And of course, if you want to be on the safe side, consider converting this kind of file to Unicode, i.e. to the .utf8 file format.
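Converting a file to UTF-8 can be done in any text editor that lets you choose the encoding, or with a short script. The following is a minimal Python sketch of such a conversion done outside OmegaT; it is not how OmegaT itself works. The names DEFAULT_ENCODINGS, guess_encoding and convert_to_utf8 are hypothetical, and the extension mapping simply mirrors the default list described above.

```python
import locale
from pathlib import Path
from typing import Optional

# Extension-to-encoding mapping modelled on the default list described above.
# None stands for "<auto>", i.e. the computer's default encoding.
DEFAULT_ENCODINGS = {
    ".txt": None,
    ".txt1": "iso-8859-1",
    ".txt2": "iso-8859-2",
    ".utf8": "utf-8",
}


def guess_encoding(path: Path) -> str:
    """Pick an encoding from the file extension (hypothetical helper)."""
    encoding = DEFAULT_ENCODINGS.get(path.suffix.lower())
    # Unknown extensions and plain .txt fall back to the system default.
    return encoding or locale.getpreferredencoding(False)


def convert_to_utf8(path: Path, source_encoding: Optional[str] = None) -> Path:
    """Write a UTF-8 copy of path next to it, with a .utf8 extension."""
    encoding = source_encoding or guess_encoding(path)
    text = path.read_text(encoding=encoding)
    target = path.with_suffix(".utf8")
    target.write_text(text, encoding="utf-8")
    return target
```

For a Czech file that is known to be in ISO-8859-2 but still carries a .txt extension, the encoding would be passed explicitly, for example convert_to_utf8(Path("cesky.txt"), "iso-8859-2"), where the file name is only an example. If the source encoding is guessed wrongly, the UTF-8 copy will contain the same garbled characters, so it is worth opening the result once to check it.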