Plain text files, in most cases with a .txt extension, contain exclusively textual information. There is no clearly defined way to tell the computer which language they contain. In (very) simple terms, this means the computer will by default assume the text is written in the same language the computer itself uses.
If you are Russian, it is very likely that your computer works in Russian too: the menus are in Russian, the files you open are in Russian, and so on. In most cases, the computer makes the right assumption about the contents of files in general: they all contain Russian and nothing that Russian characters could not display.
Now, if you are a Russian translator who translates from Japanese, the Japanese files you receive, if they are plain text files, will most probably be treated by the computer as files containing Russian, because nothing in the file itself tells the computer which language it is written in.
The Japanese file contents could be:
OmegaTとは、コンピュータを利用した翻訳ツールです。
But your text editor could very well display it like this:
OmegaTВ∆ВЌБAГRГУГsГЕБ[Г^ВрЧШЧpµšЦ|ЦуГcБ[ГЛВ≈ВЈБB
Because it expects the contents to be Russian... But this is not Russian. These are Japanese characters wrongly displayed as Russian characters.
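The garbling can be reproduced in a few lines of Python. This is only a sketch under two assumptions that the text does not state: that the Japanese file was saved in Shift_JIS, and that the Russian computer's default is the Mac Cyrillic code page. Other combinations of encodings give different, equally unreadable output.

```python
# Assumption: the Japanese client saved the file as Shift_JIS.
japanese = "OmegaTとは、コンピュータを利用した翻訳ツールです。"
raw_bytes = japanese.encode("shift_jis")

# Assumption: the Russian computer decodes with Mac Cyrillic by default.
garbled = raw_bytes.decode("mac_cyrillic")
print(garbled)

# Decoding the same bytes with the encoding they were written in
# restores the original text.
restored = raw_bytes.decode("shift_jis")
print(restored == japanese)
```

Nothing is wrong with the bytes themselves; only the interpretation differs.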
OmegaT is no different. OmegaT assumes that plain text files contain text that can be displayed automatically using the computer's defaults. That works well when the computer works in French and you get English files, or when the computer works in German and you get Italian files.
Why would that work with English and French but not with Russian and Japanese? Because English and French share a common character set, namely Latin-1 or a variation. Until recently, Russian and Japanese did not share any character set. Most current Russian character sets do not cover Japanese, and vice versa. The result is as shown above.
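One way to check which character sets cover which languages is to try encoding a sample of text. The sketch below uses Python's codec names; the helper `can_encode` and the sample strings are illustrative, and cp1251 (Windows-1251) stands in for "a current Russian character set".

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# English and French both fit in Latin-1.
print(can_encode("English and français", "latin-1"))
# Russian fits in a Russian character set.
print(can_encode("Русский текст", "cp1251"))
# Japanese does not fit in that Russian character set.
print(can_encode("日本語", "cp1251"))
# UTF-8 covers all of them.
print(can_encode("日本語 и русский", "utf-8"))
```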
The Japanese client works with a Japanese computer and creates text files that contain Japanese. The character set selected by the client's computer will depend on the operating system and on other settings, but it is very unlikely that the chosen (Japanese) character set will be correctly interpreted by the Russian computer.
Now, how the textual information in the specified character set is physically transmitted (i.e. how it is written in the file for the computer to interpret and display) depends on an encoding. When the computer reads the file, it "decodes" the information according to the encoding and displays it according to the character set. Roughly speaking, one encoding corresponds to one character set...
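To make this concrete: the same characters become different bytes depending on the encoding, and a file reads back correctly only when it is decoded with the encoding it was written in. A minimal sketch, with a hypothetical file name and sample text:

```python
import os
import tempfile

text = "翻訳"

# The same two characters are stored as different byte sequences.
for encoding in ("shift_jis", "euc_jp", "utf-8"):
    print(encoding, text.encode(encoding).hex())

# Write the text in one encoding, then read it back naming that encoding.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file
with open(path, "w", encoding="shift_jis") as f:
    f.write(text)
with open(path, encoding="shift_jis") as f:
    read_back = f.read()
print(read_back == text)
```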
There are basically 3 ways to fix this in OmegaT. The 3 ways all involve using the file filters in the Options menu.
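1. Specify the encoding for your plain text files, i.e. files with a .txt extension: in the file filters, change the source encoding for plain text files to the one that corresponds to your .txt file.
2. Change the extension of your plain text source files, from .txt to .jp for Japanese plain texts for instance: in the file filters, add *.jp as a new Source Filename Pattern and select the appropriate parameters for the source and target encoding.
3. Save your files in UTF-8 and change their extension from .txt to .utf8.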
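Re-saving a file in UTF-8 under a .utf8 extension can be done in any text editor that reads the file correctly, or scripted. The sketch below is one possible way, not part of OmegaT; the function name `save_as_utf8` is hypothetical, and you have to know the source encoding yourself.

```python
from pathlib import Path

def save_as_utf8(source: Path, source_encoding: str) -> Path:
    """Write a UTF-8 copy of a plain text file next to it, with a .utf8 extension."""
    # Decode with the encoding the file was really written in.
    text = source.read_text(encoding=source_encoding)
    # Same name, extension changed from .txt to .utf8.
    target = source.with_suffix(".utf8")
    target.write_text(text, encoding="utf-8")
    return target
```

For instance, `save_as_utf8(Path("manual.txt"), "shift_jis")` would leave manual.txt untouched and create manual.utf8 beside it (the file name and encoding here are examples only).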
Currently, OmegaT is set to understand plain text files as follows:
.txt files are automatically (<auto>) interpreted by OmegaT as being encoded in the computer's default encoding.
.txt1 files are files in ISO-8859-1, covering most Western European languages.
.txt2 files are files in ISO-8859-2, covering most Central and Eastern European languages.
.utf8 files are interpreted by OmegaT as being encoded in UTF-8 (an encoding that covers almost all languages in the world).
You can check that yourself by selecting the item File Filters in the menu Options.
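The extension list above amounts to a small lookup table. The sketch below only illustrates that idea and is not how OmegaT itself is implemented; the names `DEFAULT_TEXT_ENCODINGS` and `encoding_for` are hypothetical, and the computer's default encoding is approximated with Python's `locale.getpreferredencoding`.

```python
import locale
from pathlib import Path

# Extension-to-encoding pairs as listed above; None stands for <auto>.
DEFAULT_TEXT_ENCODINGS = {
    ".txt": None,
    ".txt1": "iso-8859-1",
    ".txt2": "iso-8859-2",
    ".utf8": "utf-8",
}

def encoding_for(filename: str) -> str:
    """Return the encoding that would be assumed for a plain text file name."""
    suffix = Path(filename).suffix.lower()
    if suffix not in DEFAULT_TEXT_ENCODINGS:
        raise ValueError(f"no plain text rule for extension {suffix!r}")
    encoding = DEFAULT_TEXT_ENCODINGS[suffix]
    # <auto>: fall back to the computer's default encoding.
    return encoding or locale.getpreferredencoding(False)

print(encoding_for("czech.txt2"))
print(encoding_for("notes.utf8"))
```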
OmegaT just keeps this short list ready to make it easier for you to deal with some plain text files.
For example, when you have a Czech text file (very probably written in the ISO-8859-2 encoding), you just need to change the extension from .txt to .txt2 and OmegaT will interpret its contents correctly.
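The effect of choosing the right rule for the Czech file can be illustrated as follows. This is a sketch that assumes the file really is in ISO-8859-2 and that a Western European computer would otherwise read it as ISO-8859-1; the sample sentence is arbitrary.

```python
czech = "Příliš žluťoučký kůň"

# Bytes as they would be stored in an ISO-8859-2 file.
raw_bytes = czech.encode("iso-8859-2")

# Read with a Western European default: several letters come out wrong.
as_latin1 = raw_bytes.decode("iso-8859-1")
# Read as ISO-8859-2 (what the .txt2 extension asks for): text is intact.
as_latin2 = raw_bytes.decode("iso-8859-2")

print(as_latin1)
print(as_latin2)
```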