Survey of Measures:
Here we present the methods themselves, which comprise the HTI.
We present only methods that we found in, or deduced from, actual field examples.
The list is not yet comprehensive, and may never be.
1. Color concealment (background and text in the same color):
When the foe views the web page after submitting it to a
translation engine, the text and its background are the same color,
which makes the text invisible. One way the designer can achieve this is
to plant special instructions (probably in the style sheet) that foul the
translated presentation by changing the color of parts of the text
(e.g., white text on a white background).
Immediately below is an actual example taken from the internet
(at the time of this research it was at:
http://glovia.fujitsu.com/jp/event/kansai/02sf1113.html).
The "Japanese View" shows easily visible text in the middle of the
frame. The translated "English View" has that text translated,
but presents it in the same color (white) as the background,
so it is invisible. If you highlight the "white text" area
(using Control A in Internet Explorer), you can see the text
we have presented in the "Exposed View".
[Figure: "Japanese View", "English View", and "Exposed View" of the example page]
In this case, the perpetrator testified that no promotion of a
subject had occurred, yet, on the internet, in Japanese, they had delivered
essential information which the translation engine translated as: "The
July this year of severe heat, at Tokyo international forum it received
favorable comment ..." in direct contradiction to their testimony.
However, English readers could not see this translated statement
since the text's background was the same color as the text.
There are three ways to thwart this ploy: a) read the
page in Japanese, b) have the translation engine "anonymously"
fetch the page, or c) search the page source for
"font ... color=same"
where "same" is the same color as the background, in this
case white.
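Option c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page sets its background with a `bgcolor` attribute on `<body>` and colors text with `<font color=...>`, as the search pattern above suggests; the names `find_hidden_text`, `SameColorFinder`, and `NAMED_COLORS` are hypothetical; and colors set through a style sheet are not examined at all.

```python
from html.parser import HTMLParser

# Illustrative subset of color names; a real checker would need a full color parser.
NAMED_COLORS = {"white": "#ffffff", "black": "#000000"}


def normalize(color):
    """Lower-case a color value and expand names and #rgb shorthand to #rrggbb."""
    value = color.strip().lower()
    value = NAMED_COLORS.get(value, value)
    if len(value) == 4 and value.startswith("#"):
        value = "#" + "".join(ch * 2 for ch in value[1:])
    return value


class SameColorFinder(HTMLParser):
    """Collects text whose <font> color equals the <body> background color."""

    def __init__(self):
        super().__init__()
        self.background = "#ffffff"  # assumed default when <body> sets no bgcolor
        self._font_colors = []       # one entry per open <font>; None if it sets no color
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body" and attributes.get("bgcolor"):
            self.background = normalize(attributes["bgcolor"])
        elif tag == "font":
            color = attributes.get("color")
            self._font_colors.append(normalize(color) if color else None)

    def handle_endtag(self, tag):
        if tag == "font" and self._font_colors:
            self._font_colors.pop()

    def handle_data(self, data):
        # The innermost <font> that sets a color decides the text color.
        current = next((c for c in reversed(self._font_colors) if c), None)
        if current == self.background and data.strip():
            self.hidden.append(data.strip())


def find_hidden_text(html):
    """Return the text runs drawn in the same color as the page background."""
    finder = SameColorFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden
```

Running `find_hidden_text` on the source of a translated page lists the text runs that a reader would otherwise have to highlight to see.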
2. Using image files to hide text:
This is a way to put textual content in view without search engines
being able to see the content. A foe therefore cannot find the
file through a search, and a translation engine cannot translate it.
(A translation engine can translate encoded text, but not images.)
If you examine the figure below, you can make out five
occurrences of the word "EXCEL". By putting these occurrences
into a graphics figure, the perpetrator prevents their detection
by search engines. Had they been detected, five occurrences would
generally cause a search engine to score the page much higher,
increasing the likelihood of the page being accessed.
This is therefore a simple and effective method for keeping
a low profile.
[Figure: image panels containing five occurrences of the word "EXCEL"]
Using this device in a friendly domain, you could be
given information or instructions detrimental to foes. The
information could be the "keys" to a web location or
the instructions to perform an operation. This is a simple,
well known, and well understood device, which is, nonetheless,
very general purpose and effective for going undetected (by
eluding search engines and defying translation engines).
There is no known effective way to harvest all these
occurrences. Character recognition programs will work in the
simple instances (such as the upper right panel above), but not
where there is a distracting background (such as the lower left
panel above). There is a trick, though: search
on the content of the URL, as some will foolishly put clues to the
hidden information there, such as "excel_seminar.htm"
or "seminar_date.jpg".
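The URL trick can be sketched as follows. This is a minimal illustration, not a complete harvester: it assumes the page source is already in hand, looks only at `href` and `src` attributes, and matches whole filename tokens against a keyword list supplied by the searcher. The names `urls_with_clues` and `UrlCollector` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class UrlCollector(HTMLParser):
    """Collects every href and src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def urls_with_clues(html, keywords):
    """Return the URLs whose path contains one of the keywords as a token."""
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    wanted = {keyword.lower() for keyword in keywords}
    hits = []
    for url in collector.urls:
        path = urlparse(url).path.lower()
        # Split "img/seminar_date.jpg" into tokens: img, seminar, date, jpg.
        tokens = set(re.split(r"[^a-z0-9]+", path))
        if tokens & wanted:
            hits.append(url)
    return hits
```

For example, searching a page for the keywords "excel" and "date" would flag both "excel_seminar.htm" and "seminar_date.jpg" while ignoring an unrelated "logo.gif".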
3. Hiding text in cursors:
This device puts messages into the cursor tag, where they are hidden from: a)
translation engines, b) search engines, c) casual viewers, and d)
those who don't place their cursor in the specific place. Combined
with the color concealment device (#1, above), it can effectively
thwart the unintended viewer, that is,
the "foe".
Below, there is a long message in the original
"friend view" which is scrambled by the translation
engine, effectively hiding the message from the viewer. By going
into the original HTML code, we extracted the Japanese
message and then had just that message translated. That is how
we were able to expose the contents of the hidden cursor text.
[Figure: original "friend view", its machine translation, and the exposed cursor text]
To thwart this device, it is necessary to view the page
in the original language, upgrade translation engines to
translate cursor tags (which has not been done to date), or
examine the HTML code for the cursor tags and extract them for
separate examination.
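The third option, extracting the cursor text for separate examination, might look like the sketch below. It rests on an assumption the text does not confirm: that the "cursor tag" is hover text carried in `title` or `alt` attributes. If the page uses some other mechanism (for example, script-driven tooltips), the attribute list would need to change. The names `extract_hover_text` and `HOVER_ATTRIBUTES` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: hover ("cursor") text lives in these attributes.
HOVER_ATTRIBUTES = ("title", "alt")


class HoverTextExtractor(HTMLParser):
    """Collects non-empty hover-text attributes with the tag that carries them."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in HOVER_ATTRIBUTES and value and value.strip():
                self.found.append((tag, name, value.strip()))


def extract_hover_text(html):
    """Return (tag, attribute, text) triples for every hover text in the page."""
    extractor = HoverTextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.found
```

The extracted strings can then be submitted to a translation engine on their own, which is the manual procedure described above.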
4. Blocked or Dead Links:
This device is simple and crude, but effective. The page
gives the appearance of having a link, but when you
click on it, nothing happens. This is most often an innocent
file management error but, as shown below, it can seem quite
intentional. Usually we can't show dead links: there is
nothing to show, only a link that doesn't work.
[Figure: example page where a link is "suspiciously" absent]
Thwarting and detecting this device automatically is
easy in one instance and impossible (or seemingly so) in another.
In the case where the link is there but it doesn't work (usually
because the target URL is not at its specified location), web
crawlers regularly report these as "errors". In the
instance shown above, where the link is "suspiciously"
absent, there is no known automated method that draws attention
to the link's deviant property.
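One narrow part of this problem can be checked statically, without fetching anything: anchors that display text but have no usable target. The sketch below is a heuristic under assumptions, not a solution to the "suspiciously" absent case, which the text above describes as having no known automated method. The set `INERT_TARGETS` and the name `find_inert_links` are hypothetical, and text that is merely styled to look like a link is not detected.

```python
from html.parser import HTMLParser

# Assumption: these href values never navigate anywhere.
INERT_TARGETS = {"", "#", "javascript:;", "javascript:void(0)"}


class InertLinkFinder(HTMLParser):
    """Collects the visible text of <a> elements that cannot navigate."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self._href = None
        self._text = []
        self.inert = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            label = "".join(self._text).strip()
            missing = self._href is None
            inert = missing or self._href.strip().lower() in INERT_TARGETS
            # Skip empty anchors such as <a name="top"></a>, which are not links.
            if inert and label:
                self.inert.append(label)
            self._in_anchor = False


def find_inert_links(html):
    """Return the labels of anchors that look like links but lead nowhere."""
    finder = InertLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.inert
```

A reviewer would still have to judge whether each flagged anchor is an innocent error or an intentional block.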
5. Miscellaneous Devices:
- Using aliases, misspellings, and/or abbreviations in
order to prevent search engines from finding the real
meaning and reference. Instead of "disclosure"
we found a phonetic substitution, "disc rose".
This confuses the translation machine, but does not confuse
the person who reads it phonetically (e.g., interchanging
"r" and "l" phonetically is prevalent
in Japan).
- Outputting different pages to different viewers (perhaps
based on the URL of the calling page) is a device we have
encountered. In this instance, a Japanese page was available in
Japan, but unavailable in the USA.
- Interchanging fonts and languages to confound translation
engines and search engines. For example, when a page interchanges
different but legible character encodings (UTF, Shift JIS, and EUC),
translation machines become confused and output either
strange characters or bands of question marks where legible
text in the original language existed.
- Gaming search engines is a commonplace marketing strategy
whose aim is a higher ranking, so that the page appears earlier
in the list of "hits." However, when hiding, you
want the opposite: a lower ranking, so that the "hit"
appears later in the list and probably goes unnoticed.
This is the goal of inserting repetitions (i.e., salting)
into web documents that may be ranked by search engines.
Here is an example:
You can see from the translation below that the
salting of the page with meaningless "truth" adds
no value and is only intended to bury the page at the bottom
of any list of inquiry "hits." If the "foe"
ever got to the page, below is what the "foe" would see.
- Gaming translation engines. You may find
translation engines asking if you want to suggest
an improvement to a translation. In this instance, we found an
accurate translation in November of 2006, which is as follows:
Now, this erroneous translation [of the market
size, that which follows the ":" in both the
Japanese and the English translation] is off by
two orders of magnitude, not to mention that the
"several" modifier has been
ignored as well. The first number is 100, followed by, not a million,
but a hundred million. The result is that these
three characters are "several hundred hundred
million" or several ten billion, not the hundred
million erroneously reported by Google translation in
April. Today, Google, at our prodding, has corrected
this and you can test it for yourself. We make no
suggestion about who gamed Google's translation engine,
but we do note that this is a discovery hacking avenue
that those who want one view for friends and another for
foes could exploit.
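The magnitude claim in the translation example above can be checked with a line of arithmetic, using only the values stated there (a "several" modifier, 100, and a hundred million):

```latex
\[
\underbrace{\text{several}}_{\text{modifier}} \times
\underbrace{100}_{10^{2}} \times
\underbrace{100{,}000{,}000}_{10^{8}}
= \text{several} \times 10^{10}
\quad\text{(several ten billion)}
\]
\[
\frac{10^{10}}{10^{8}} = 10^{2}
\quad\text{(two orders of magnitude above the reported hundred million)}
\]
```

Dropping the "several" modifier, as the erroneous translation also did, understates the figure further by whatever factor "several" stands for.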