How big of a threat is Google translate?
Thread poster: Tim Drayton
Jeff Allen
France
Multiple languages
SMT likes it when there isn't word/phrase chunking Feb 19, 2009

Tim Drayton wrote:
I have a hypothesis, which I articulated at the beginning of this thread, that purely statistical machine translation will be far more successful in languages with similar syntactic structures, such that words tend to bunch together in similar groups in both the source and target languages and any statistical matches will be highly significant. I will watch developments with interest, but I remain sceptical as to whether statistical machine translation will ever be able to provide satisfactory results between languages with very different structures such as English and Turkish. A detailed investigation of this topic goes well beyond the scope of a thread like this.


Tim, this is actually the main difference between Translation Memory (TM, referred to as Example-based MT) and Statistical MT (SMT).
Chunking at the lexical or phrase level is what TMs thrive on. If you set the match threshold low, you can use a TM to pull up translated segments based on terminology chunking. The side effect of setting the threshold too low, however, is that you can also end up with a lot of noise: translated parts of segments which don't correspond.
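
To make the threshold effect concrete, here is a rough sketch of a TM-style lookup. It is not how any particular TM product scores matches: the three-entry memory, the `tm_lookup` helper and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: (source segment, stored translation).
TM = [
    ("Press the red button to stop the machine.",
     "Appuyez sur le bouton rouge pour arrêter la machine."),
    ("Press the green button to start the machine.",
     "Appuyez sur le bouton vert pour démarrer la machine."),
    ("Store the machine in a dry place.",
     "Rangez la machine dans un endroit sec."),
]

def tm_lookup(segment, threshold):
    """Return (score, source, translation) for entries scoring at or above threshold."""
    hits = []
    for source, target in TM:
        # Character-level similarity between 0.0 and 1.0 (an assumed stand-in
        # for a real TM tool's fuzzy-match score).
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            hits.append((round(score, 2), source, target))
    # Best matches first.
    return sorted(hits, reverse=True)

query = "Press the red button to start the machine."
strict = tm_lookup(query, 0.85)  # high threshold: only close matches
loose = tm_lookup(query, 0.30)   # low threshold: more hits, more noise

for score, source, _ in loose:
    print(score, source)
```

Everything returned at the strict threshold is also returned at the loose one; lowering the threshold can only add candidates, and the extra ones are the likeliest to be the non-corresponding "noise" described above.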

However, Statistical MT takes a different approach, and I've explained it with some examples at:
statistical MT approach + TMs
http://www.proz.com/forum/translator_resources/100328-machine_translation:_your_experience_with_the_various_mt_programmes_state_of_play-page2.html#998639

SMT will of course be good on such chunking, but its benefit is more interesting and useful where the words in question sit at a distance from one another (or rather, the character or symbol strings, since the models are usually based on sequences of bigrams and trigrams, that is, 2 and 3 characters). SMT lets you capitalize on content that does not have a high level of matched chunks. That's why a lot of big multinational corporations are interested in SMT: not all of their content is perfectly aligned and with high chunking. It's very diverse.
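
As a small illustration of the bigram/trigram idea, the sketch below extracts character n-grams from two strings and measures how many they share. It is only a counting exercise under the description given in this post, not an SMT system; the helper names `char_ngrams` and `ngram_overlap` and the sample sentences are hypothetical.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return all overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_overlap(a, b, n):
    """Share of a's n-grams (counted with multiplicity) that also occur in b."""
    counts_a = Counter(char_ngrams(a, n))
    counts_b = Counter(char_ngrams(b, n))
    total = sum(counts_a.values())
    if total == 0:
        return 0.0  # a is shorter than n, so it has no n-grams
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((counts_a & counts_b).values())
    return shared / total

# Two sentences with the same words in a different order: no long chunk
# lines up, but many short character sequences still do.
s1 = "the machine stops when the button is pressed"
s2 = "when the button is pressed the machine stops"

print(ngram_overlap(s1, s2, 2))  # bigram overlap
print(ngram_overlap(s1, s2, 3))  # trigram overlap
```

The point of the toy comparison is that short sequences can still provide evidence of correspondence when whole segments or long chunks do not match.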

It's much easier to explain this visually with a list of sentences and showing how the SMT system actually processes everything in parallel, not in a sentence-by-sentence sequential way.

Jeff


 