

What is Neural Machine Translation – A (very) general overview
Corporations like Google, Microsoft and Yandex are going neural

Yes, you read correctly: industry leaders such as Google, Microsoft and Yandex are now using neural machine translation. Neural machine translation (NMT) is an approach to machine translation that uses a single large neural network, replacing the phrase-based statistical approach, which relied on separately engineered subcomponents.

The results that neural machine translation produces are impressive. When you look at how the neural network paraphrases, it almost seems as if it understands the sentence it is translating.

Google's research team describes its neural machine translation system as bridging the gap between human and machine translation: an end-to-end learning approach for automated translation that, they believe, will overcome many of the weaknesses of the phrase-based statistical approach.

In test runs, Google's NMT system reduced translation errors by 60% compared to the competing system based on the statistical approach.

We are not yet in the Star Trek sphere, where a translation device running on a computer can interpret and translate texts effortlessly, even finding patterns in unknown languages in order to decipher them, but the progress is substantial.

But how do these systems do what they do?

A useful preliminary step to understanding MT itself is to look at the Vauquois triangle. As mentioned earlier, different MT approaches exist, and their differences can be clearly observed through the Vauquois triangle, which illustrates the levels of analysis involved.

Some models use a layered approach within the network, stacking several encoder and decoder layers that use attention and residual connections. In essence, an attention layer connects the bottom layer of the decoder to the top layer of the encoder, which improves quality and decreases training time.
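To make that wiring more concrete, here is a minimal sketch in Python (PyTorch, the toolkit behind the OpenNMT project mentioned below) of a stacked encoder with residual connections and a decoder step whose bottom layer attends over the encoder's top layer. All names, sizes and the dot-product scoring are our own illustrative assumptions, not any vendor's actual implementation:

import torch
import torch.nn as nn

class StackedEncoder(nn.Module):
    def __init__(self, vocab_size, dim=1000, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # One LSTM per layer, so we can add residual (skip) connections between them.
        self.layers = nn.ModuleList(
            [nn.LSTM(dim, dim, batch_first=True) for _ in range(layers)]
        )

    def forward(self, src_ids):                  # src_ids: (batch, src_len)
        x = self.embed(src_ids)                  # (batch, src_len, dim)
        for lstm in self.layers:
            out, _ = lstm(x)
            x = x + out                          # residual connection
        return x                                 # top-layer states = source contexts

class AttentiveDecoderStep(nn.Module):
    def __init__(self, vocab_size, dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.bottom = nn.LSTMCell(dim, dim)      # decoder bottom layer
        self.out = nn.Linear(2 * dim, vocab_size)

    def forward(self, prev_word, state, enc_top):
        h, c = self.bottom(self.embed(prev_word), state)
        # Attention: score the decoder's bottom state against the encoder's top states.
        scores = torch.bmm(enc_top, h.unsqueeze(-1)).squeeze(-1)   # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), enc_top).squeeze(1)
        logits = self.out(torch.cat([h, context], dim=-1))
        return logits, (h, c)

The residual connections (x = x + out) are what allow many layers to be stacked without the training signal degrading on its way down.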

In this context, the encoder performs the analysis, and the result of that analysis is a sequence of vectors. The decoder performs the transfer: it generates the target form directly, without a separate generation phase (keep in mind this is not a strict constraint and only describes how the baseline technology works).

The process consists of mainly two separate phases:

Phase 1: each word of the source text is put through the “encoder”, which generates a source context for it.

Phase 2: once the source contexts are generated, the target words are generated one by one by applying different approaches. This part is called the decoder.
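To make the two phases concrete before we dig in, here is a minimal interface sketch in Python. The function names and the string-based words are our own simplifying assumptions:

from typing import List, Sequence

Vector = List[float]  # a source context: a sequence of float numbers

def encode(source_words: Sequence[str]) -> List[Vector]:
    """Phase 1: produce one source-context vector per source word."""
    ...

def decode_step(prev_word: str, source_contexts: List[Vector]) -> str:
    """Phase 2: predict the next target word from the source contexts
    and the previously generated word."""
    ...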

Let's look into both phases separately to understand this a bit better!

The NMT system generates a source context (here: Context Source 1, Context Source 2, ..., Context Source 10) for each word within the sentence. We already know that such a context is a sequence of float numbers associated with the source sentence; what we didn't know yet is that each source word is typically associated with 1,000 float numbers.

The first step of the encoder is to look up each source word in a word embedding table.

We assume that contexts lying nearest to each other should be the most similar, and since many different variations exist, there need to be enough dimensions for any given point to represent those similarities. To clarify this a bit, imagine that part of speech is one dimension, gender another dimension, and so on. Keeping this in mind, there can be hundreds of possible dimensions for each word.

Put very simply, this is what encoding actually does.
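Here is a toy illustration in Python (NumPy) of the embedding-table lookup and of dimensions acting as properties. The four dimensions and their values are invented purely to make the geometry visible; real systems learn on the order of 1,000 dimensions from data:

import numpy as np

# Tiny embedding table: in practice each row would hold ~1,000 floats.
vocab = {"he": 0, "she": 1, "runs": 2, "walks": 3}
embeddings = np.array([
    #  noun-ish  verb-ish  masculine  feminine   <- imagined dimensions
    [  0.9,      0.1,      0.8,       0.1],   # he
    [  0.9,      0.1,      0.1,       0.8],   # she
    [  0.1,      0.9,      0.0,       0.0],   # runs
    [  0.1,      0.8,      0.0,       0.0],   # walks
])

def lookup(word):
    return embeddings[vocab[word]]             # the encoder's first step

def similarity(a, b):
    # Cosine similarity: nearby vectors = similar words.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(lookup("runs"), lookup("walks")))  # high: both verbs
print(similarity(lookup("he"), lookup("runs")))     # low: different roles

Words that play similar roles end up close together, which is exactly the "nearest means most similar" assumption described above.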

In the second phase, the source context is used to generate the target words. This is a rather complex process that consists of numerous components. The decoder generates target words using the following inputs (sketched in code right after the list):

  • The target context, generated together with the previous word (it carries some information about the current status of the translation)
  • A weighted source context, produced by the attention model, which mixes the different source contexts (side note: several different attention models exist)
  • The previously translated word, embedded so that it becomes a vector the decoder can process
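Here is a numeric sketch in Python (NumPy) of how these three ingredients can combine in a single decoder step. The dot-product attention scoring follows the spirit of the Luong et al. paper cited below; the shapes and the single output matrix W_out are our simplifying assumptions:

import numpy as np

def decoder_step(target_context, prev_word_vec, source_contexts, W_out):
    # 1) Attention model: score each source context against the target context...
    scores = source_contexts @ target_context          # (src_len,)
    weights = np.exp(scores) / np.exp(scores).sum()    # softmax
    # 2) ...and mix them into one weighted source context.
    weighted_source = weights @ source_contexts        # (dim,)
    # 3) Combine with the target context and the embedded previous word,
    #    then project (with a learned matrix) to a score per vocabulary word.
    features = np.concatenate([target_context, weighted_source, prev_word_vec])
    return W_out @ features                            # logits over the vocabulary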

The translation is done as soon as the decoder decides to end it by emitting a special end-of-sentence word. This may sound a little like magic, but there is a lot of science involved, and studies on the subject have been published by Stanford and other renowned universities.
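In code, that stopping rule can be as simple as a greedy decoding loop, reusing the decode_step interface sketched earlier (the "<s>" and "</s>" markers are assumed start and end-of-sentence words):

def greedy_decode(decode_step, source_contexts, eos="</s>", max_len=100):
    words, prev = [], "<s>"
    for _ in range(max_len):                # safety cap on output length
        prev = decode_step(prev, source_contexts)
        if prev == eos:                     # the decoder ends the translation
            break
        words.append(prev)
    return words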

We will be presenting several follow-up articles on this topic, as we simply enjoy understanding the technology behind NMT, and we will also give you an idea of the other MT systems out there.

Source & Follow-Up:

Effective Approaches to Attention-based Neural Machine Translation, by Minh-Thang Luong, Hieu Pham, and Christopher D. Manning, Computer Science Department, Stanford University, Stanford.

OpenNMT: an industrial-strength, open-source (MIT) neural machine translation system utilizing the Torch/PyTorch mathematical toolkit.

Google Research.


If you wish to receive a detailed offer or a consultation on products and services provided by Wagner Consulting, or to learn more about our IT Division, please feel free to contact our Sales Team, who will gladly assist you.

Sales Team at Wagner Consulting
Phone: 
US: (718) 838 9533 (English speaking)
US: (917) 725 3145 (Spanish speaking)
EU: (718) 618 4268
E-Mail: