Corazones de Alcachofa

Machine Translation

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
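
The "simple substitution" level described above can be sketched in a few lines. This is only an illustrative sketch: the Spanish-to-English lexicon and the function name `translate_word_for_word` are assumptions made for the example, not part of any real MT system.

```python
# Toy Spanish-to-English lexicon; the entries are illustrative assumptions.
LEXICON = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "come": "eats",
    "pescado": "fish",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each source word with its lexicon entry; unknown words pass through."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(translate_word_for_word("El gato negro come pescado", LEXICON))
# prints: the cat black eats fish
```

The output keeps the Spanish word order ("cat black"), which is the kind of typological difference that pure substitution cannot handle.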

Current machine translation software often allows for customisation by domain or profession (such as weather reports) — improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
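
One way to picture "limiting the scope of allowable substitutions" is a domain vocabulary that filters the candidate translations of an ambiguous word. The lexicon, the weather vocabulary and the `candidates` helper below are hypothetical, chosen only to illustrate the idea.

```python
# Candidate English translations per Spanish word (illustrative entries).
GENERAL_LEXICON = {
    "tiempo": ["time", "weather"],
    "frente": ["forehead", "front"],
    "frío": ["cold"],
}

# Hypothetical weather-report domain: only these target words are allowed.
WEATHER_VOCABULARY = {"weather", "front", "cold"}


def candidates(word, lexicon, allowed=None):
    """Return candidate translations, optionally filtered by a domain vocabulary."""
    options = lexicon.get(word, [word])
    if allowed is None:
        return options
    restricted = [option for option in options if option in allowed]
    # Fall back to the unrestricted list if the filter removes every option.
    return restricted or options


print(candidates("tiempo", GENERAL_LEXICON))                      # ['time', 'weather']
print(candidates("tiempo", GENERAL_LEXICON, WEATHER_VOCABULARY))  # ['weather']
```

Without the domain the system still has to choose between "time" and "weather"; with it, only one candidate remains.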

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used “as is”. However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.

Machine translation can use a method based on linguistic rules, in which words are translated according to linguistic criteria: the most suitable (orally speaking) words of the target language replace those of the source language.

The Machine Translation (MT) project at Microsoft Research is focused on creating MT systems and technologies that cater to the multitude of translation scenarios today. Data-driven systems, in particular those with a statistical core engine, have proven to be the most efficient, because they adapt to wide domain coverage and can be trained on new language pairs within a matter of weeks. This team works closely with research and development partners worldwide, making the system accessible to a variety of products and services.

The field of machine translation has changed remarkably little since its earliest days in the fifties. The issues that divided researchers then remain the principal bones of contention today. The first of these concerns the distinction between the so-called interlingual and transfer approaches to the problem. The second concerns the relative importance of linguistic matters as opposed to common sense and general knowledge. The only major new lines of investigation that have emerged in recent years have involved the use of existing translations as a prime source of information for the production of new ones. One form that this takes is that of example-based machine translation, in which a system of otherwise fairly conventional design is able to refer to a collection of existing translations.
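
The lookup step of example-based machine translation, referring to a collection of existing translations, might be sketched as follows. This is a minimal illustration under stated assumptions: the translation collection is made up, `closest_example` is a hypothetical helper, and character-level similarity from Python's standard `difflib` module stands in for whatever matching a real system would use.

```python
import difflib

# A tiny, made-up collection of existing translations (Spanish -> English).
TRANSLATION_MEMORY = {
    "¿dónde está la estación?": "Where is the station?",
    "¿dónde está el hotel?": "Where is the hotel?",
    "gracias por su ayuda": "Thank you for your help",
}


def closest_example(sentence, memory, cutoff=0.6):
    """Return the (source, translation) pair most similar to the sentence, or None."""
    matches = difflib.get_close_matches(
        sentence.lower(), list(memory), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return matches[0], memory[matches[0]]


print(closest_example("¿Dónde está la estación de tren?", TRANSLATION_MEMORY))
```

A full system would then adapt the retrieved translation to the new sentence; this sketch stops at retrieval.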

Retrieved on the 11th of May 2008

* Microsoft Corporation, Microsoft Research
* Martin, Palo Alto Research Center, Palo Alto, California
* Wikipedia, the free encyclopedia; John Hutchins

May 11, 2008 | Posted in Human Language Technologies

Natural Language

In the philosophy of language, a natural language (or ordinary language) is a language that is spoken, written, or signed by humans for general-purpose communication, as distinguished from formal languages (such as computer-programming languages or the “languages” used in the study of formal logic, especially mathematical logic) and from constructed languages.

Though the exact definition is debatable, natural language is often contrasted with artificial or constructed languages such as Esperanto, Latino sine flexione, and Occidental.

Linguists have an incomplete understanding of the rules underlying natural languages, and these rules are therefore objects of study. The understanding of natural languages reveals much not only about how language works (in terms of syntax, semantics, phonetics, phonology, etc.), but also about how the human mind and the human brain process language. In linguistic terms, ‘natural language’ only applies to a language that has evolved naturally, and the study of natural language primarily involves native (first language) speakers.

The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

There are several major reasons why natural language understanding is a difficult problem. They include:


  1. The complexity of the target representation into which the matching is being done. Extracting meaningful information often requires the use of additional knowledge.
  2. The type of mapping: one-to-one, many-to-one, one-to-many, or many-to-many. One-to-many mappings require a great deal of domain knowledge beyond the input to make the correct choice among target representations. So for example, the word tall in the phrase “a tall giraffe” has a different meaning than in “a tall poodle.” English requires many-to-many mappings.
  3. The level of interaction of the components of the source representation. In many natural language sentences, changing a single word can alter the interpretation of the entire structure. As the number of interactions increases, so does the complexity of the mapping.
  4. The presence of noise in the input to the understander. We rarely listen to one another against a silent background. Thus speech recognition is a necessary precursor to speech understanding.
  5. The modifier attachment problem. (POD: this arises because sentences aren’t inherently hierarchical, I’d say.) The sentence “Give me all the employees in a division making more than $50,000” doesn’t make it clear whether the speaker wants all employees making more than $50,000, or only those in divisions making more than $50,000.
  6. The quantifier scoping problem. Words such as “the,” “each,” or “what” can have several readings.
  7. Elliptical utterances. The interpretation of a query may depend on previous queries and their interpretations. For example, asking “Who is the manager of the automobile division?” and then following up with “Of aircraft?”
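
The modifier attachment problem from item 5 can be made concrete by writing out both readings of the query as code. Everything in this sketch is invented for illustration: the employee records, the division income figures and both function names.

```python
# Made-up records: (employee, division, salary in dollars).
EMPLOYEES = [
    ("Ana", "Automobile", 62_000),
    ("Ben", "Automobile", 41_000),
    ("Cho", "Aircraft", 45_000),
]

# Made-up income per division, in dollars.
DIVISION_INCOME = {"Automobile": 40_000, "Aircraft": 90_000}

THRESHOLD = 50_000


def employees_making_more(records, threshold):
    """Reading 1: 'making more than' attaches to the employees."""
    return [name for name, _, salary in records if salary > threshold]


def employees_in_divisions_making_more(records, division_income, threshold):
    """Reading 2: 'making more than' attaches to the division."""
    return [
        name
        for name, division, _ in records
        if division_income[division] > threshold
    ]


print(employees_making_more(EMPLOYEES, THRESHOLD))                                # ['Ana']
print(employees_in_divisions_making_more(EMPLOYEES, DIVISION_INCOME, THRESHOLD))  # ['Cho']
```

The same English sentence yields two different result sets, and nothing in the sentence itself says which one the speaker meant.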

Natural Language Processing. Retrieved: 05-05-2008




May 5, 2008 | Posted in Human Language Technologies