Corazones de Alcachofa

Text Mining

Text mining, sometimes alternately referred to as text data mining, refers generally to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends by means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
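The "structuring the input text" step described above can be sketched in a few lines: each raw document becomes a structured record that later pattern-mining steps can query like a database row. The sample documents and field names below are hypothetical, purely for illustration.

```python
import re
from collections import Counter

# Hypothetical toy documents, purely for illustration.
docs = [
    "Prices rose sharply in the wireless market.",
    "The wireless market saw new entrants.",
]

def structure(doc_id, text):
    """Turn raw text into a structured record (a database-row-like dict)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {"id": doc_id, "length": len(tokens), "counts": Counter(tokens)}

records = [structure(i, d) for i, d in enumerate(docs)]

# A trivial "pattern" derived from the structured records:
# the terms that appear in every document.
shared = set(records[0]["counts"])
for r in records[1:]:
    shared &= set(r["counts"])
print(sorted(shared))  # ['market', 'the', 'wireless']
```

Real pipelines would add parsing and linguistic features at the structuring step, but the overall shape (raw text in, queryable records out) is the same.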

The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts. Databases are designed for programs to process automatically; text is written for people to read. We do not have programs that can “read” text, and will not have them for the foreseeable future. Many researchers think it will require a full simulation of how the mind works before we can write programs that read the way people do.

However, there is a field called computational linguistics (also known as natural language processing) which is making a lot of progress in doing small subtasks in text analysis. For example, it is relatively easy to write a program to extract phrases from an article or book that, when shown to a human reader, seem to summarize its contents. (The most frequent words and phrases in this article, minus the really common words like “the”, are: text mining, information, programs, and example, which is not a bad five-word summary of its contents.)
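The frequency-based summary described in the parenthesis above is easy to reproduce. A minimal sketch follows; the stopword list and sample text are made up for illustration, and real systems use far larger stopword lists.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "that", "it", "for", "on", "as", "from", "with"}

def frequent_terms(text, n=5):
    """Return the n most frequent words, ignoring common stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

summary = frequent_terms(
    "Text mining extracts information from text. Programs derive patterns "
    "from text, and the patterns summarize the information in the text."
)
print(summary[0])  # 'text'
```

Even this crude sketch surfaces the content-bearing words ("text", "information", "patterns") while the stopword filter keeps function words out of the summary.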

In text mining, the goal is to discover heretofore unknown information, something that no one yet knows and so could not have yet written down.

People are using the output of such programs to try to link together information in interesting ways. For example, one can extract all the names of people and companies that occur in news text surrounding the topic of wireless technology to try to infer who the players are in that field. There are a number of companies that are investigating this kind of application.

One problem with these approaches is that it is difficult to recognize which of the many relations that are shown are truly interesting. You’ll immediately see who the big players are, but anyone who knows the business will already be aware of this. You’ll also see many, many weak links between various players, hundreds or thousands of such links, and you can’t tell which are the really interesting ones that you should pay attention to.
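One simple way to separate "interesting" links from obvious ones is to score each co-occurring pair by how much more often it appears together than the entities' individual frequencies would predict, e.g. with pointwise mutual information. The company names and article sets below are invented toy data for illustration only.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: sets of entities mentioned together in news articles.
articles = [
    {"Nokia", "Ericsson", "Qualcomm"},
    {"Nokia", "Ericsson"},
    {"Nokia", "SmallCo"},
    {"Ericsson", "Qualcomm"},
    {"Nokia", "Qualcomm", "SmallCo"},
]

entity_counts = Counter()
pair_counts = Counter()
for entities in articles:
    entity_counts.update(entities)
    pair_counts.update(frozenset(p) for p in combinations(sorted(entities), 2))

n = len(articles)

def pmi(pair):
    """Pointwise mutual information: high for pairs that co-occur more
    often than their individual frequencies would predict."""
    a, b = tuple(pair)
    p_ab = pair_counts[pair] / n
    return math.log(p_ab / ((entity_counts[a] / n) * (entity_counts[b] / n)))

ranked = sorted(pair_counts, key=pmi, reverse=True)
for pair in ranked:
    print(sorted(pair), round(pmi(pair), 2))
```

On this toy data the surprising Nokia/SmallCo link outranks the frequent but expected big-player pairs, which is exactly the filtering behaviour the paragraph above asks for; real systems need much more than raw co-occurrence statistics, but the idea is the same.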

Retrieved April 28, 2008

* Marti Hearst, ACL ’99, http://people.ischool.berkeley.edu/~hearst/text-mining.html

* Text Mining, Wikipedia, http://en.wikipedia.org/wiki/Text_mining

April 28, 2008 | Posted in Human Language Technologies

Computational Linguistics

As Uszkoreit says, computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Theoretical CL takes up issues in theoretical linguistics and cognitive science. It deals with formal theories about the linguistic knowledge that a human needs for generating and understanding language. Today these theories have reached a degree of complexity that can only be managed by employing computers. Computational linguists develop formal models simulating aspects of the human language faculty and implement them as computer programmes. These programmes constitute the basis for the evaluation and further development of the theories. In addition to linguistic theories, findings from cognitive psychology play a major role in simulating linguistic competence. Within psychology, it is mainly the area of psycholinguistics that examines the cognitive processes constituting human language use. The relevance of computational modelling for psycholinguistic research is reflected in the emergence of a new subdiscipline: computational psycholinguistics.

The rapid growth of the Internet/WWW and the emergence of the information society pose exciting new challenges to language technology. Although the new media combine text, graphics, sound and movies, the whole world of multimedia information can only be structured, indexed and navigated through language. For browsing, navigating, filtering and processing the information on the web, we need software that can get at the contents of documents. Language technology for content management is a necessary precondition for turning the wealth of digital information into collective knowledge.
The increasing multilinguality of the web constitutes an additional challenge for our discipline. The global web can only be mastered with the help of multilingual tools for indexing and navigating. Systems for crosslingual information and knowledge management will surmount language barriers for e-commerce, education and international cooperation.

Computational linguistics as a field predates artificial intelligence, a field under which it is often grouped. Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since computers had proven their ability to do arithmetic much faster and more accurately than humans, it was thought to be only a short matter of time before the technical details could be taken care of that would allow them the same remarkable capacity to process language.
Retrieved April 28, 2008

April 28, 2008 | Posted in Human Language Technologies

European Research Centres

The European Research Center for Information Systems (ERCIS) was founded in 2004 at the University of Münster in Münster, North Rhine-Westphalia, Germany. The objective of ERCIS is to connect research in Information Systems with Business, Computer Science, Communication Sciences, Law, Management and Mathematics. The ERCIS consists of leading national and international universities and companies in the field of Information Systems.

The ERCIS – European Research Center for Information Systems – is a network of scientists who conduct cooperative research in the field of integrated information systems development and organizational design. For the first time, core competencies in the discipline of information systems are interrelated with issues in the fields of computer science, business administration and specific legal issues within an institutional framework. Thus, a holistic view of information system development and organizational design issues can be ensured.

Due to its outstanding reputation in both research and teaching within the field of information systems and business administration, the University of Münster has been selected by the federal state of North Rhine-Westphalia to found the European Research Center for Information Systems. Its objective is to undertake joint research projects that span different disciplines and countries, thus fostering research at a level that cannot be achieved by individual go-it-alone projects. The exchange of researchers, such as PhD students, lecturers or (associate) professors, is encouraged, and cooperative master’s and doctoral programs are also part of the overall objective.

The National Centre for Language Technology (NCLT) “conducts research into the processing of human language by computers, such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers and software localisation and globalisation. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substantial economic implications and potential. The centre carries out basic research and develops applications.”

OFAI Language Technology Group: “Language Technology (LT) forms a major research area at the Austrian Research Institute for Artificial Intelligence (OFAI) since its inception in 1984. We conduct research in modelling and processing human languages, especially for German. This includes constructing linguistic resources (such as lexicons, grammars, discourse models), processing algorithms (such as morphological components, parsers, generators, speech synthesizers, discourse processing components), and application prototypes (such as natural language interfaces, advisory systems and concept-to-speech systems).”

Retrieved April 21, 2008, 13:00

* http://en.wikipedia.org/wiki/European_Research_Center_for_Information_Systems

* http://www.ercis.de/ERCIS/en/index.php

April 21, 2008 | Posted in Human Language Technologies

Natural Language Processing

The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

The value to our society of being able to communicate with computers in everyday “natural” language cannot be overstated. Imagine asking your computer “Does this candidate have a good record on the environment?” or “When is the next televised National League baseball game?” Or being able to tell your PC “Please format my homework the way my English professor likes it.” Commercial products can already do some of these things, and AI scientists expect many more in the next decade. One goal of AI work in natural language is to enable communication between people and computers without resorting to memorization of complex commands and procedures. Automatic translation—enabling scientists, business people and just plain folks to interact easily with people around the world—is another goal. Both are just part of the broad field of AI and natural language, along with the cognitive science aspect of using computers to study how humans understand language.

In theory, natural-language processing is a very attractive method of human-computer interaction. Early systems such as SHRDLU, working in restricted “blocks worlds” with restricted vocabularies, worked extremely well, leading researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.

Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of “understanding” is one of the major problems in natural-language processing.

Retrieved April 14, 2008, 12:51

http://en.wikipedia.org/wiki/Natural_language_processing

http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/NaturalLanguage

http://research.microsoft.com/nlp/

April 14, 2008 | Posted in Human Language Technologies

Computational Linguistics

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Theoretical CL takes up issues in theoretical linguistics and cognitive science. It deals with formal theories about the linguistic knowledge that a human needs for generating and understanding language. Today these theories have reached a degree of complexity that can only be managed by employing computers. Computational linguists develop formal models simulating aspects of the human language faculty and implement them as computer programmes. These programmes constitute the basis for the evaluation and further development of the theories. In addition to linguistic theories, findings from cognitive psychology play a major role in simulating linguistic competence. Within psychology, it is mainly the area of psycholinguistics that examines the cognitive processes constituting human language use. The relevance of computational modelling for psycholinguistic research is reflected in the emergence of a new subdiscipline: computational psycholinguistics.

Computational Linguistics is the only publication devoted exclusively to the design and analysis of natural language processing systems. From this unique quarterly, university and industry linguists, computational linguists, artificial intelligence (AI) investigators, cognitive scientists, speech specialists, and philosophers get information about computational aspects of research on language, linguistics, and the psychology of language processing and performance.
Computational Linguistics (CLI) is the scientific study of language from a computational perspective. It is an interdisciplinary field which draws on linguistic theory (phonology, syntax, semantics, pragmatics) and computer science (artificial intelligence, theory of computation, programming methods), as well as, to a lesser extent, other disciplines such as philosophy, cognitive science, and psychology. CLI is a lively and intellectually vital scientific discipline, generating advances that shed new insight on models of human linguistic abilities, as well as creating opportunities for practical tools that can be of tremendous benefit to society. Established in connection with Georgetown’s pioneering research in the 1950s in Russian-English machine translation, Computational Linguistics at Georgetown offers a novel, theory-based, technology-aware program of study which prepares students for teaching and research in linguistics and computational disciplines as well as research and development positions in the growing field of information technology.

Retrieved April 14, 2008, 12:42

April 14, 2008 | Posted in Human Language Technologies

Human Language Technologies

The field of human language technology covers a broad range of activities with the eventual goal of enabling people to communicate with machines using natural communication skills. Research and development activities include the coding, recognition, interpretation, translation, and generation of language. The study of human language technology is a multidisciplinary enterprise, requiring expertise in areas of linguistics, psychology, engineering and computer science. Creating machines that will interact with people in a graceful and natural way using language requires a deep understanding of the acoustic and symbolic structure of language (the domain of linguistics), and the mechanisms and strategies that people use to communicate with each other (the domain of psychology). Given the remarkable ability of people to converse under adverse conditions, such as noisy social gatherings or band-limited communication channels, advances in signal processing are essential to produce robust systems (the domain of electrical engineering). Advances in computer science are needed to create the architectures and platforms needed to represent and utilize all of this knowledge. Collaboration among researchers in each of these areas is needed to create multimodal and multimedia systems that combine speech, facial cues and gestures both to improve language understanding and to produce more natural and intelligible speech by animated characters.

Human language technologies play a key role in the age of information. Today, the benefits of information and services on computer networks are unavailable to those without access to computers or the skills to use them. As the importance of interactive networks increases in commerce and daily life, those who lack such access or skills are further hindered from becoming productive members of society.

One of the most relevant figures in this area is Hans Uszkoreit, who studied Linguistics and Computer Science at the Technical University of Berlin from 1973 to 1977 and at the University of Texas at Austin from 1977 to 1981. During this time he also worked as a research associate in a large machine translation project at the Linguistics Research Center. He received his Ph.D. in linguistics from the University of Texas in 1984. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts). During this time, he also taught at the University of Stuttgart.
Among his many relevant publications and projects, we can cite the following:

* Uszkoreit, H. (2007) Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.
* Uszkoreit, H., F. Xu, J. Steffen and I. Aslan (2006) The pragmatic combination of different cross-lingual resources for multilingual information services. In Proceedings of LREC 2006, Genova, Italy, May 2006.
* Uszkoreit, H. (2000): Sprache und Sprachtechnologie bei der Strukturierung digitalen Wissens. In: W. Kallmeyer (Ed.) Sprache in neuen Medien, Institut für Deutsche Sprache, Jahrbuch 1999, De Gruyter, Berlin.
* Uszkoreit, H. (1999): Sprachtechnologie für die Wissensgesellschaft: Herausforderungen und Chancen für die Computerlinguistik und die theoretische Sprachwissenschaft. In: F. Meyer-Krahmer und S. Lange (Eds.), Geisteswissenschaften und Innovationen, Physica Verlag.
* Uszkoreit, H. (1998): Cross-Lingual Information Retrieval: From Naive Concepts to Realistic Applications. In: Language Technology in Multimedia Information Retrieval, Proceedings of the 14th Twente Workshop on Language Technology.

http://www.dfki.de/~hansu/HLT-Survey.pdf

http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html

Retrieved April 2, 2008, 11:50

April 2, 2008 | Posted in Human Language Technologies