Automatic Machine Translation & Natural Language Processing


Location: KONA, 60 W 23RD Street Conference Room, 4 FL.

Chapter: New York City

Date: February 21, 2008, 6:30 pm

Event ID: 7214637

URL: https://www.meetup.com/semweb-25/events/7214637/

An Introduction to ANTLR

Andy Tripp
Jazillian
http://www.jazillian.com
http://www.antlr.org

Automatically Linking Structured and Unstructured Data: Connecting Databases to Text (slides)
Breck Baldwin, PhD
President Alias-i Inc
http://www.alias-i.com

6:30 Networking
7:00 An Introduction to ANTLR
7:45 Automatically Linking Structured and Unstructured Data
8:30 Open Discussion


An Introduction to ANTLR Andy Tripp President, Jazillian Inc http://www.jazillian....

Automatic machine translation and language parsing. ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions that contain actions in a variety of target languages. The SPARQL query language grammar for ANTLR v3 was recently updated to version 1.1 and provides an implementation of the W3C SPARQL grammar specification.

http://www.antlr.org/...
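
To make the idea of a recognizer built from a grammatical description with embedded actions more concrete, here is a small hand-written sketch in Python. It is not ANTLR output and does not use the ANTLR runtime. The toy arithmetic grammar and the names `tokenize`, `Parser`, and `evaluate` are assumptions made up for illustration; each grammar rule becomes one method, and the "action" attached to a rule is the arithmetic it performs.

```python
import re

# Hypothetical toy grammar (illustrative only, not taken from the talk):
#   expr   : term (('+' | '-') term)* ;
#   term   : factor (('*' | '/') factor)* ;
#   factor : INT | '(' expr ')' ;

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split the input into ('INT', value) and ('OP', char) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("INT", int(number)))
        elif op.strip():  # ignore stray whitespace matches
            tokens.append(("OP", op))
    return tokens


class Parser:
    """Recursive-descent recognizer: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def match_op(self, *ops):
        """Consume the next token if it is one of the given operators."""
        kind, value = self.peek()
        if kind == "OP" and value in ops:
            self.pos += 1
            return value
        return None

    def expr(self):
        value = self.term()
        while True:
            op = self.match_op("+", "-")
            if op is None:
                return value
            right = self.term()
            # Embedded action: compute the result while recognizing.
            value = value + right if op == "+" else value - right

    def term(self):
        value = self.factor()
        while True:
            op = self.match_op("*", "/")
            if op is None:
                return value
            right = self.factor()
            value = value * right if op == "*" else value / right

    def factor(self):
        kind, value = self.peek()
        if kind == "INT":
            self.pos += 1
            return value
        if self.match_op("("):
            inner = self.expr()
            if not self.match_op(")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text):
    """Recognize the whole input as an expr and return its value."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return result
```

Calling `evaluate("2 + 3 * (4 - 1)")` walks the rules top-down and computes the value as it goes. A parser generator produces this kind of code from the grammar instead of requiring it to be written by hand.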

Automatically Linking Structured and Unstructured Data: Connecting Databases to Text Breck Baldwin President, Alias-i Inc http://www.alias-i.co...

Natural language processing for text analytics, text data mining and search. LingPipe is a state-of-the-art suite of natural language processing tools written in Java that performs tokenization, sentence detection, named entity detection, coreference resolution, classification, clustering, part-of-speech tagging, general chunking, and fuzzy dictionary matching. These general tools support a range of applications.

Breck will discuss the thorny problem of linking entities in a database to text mentions of those entities. The challenges are:

- The John Smith problem: You have a text mention of "John Smith" and many possible John Smiths in the database. How do you pick the right one?

- The name variant problem: Your database has an incomplete list of aliases for a gene. Serpina3 has the alias 'ACT' but is also called 'AACT' in the literature, and you don't know that.

- The new entity problem: You want to discover new performers when they show up in your entertainment text sources. Those new performers are not in your database yet, so how do you handle them?

Breck will discuss how you can approach these problems using the LingPipe suite of tools in the context of entertainment news and bioinformatics.
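
The three challenges listed above can be illustrated with a minimal Python sketch. It is not LingPipe code and not the approach presented in the talk. The database records, the ids, the context words, the `link` function, and the 0.8 threshold are all assumptions invented for illustration. Name variants are handled with a generic string-similarity ratio from the standard library, ambiguity is resolved by counting context words shared with the surrounding text, and a mention that matches nothing is reported as a possible new entity.

```python
from difflib import SequenceMatcher

# Hypothetical records: ids, aliases and context words are invented for
# illustration and do not come from any real database.
DATABASE = [
    {"id": "person-1", "aliases": ["John Smith"],
     "context": {"actor", "film", "premiere"}},
    {"id": "person-2", "aliases": ["John Smith"],
     "context": {"senator", "vote", "bill"}},
    {"id": "gene-1", "aliases": ["Serpina3", "ACT"],
     "context": {"gene", "protein", "expression"}},
]


def name_similarity(mention, alias):
    """Case-insensitive similarity in [0, 1] between two names."""
    return SequenceMatcher(None, mention.lower(), alias.lower()).ratio()


def link(mention, surrounding_text, name_threshold=0.8):
    """Return the id of the best matching record, or None for a new entity."""
    words = set(surrounding_text.lower().split())
    best_id, best_score = None, None
    for record in DATABASE:
        # Name variant problem: accept near matches to any known alias.
        name_score = max(name_similarity(mention, a) for a in record["aliases"])
        if name_score < name_threshold:
            continue
        # John Smith problem: prefer the candidate whose context words
        # overlap most with the text around the mention.
        overlap = len(words & record["context"])
        score = (overlap, name_score)
        if best_score is None or score > best_score:
            best_id, best_score = record["id"], score
    # New entity problem: no candidate passed the name threshold.
    return best_id
```

For example, `link("AACT", "expression of the gene in liver")` resolves to the hypothetical `gene-1` record because "AACT" is close to the known alias "ACT", while a mention of an unknown performer returns `None`. A fixed similarity threshold is a crude choice that can also admit false matches, so it should be read as a placeholder for a properly trained or tuned matcher.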