New algorithm for learning languages

Cornell University and Tel Aviv University researchers have developed a method that enables a computer program to scan text in any of a number of languages, including English and Chinese, and to infer the underlying rules of grammar autonomously, without any prior information. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences.

The development -- which has a patent pending -- has implications for speech recognition and for other applications in natural language engineering, as well as for genomics and proteomics. It also offers new insights into language acquisition and psycholinguistics.

"The algorithm -- the computational method -- for language learning and processing that we have developed can take a body of text, abstract from it a collection of recurring patterns or rules and then generate new material," explained Shimon Edelman, a computer scientist who is a professor of psychology at Cornell and co-author of a new paper, "Unsupervised Learning of Natural Languages," published in the Proceedings of the National Academy of Sciences (PNAS, Vol. 102, No. 33).

"This is the first time an unsupervised algorithm is shown capable of learning complex syntax, generating grammatical new sentences and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics," he said.

Unlike previous attempts at developing computer algorithms for language learning, the new method, called Automatic Distillation of Structure (ADIOS), successfully identifies complex patterns in raw texts. The algorithm discovers the patterns by repeatedly aligning sentences and looking for overlapping parts.

For example, the sentences "I would like to book a first-class flight to Chicago," "I want to book a first-class flight to Boston" and "Book a first-class flight for me, please" may give rise to the pattern "book a first-class flight" -- if this candidate pattern passes the novel statistical significance test that is the core of the algorithm.

If the system also encounters the sentences "I need to book a direct flight from New York to Tel Aviv" and "I would like to book an economy flight," it may infer that the phrases "first-class," "direct" and "economy" are equivalent in the context of the new pattern. "Because such equivalence sets can contain other patterns -- in turn containing further patterns, and so on -- the resulting body of knowledge grows recursively, as a sort of forest of branching trees of possibilities," said Edelman.
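
The two steps described above, finding a word sequence shared by several sentences and then collecting the words that can fill one position of it, can be illustrated with a short Python sketch. It is not the ADIOS implementation: a plain count threshold stands in for the statistical significance test, and the function names, the `PATTERN` layout and the hand-supplied `{"a", "an"}` set are assumptions made purely for illustration.

```python
from collections import Counter

SENTENCES = [
    "I would like to book a first-class flight to Chicago",
    "I want to book a first-class flight to Boston",
    "Book a first-class flight for me, please",
    "I need to book a direct flight from New York to Tel Aviv",
    "I would like to book an economy flight",
]


def tokenize(sentence):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(",.!?").lower() for w in sentence.split())
    return [w for w in words if w]


def shared_ngrams(sentences, n, min_sentences):
    """Return the n-word sequences found in at least min_sentences sentences.

    Assumption: a raw count threshold replaces the significance test.
    """
    counts = Counter()
    for sentence in sentences:
        toks = tokenize(sentence)
        # Use a set so each sentence votes once per distinct n-gram.
        counts.update({tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)})
    return {gram for gram, c in counts.items() if c >= min_sentences}


# Hypothetical pattern layout: each position is a set of allowed words,
# and None marks the open slot whose fillers we want to collect.
PATTERN = [{"book"}, {"a", "an"}, None, {"flight"}]


def slot_fillers(sentences, pattern):
    """Collect the words seen in the open slot wherever the pattern matches."""
    slot = pattern.index(None)
    fillers = set()
    for sentence in sentences:
        toks = tokenize(sentence)
        for i in range(len(toks) - len(pattern) + 1):
            window = toks[i:i + len(pattern)]
            if all(allowed is None or word in allowed
                   for word, allowed in zip(window, pattern)):
                fillers.add(window[slot])
    return fillers


# The only 4-word sequence shared by the first three sentences.
print(shared_ngrams(SENTENCES[:3], 4, 3))
# → {('book', 'a', 'first-class', 'flight')}

# Words that appear between "book a/an" and "flight" in all five sentences.
print(sorted(slot_fillers(SENTENCES, PATTERN)))
# → ['direct', 'economy', 'first-class']
```

In this sketch the `{"a", "an"}` position is itself a small equivalence set nested inside a pattern, which loosely mirrors the recursive structure Edelman describes.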

He added, "ADIOS relies on a statistical method for pattern extraction and on structured generalization -- two processes that have been implicated in language acquisition. Our experiments show that it can acquire intricate structures from raw data, including transcripts of parents' speech directed at 2- or 3-year-olds. This may eventually help researchers understand how children, who learn language in a similar item-by-item fashion and with very little supervision, eventually master the full complexities of their native tongue."

In addition to child-directed language, the algorithm has been tested on the full text of the Bible in several languages, on artificial context-free languages with thousands of rules and on musical notation. It also has been applied to biological data, such as nucleotide base pairs and amino acid sequences. In analyzing proteins, for example, the algorithm was able to extract from amino acid sequences patterns that were highly correlated with the functional properties of the proteins.

The new method was developed jointly with David Horn and Eytan Ruppin, professors of physics and computer science, respectively, at Tel Aviv University, and with Zach Solan, a doctoral student there and the lead author on the paper. Their collaboration with Edelman was supported in part by the U.S.-Israel Binational Science Foundation.

From Cornell University


August 31, 2005

Comments

Great

October 24, 2009 by Anonymous, 5 weeks 2 days ago
Comment: 45743

It's really wonderful to create such an algorithm to understand any language. I appreciate your work.

Hope, yes

October 5, 2009 by Anonymous, 8 weeks 19 hours ago
Comment: 45234

Hope, yes

Madrid student

October 1, 2009 by Anonymous, 8 weeks 5 days ago
Comment: 45139

This article was of great information. I came across various rules and regulation which could later generate new and meaningful sentences in future.

Thanks

August 6, 2009 by Anonymous, 16 weeks 4 days ago
Comment: 42536

Thanks for sharing this info with us! I was reading something similar on another website that I was researching. I will be sure to look around more. Thanks...

Thanks

August 2, 2009 by Anonymous, 17 weeks 2 days ago
Comment: 39292

First, thanks for the new algorithm. Then, how is this algorithm different? I will try it.

~~~~~~~~
james
msn cams

Sheet music

July 15, 2009 by Anonymous, 19 weeks 5 days ago
Comment: 38037

I like the idea of creating meaningful, yet random sheet music.

thank you

May 13, 2009 by Anonymous, 28 weeks 5 days ago
Comment: 36609

thanks

Yusuf Güney

Excellently written article,

April 29, 2009 by Anonymous, 30 weeks 5 days ago
Comment: 36438

Excellently written article, if only all bloggers offered the same content as you, the internet would be a much better place. Please keep it up!
Cheers.

Yeah!

April 16, 2009 by Anonymous, 32 weeks 4 days ago
Comment: 36186

Yeah, that's fantastic, let it be.

Great

April 14, 2009 by Anonymous, 32 weeks 6 days ago
Comment: 36126

Yes, it's a great feature for spam prevention development.

Sure

April 14, 2009 by Anonymous, 32 weeks 6 days ago
Comment: 36125

It's really fantastic.

Re: spam?

April 12, 2009 by Anonymous, 33 weeks 1 day ago
Comment: 36059

I heard about that algorithm too.

What is the difference?

March 25, 2009 by Anonymous, 35 weeks 5 days ago
Comment: 35592

Hello, how is this algorithm different?
Can knowing this really improve my study of any language?
Regards,
Vivi
http://www.deperfumes.com

Why

March 25, 2009 by Anonymous, 35 weeks 5 days ago
Comment: 35581

What does it do? Video?

)

March 23, 2009 by Anonymous, 36 weeks 13 hours ago
Comment: 35534

Is it being actually tested to work?
http://www.casininio.com

Wow that is a fantastic

March 1, 2009 by Anonymous, 39 weeks 1 day ago
Comment: 34946

Wow, that is a fantastic development. Surely it could be modified to combat spam?

Thanks

September 29, 2008 by Anonymous, 1 year 8 weeks ago
Comment: 32181

Bro

September 27, 2008 by Anonymous, 1 year 9 weeks ago
Comment: 32167

there is always the problem

July 7, 2008 by Anonymous, 1 year 20 weeks ago
Comment: 30977

There is always the problem of "how much is too much?" As you said, it is applied to extensive and recurring raw data. Where do you get this data from? I don't really care, but what about the validity of the data? In order to get a correct algorithm you need correct data.

Strict pattern-based methods

May 25, 2008 by Anonymous, 1 year 27 weeks ago
Comment: 30104

Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They also have the desirable property of being able to construct robust grammars from positive instances alone. Unfortunately, the zero-frequency problem entails assigning a small probability to all possible word patterns, thus ungrammatical n-grams become as probable as unseen grammatical ones. Further, such grammars are unable to take advantage of inherent lexical properties that should allow infrequent words to inherit the syntactic properties of the class to which they belong.

Yes it's pity but it is

February 9, 2008 by Anonymous, 1 year 42 weeks ago
Comment: 27413

Yes, it's a pity, but it is so :(
Search engines become smarter only because spa*mers are getting smarter and inventing such linguistic tools.

It's an old idea (if I don't

January 28, 2008 by Anonymous, 1 year 43 weeks ago
Comment: 27199

It's an old idea (if I'm not missing something). If you are interested in this topic, use Google and search for "Markov chains". Spammers generate "like-real" texts using this algorithm. And yes, it can also be applied to test texts.
--
Treat

We create program

February 24, 2006 by webmaster@prosoftone.net (not verified), 3 years 40 weeks ago
Comment: 1471

create a program that can easily generate texts with content that answers human questions exactly. I think it will be possible once something about background knowledge is created.

I study

February 24, 2006 by Anonymous, 3 years 40 weeks ago
Comment: 1470

I study Applied Linguistics at my university, Computational and other linguistics. My purpose is to

Syntax

September 3, 2005 by georgejmyersjr, 4 years 12 weeks ago
Comment: 1142

I once studied syntax and transformational grammar at Stony Brook University, along with other linguistics classes for Anthropology, and find this an interesting development. When computer languages got started there was one, SNOBOL, which processed language instead of numbers, and which I thought might some day be developed further; why, even Bill Gates once promised a SNOBOL for Windows. (Where is it, Mr. Gates?) Well, this is interesting. I imagine the algorithm is in some version of C+ or something. There were once three or four competing lists of rules for English syntax, about 33 or so, so this is a good thing for computers to do, so to speak, a sort of real-time syntactical concordance analysis. Bravo!

George J Myers, Jr. (my first post here, I'm awed)

I need a French/English

September 1, 2005 by trisha4, 4 years 13 weeks ago
Comment: 1138

I need a French/English version now please...before it's too late.

or even better, a spam

September 1, 2005 by antifraudster, 4 years 13 weeks ago
Comment: 1134

or even better, a spam generator!

-bugmenot.com-

spam?

August 31, 2005 by pyropunk, 4 years 13 weeks ago
Comment: 1133

Since spam often consists of randomly generated sentences could this algorithm be used as a spam filter?
