True Phonetic Rhyming

There have been lots of changes lately to Rhymebrain.com. In the past few months, I have added the German, French, and Spanish languages.

RhymeBrain is one of only three rhyming dictionaries that I know of that do rhyming in the phonetic space. The others are B-Rhymes, which gives delightfully different results from RhymeBrain, and the excellent German-language rhyming dictionary, http://www.echtreim.de. Before this year, I was using the sounds of only the English language to do rhyming. Technically, I was first converting the words that you enter into ARPABET (http://en.wikipedia.org/wiki/Arpabet), and then looking for words with similar ARPABET encodings. I describe this technique in my blog on computer programming.
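
To make the ARPABET idea concrete, here is a minimal sketch in Python. It is not RhymeBrain's actual code: the tiny pronunciation table and the function names are made up for illustration, and it only finds exact matches from the last stressed vowel onward, which is a simpler test than "similar encodings".

```python
# Toy ARPABET lexicon (hypothetical; a real dictionary has many thousands of entries).
# Vowels carry a stress digit: 1 = primary stress, 0 = unstressed.
TOY_PRONUNCIATIONS = {
    "time": ["T", "AY1", "M"],
    "rhyme": ["R", "AY1", "M"],
    "line": ["L", "AY1", "N"],
    "orange": ["AO1", "R", "AH0", "N", "JH"],
}


def rhyme_part(phonemes):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):
            return tuple(phonemes[i:])
    # No stressed vowel found: fall back to the whole word.
    return tuple(phonemes)


def perfect_rhymes(word, lexicon):
    """List every other word whose rhyme part matches exactly."""
    target = rhyme_part(lexicon[word])
    return sorted(
        other
        for other, phonemes in lexicon.items()
        if other != word and rhyme_part(phonemes) == target
    )


print(perfect_rhymes("time", TOY_PRONUNCIATIONS))  # ['rhyme']
```

Note that "line" is not returned for "time" here, because M and N differ. Catching that pair needs a looser comparison than exact matching.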

Now, using new techniques in machine learning, which are only a few years old, I am able to convert the words that you enter into the International Phonetic Alphabet. If you open up a dictionary and look at the pronunciation, the phonetic alphabet is the strange symbols that they use to show you how to pronounce a word. For example: /fəˈnɛtɪks/.

You can now see exactly how rhymebrain.com thinks a word is pronounced using the pronunciation tool.

The International Phonetic Alphabet is designed to show how to pronounce any of the world’s languages. And so, it is now possible for RhymeBrain to do true phonetic rhymes in any language. So far, I have added only a few, but I hope to add more in the future. It is challenging, because I don’t know any other languages. I took some French in high school, which is a big help, but other than that I have no idea if the rhymes are correct. That is why I am rolling out features very slowly.

I am constantly tweaking how rhymes are calculated. Since I have spent so much time learning about phonetics, I have made the rules more consistent. For example, a B sounds similar to a D, and so they are considered equivalent for near rhymes. These rules apply no matter what language you are speaking. In a sense, rhyming is a universal language that transcends international boundaries.
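
A rule like "B is equivalent to D" can be sketched as a lookup table applied before comparing word endings. This is only an illustration under assumptions: the B/D pairing comes from the paragraph above, the M/N pairing is a guess added for the example, and the names `EQUIVALENT`, `normalize`, and `is_near_rhyme` are hypothetical.

```python
# Map each sound to the label of its equivalence class.
# Only the B/D pairing is taken from the text; M/N is an assumed extra rule.
EQUIVALENT = {
    "B": "B/D",
    "D": "B/D",
    "M": "M/N",
    "N": "M/N",
}


def normalize(phonemes):
    """Replace each sound by its class label; other sounds stay as they are."""
    return [EQUIVALENT.get(p, p) for p in phonemes]


def is_near_rhyme(tail_a, tail_b):
    """Two word endings are near rhymes if they match after normalizing."""
    return normalize(tail_a) == normalize(tail_b)


# Endings of "cab" and "cad": same vowel, B versus D.
print(is_near_rhyme(["AE1", "B"], ["AE1", "D"]))  # True
# Endings of "cab" and "cat": T is not in the same class as B.
print(is_near_rhyme(["AE1", "B"], ["AE1", "T"]))  # False
```

Because the table works on sounds rather than spellings, the same rule applies regardless of which language the word came from.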

Advertising

The other big change is that there is now advertising on rhymebrain.com. I hate Internet ads, and I encourage you to block them using AdBlock Plus (search on Google for this if you don’t know what it is). However, I must put ads on the site. The reason is that I love to spend time on my hobby, creating rhymebrain.com, but it is easier to justify this time away from my family if it is producing some revenue. Currently the ads, from the 60,000 weekly visitors, produce about $130 per week. So far, I have purchased some additional computer RAM with this money, which allowed me to process and add the additional languages.

I think that Google, being the face of the Internet (when you’re done facebooking), is a major trendsetter in web design. My web site does similar things. It is a search web site, so it makes sense that RhymeBrain is similar to other search engines.

But I have been having problems. With each new feature I added, I faced the problem of where to put it. The front page of RhymeBrain got uglier.

Some time ago, Google redesigned their search page to look more like Microsoft’s Bing. By copying this design, I get a look that is familiar to users, as well as a nice column along the left to add buttons and new features.

I think it’s important to use all of the horizontal space on today’s screens to show words, and the new results page preserves that. There is no need to confine everything to a small column in the center, as most web sites do.

What happened to insults?

The insult feature would find rhymes that are also adjectives or nouns, and pair them with a name that you entered (e.g. Gary the fairy). I was hoping this would lead to more links to the web site, but it didn’t, so it’s gone. Sorry! You’ll have to do this manually from now on.

Back in January, RhymeBrain was barely getting 400 visitors a week. Now that number is 12,000. What happened?

It is because of Google. When people want to rhyme a word, they don’t enter rhymezone.com or rhymer.com into their browser. They type “WHAT RIMES WITH BLAH” into Google, and that’s how they get to Rhymebrain.com, or any one of the other five or six big rhyming dictionaries. Very few users will even remember the name of the dictionary they ended up on.

Once I realized that the main entry point is Google, I got busy optimizing the web site for Google visitors. I made RhymeBrain’s wordlist available for searching at http://rhymebrain.com/browse.html in late 2009, and since that time, Google has been working non-stop to index 260,000 words (now 2.6 million). I’ve enhanced the description of the results pages that appear in search engines, to give part of the answer. For example:

Now you don’t even need to go to RhymeBrain.com to get your results!

In December, I got to work on enhancing rhyme quality. A reviewer at Paradise Tossed was put off by the amount of scientific jargon that appears in results. Now, the most common words appear higher on the list, so the words most useful for lyrics appear first. The number of words listed on a page is reduced from 500 to 25, so that only the best words appear, but users have an option to increase that.

I also worked on the RhymeRank™ algorithm, which calculates how much two words rhyme. Now, time and line will rhyme just fine.

What’s next for 2011?

Now that the quality issue is out of the way, I plan to innovate in other areas. The thesaurus that I use for the alliteration tool isn’t very good. An accurate thesaurus would also help enhance rhyme search results.

December also saw the very first version of the RhymeBrain editor tool, a place where you can write your song lyrics and have a constantly updated view of different words that rhyme. Keep your eye out for more functions there as well.

 

I had some free time on my Christmas break, so I was playing with some rhyming tools. One of them is VersePerfect, which lets you write poetry or verse, and while you are typing you see a list of rhyming words, definitions, and synonyms updated at the side of the screen. It’s a neat tool, but I thought it could be even better with the power of RhymeBrain’s 2.6 million word dictionary.

Use RhymeBrain writer, right now, right in your web browser. While you are typing, it shows you potential rhymes in real time. I did have a thesaurus feature, but it wasn’t any good. Soon I hope to include a good thesaurus, but it will depend on when I get some more free time.

I developed rhymebrain.com for fun, in my spare time, and I’m the first to admit that it isn’t the greatest rhyming dictionary out there. Human language is a difficult problem for computers to understand, and lots of other people have tackled this subject. Here are some of my favourite rhyming dictionaries from the web.

RhymeZone

RhymeZone, made by Doug Beeferman, is the oldest and one of the most accurate rhyming dictionaries out there. I remember using it when I was in high school, and years later, I used it to write love poems to my girlfriend, who is now my wife.

Doug has spent his life studying how computers can process human language, and has spent years building up a comprehensive dictionary of English.

Like RhymeBrain, the dictionary also suggests near rhymes when no perfect rhymes are available.

Rhymer.com

Rhymer, from WriteExpress(R), is another great web site. It has 93,000 words, and not only lets you search for rhymes, but also has beginning-rhymes, and lets you rhyme only the last, last two, or last three syllables at a time. For $19.99 you can also download their software and look up rhymes without connecting to the Internet.

B-Rhymes

B-Rhymes also allows you to see “slant” rhymes, which are words that sound similar, like “conclusive” and “effusive”, but don’t technically rhyme. The listing of rhyming words also shows you how the words are pronounced. The author has an iPhone app for sale.

 

I’ve finished my analysis of Google Books N-grams raw data and incorporated 2.6 million words into RhymeBrain. This is an increase of 10 times.  (RhymeBrain Word List is here)

Most of the words are OCR garbage, so it forced me to come up with a better algorithm for eliminating garbage words. With the Google data, for any given word (even “orange”) the algorithm comes up with thousands of words. Now, the list is whittled down into 25 or so by taking into account both RhymeRank(TM) and log(frequency). The user can click on a button to load up to 400 results.
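
How RhymeRank and log(frequency) are actually combined is not spelled out here, so the following Python sketch simply assumes a weighted sum. The function name `top_rhymes`, the `frequency_weight` parameter, and the sample scores are all invented for illustration.

```python
import math


def top_rhymes(candidates, limit=25, frequency_weight=1.0):
    """Rank (word, rhyme_rank, frequency) tuples and keep the best few.

    Assumes the combined score is rhyme_rank + weight * log(frequency);
    every frequency must be greater than zero.
    """
    def score(candidate):
        _word, rhyme_rank, frequency = candidate
        return rhyme_rank + frequency_weight * math.log(frequency)

    ranked = sorted(candidates, key=score, reverse=True)
    return [word for word, _rank, _freq in ranked[:limit]]


# Made-up numbers: an OCR error can rhyme "well" yet be very rare.
candidates = [
    ("lozenge", 5.0, 1000),
    ("0range", 9.0, 2),
    ("syringe", 4.0, 50000),
]
print(top_rhymes(candidates, limit=2))  # ['syringe', 'lozenge']
```

Taking the logarithm keeps extremely common words from swamping the rhyme score while still pushing rare garbage toward the bottom of the list.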

There is a trade-off. I collapsed the historical Google Books data from all years. Perfectly legitimate words like “shutterbug” then have a very low frequency, since they were recently invented.

On the implementation side, the word tree grew to 90 MB, which is too much to load for each query. Now the tree is mapped into memory using the mmap() system call, resulting in average response times of 60 ms on my sock-drawer data center, where rhymebrain.com is hosted.
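
The benefit of mmap() is that the operating system pages in only the parts of the file that a query touches. The sketch below shows the idea with Python's standard `mmap` module. It swaps the word tree for a much simpler structure, a sorted file of fixed-width records searched by bisection, and the record size, file name, and function names are all assumptions made for the example.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # assumed fixed width; words must be ASCII and fit in it


def build_index(path, words):
    """Write the words in sorted order as zero-padded fixed-width records."""
    with open(path, "wb") as f:
        for word in sorted(words):
            f.write(word.encode("ascii").ljust(RECORD_SIZE, b"\0"))


def contains(mapped, word):
    """Binary search over the mapped records; only touched pages are read."""
    key = word.encode("ascii").ljust(RECORD_SIZE, b"\0")
    lo, hi = 0, len(mapped) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        record = mapped[mid * RECORD_SIZE:(mid + 1) * RECORD_SIZE]
        if record == key:
            return True
        if record < key:
            lo = mid + 1
        else:
            hi = mid
    return False


def lookup(path, word):
    """Map the file read-only and search it without loading it whole."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return contains(mapped, word)


index_path = os.path.join(tempfile.mkdtemp(), "words.bin")
build_index(index_path, ["time", "line", "orange"])
print(lookup(index_path, "orange"))  # True
print(lookup(index_path, "purple"))  # False
```

A long-running server would keep the mapping open between queries rather than reopening the file each time as `lookup` does here.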


A minor tweak is to add “Consider using these near-rhymes or slant-rhymes” to the result pages. This is hip-hop jargon, as I learned from the B-Rhymes blog: http://www.b-rhymes.com/2010/01/slant-rhymes-or-near-rhymes/

Do you have an Android handheld device? Some time ago, my online friend, Mohamed Mansour, created a rhymebrain app for Android. It is much easier to use than the web site on mobile devices.

I have been meaning to link to it for a while now, so here it is:

http://www.m0interactive.com/archives/2010/09/12/just_released_rhymebrain_on_android.html

Apparently, you can download it by taking a photo of this weird thingy:

Google released some research data that they have been using: a list of all of the words in all of the books that they have scanned. See it here: http://ngrams.googlelabs.com/

I am constantly faced with a major problem on Rhymebrain: Since I get the results from importing text from all over the web, many words in rhymebrain are not really words, and it is filled with spelling mistakes.

There is no standard dictionary that contains all of the words of English. For one thing, people make up words all the time. They verbify nouns, and noun-ify verbs. Can you pinkify your wardrobe? Sure you can! If a signer signs a document, what does the document’s signee do? I have no idea, but people have used this word thousands of times in the last hundred years.

Google’s data has problems too. In particular, it is filled with errors from the scanning process. For example, the word cr6dit appears very often, because the letter e on a printed page sometimes looks like a 6 to dumb computer software that doesn’t know any better.

There is a bright side:  The Google data has 3 billion words and a count of how many times they occur.  Maybe the misspelled words will be eclipsed by the correct ones.
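
One simple way to use the counts, sketched here in Python, is to drop tokens that contain non-letters and tokens that occur too rarely. This is a guess at an approach, not the program described in this post; the function name `plausible_words`, the `min_count` threshold, and the sample counts are invented.

```python
def plausible_words(counts, min_count=40):
    """Keep tokens made only of letters that occur at least min_count times."""
    return {
        word: count
        for word, count in counts.items()
        if word.isalpha() and count >= min_count
    }


# Made-up counts: "cr6dit" contains a digit, "crcdit" is too rare.
sample = {"credit": 90000, "cr6dit": 300, "crcdit": 3}
print(plausible_words(sample))  # {'credit': 90000}
```

A letters-only check catches errors like cr6dit, but a misread that produces another letter only disappears if it is rare enough to fall under the threshold.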

Right now, I am running the data through a program that I wrote to try to figure out if I can use it to enhance the Rhymebrain results.

RhymeBrain gets most of its hits from users who enter “what rhymes with <blah>” into Google. Other dictionaries only contain the roots of words, but RhymeBrain sounds out the word as a human would, so it gets a lot of traffic from weird endings like “believABLE” and “hippopotamusES”.

In a flash of inspiration while going over some analytics, I realized that people are clicking on my site when the small snippet of text displayed on Google actually contains results like the ones they are looking for. Many of my pages had this out of pure luck, but others had random headings from other parts of the page.

So now my <meta description> for the search spider bait pages contains the first few answers. Example:

http://rhymebrain.com/r/What_rhymes_with_chainsaw.html has:
<meta name="description" content="Rhymes with chainsaw: arkansas claybaugh laidlaw rakestraw slaybaugh … [311 more]">
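
A tag in this shape could be generated along the following lines. This Python sketch is illustrative only: the function name `meta_description` and the `shown` parameter are hypothetical, and the format is copied from the chainsaw example above.

```python
import html


def meta_description(word, rhymes, shown=5):
    """Build a meta description tag listing the first few rhymes."""
    content = f"Rhymes with {word}: {' '.join(rhymes[:shown])}"
    remaining = len(rhymes) - shown
    if remaining > 0:
        content += f" … [{remaining} more]"
    # Escape quotes and angle brackets so the attribute stays valid HTML.
    return f'<meta name="description" content="{html.escape(content, quote=True)}">'


# Placeholder entries stand in for the rest of the 316 results.
rhymes = ["arkansas", "claybaugh", "laidlaw", "rakestraw", "slaybaugh"] + ["x"] * 311
print(meta_description("chainsaw", rhymes))
```

Escaping the content matters because some entries in a scraped word list may contain quotes or other characters that would otherwise break the attribute.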

In addition, the meta tag of the main page has a call to action:

“Go ahead: try to stump it. RhymeBrain works for the toughest words, when other web sites fail.”

 

Rhymegeist!

What words do people rhyme? The 25 most sought after rhymes for 2010, on rhymebrain.com, were:

  1. eleven
  2. orange
  3. me
  4. you
  5. genius
  6. love
  7. blizzard
  8. time
  9. up
  10. life
  11. forgotten
  12. future
  13. live
  14. incredible
  15. optimism
  16. it
  17. day
  18. faith
  19. purple
  20. silver
  21. world
  22. shows
  23. pussy
  24. out
  25. feudalism

Mike over at B-Rhymes has made his own, more comprehensive list and it’s worth checking out for the commentary he added.