Queue (queue) wrote in jotto,

Update of the Jotto program

So, the program now checks to make sure that the guesses are in its dictionary.
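A minimal sketch of that kind of check, in Python just for illustration. The function name and the tiny in-memory word set are made up for the example and are not taken from the actual program.

```python
def is_valid_guess(guess, dictionary):
    """Return True if the guess is a 5-letter word found in the dictionary.

    `dictionary` is assumed to be a set of lowercase words.
    """
    word = guess.strip().lower()
    return len(word) == 5 and word.isalpha() and word in dictionary


# Tiny stand-in word list; the real list has thousands of entries.
words = {"bread", "crane", "ghost"}
print(is_valid_guess("Crane", words))
print(is_valid_guess("aaaaa", words))
```

Using a set keeps each lookup fast even when the list grows to several thousand words.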

I've also got a utility that adds words to the word list and makes sure it doesn't add duplicates. I got a huge word list from the British National Corpus, which mattlistener showed me. It's a compilation of a bunch of words, sorted by frequency. The BNC list includes a bunch of 5-letter strings that aren't words, like "aaaaa" and other things that sometimes show up places, so I got Jay to install the Net::Dict Perl module, which let me use the dict.org server to pare the 5-letter entries down to those that are legal words. Then I added the pared-down list to my own. My word list had something like 3600 words, and now it has 5800.

I'm not satisfied that all of those are necessarily valid words, though. The dict.org server returns things that aren't legal Scrabble words, since it pulls from the Hitchcock Bible Names dictionary, the U.S. Gazetteer (place names), and other sources. I had it filter out some of those sources, but I'm sure there are others I've missed that I need to add to my filter. So I'm going to go through the added words by hand, looking them up individually, until I find some more sources that I want to exclude.
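The two steps described here, dropping words that only show up in unwanted sources and merging the rest without duplicates, could be sketched like this in Python. This is not the actual utility. The lookup results are modeled as plain (source, definition) pairs rather than real server responses, and the source identifiers in EXCLUDED_SOURCES are placeholders that have not been checked against dict.org.

```python
# Placeholder source identifiers (assumption, not verified against dict.org).
EXCLUDED_SOURCES = {"hitchcock", "gazetteer"}


def is_acceptable(definitions, excluded=EXCLUDED_SOURCES):
    """Keep a word only if at least one definition comes from a
    source that is not on the excluded list.

    `definitions` is a list of (source, definition_text) pairs.
    """
    return any(source not in excluded for source, _ in definitions)


def merge_words(existing, candidates):
    """Return `existing` plus any new 5-letter candidates, without duplicates."""
    seen = set(existing)
    merged = list(existing)
    for word in candidates:
        word = word.strip().lower()
        if len(word) == 5 and word.isalpha() and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


print(merge_words(["apple"], ["Apple", "bores", "bores"]))
```

Adding a newly discovered unwanted source then only means adding one more entry to the excluded set and re-running the filter.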

The other issue with the dict.org server is that it doesn't handle inflected forms like plurals, so it thinks that "bores" isn't a word, for example. dictionary.com, which appears to use the dict.org server, handles alternate forms fine, so either there's another server out there, or dictionary.com does its own processing for that. I'll have to do some more investigation, so that I don't accidentally leave out words.
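One possible workaround, sketched in Python under the assumption that the lookup can be wrapped as a function returning True or False: if the word itself isn't found, retry with a trailing "s" or "es" stripped. This is a naive heuristic, not how dictionary.com or any real server does it, and the helper names are invented for the example. It will accept some bogus plurals, so anything it lets through would still deserve a look by hand.

```python
def candidate_stems(word):
    """Return possible base forms for a word that may be a plural."""
    stems = []
    if word.endswith("es") and len(word) > 2:
        stems.append(word[:-2])
    if word.endswith("s") and len(word) > 1:
        stems.append(word[:-1])
    return stems


def is_known(word, lookup):
    """True if the word, or one of its guessed base forms, is found.

    `lookup` is any function taking a word and returning True/False.
    """
    if lookup(word):
        return True
    return any(lookup(stem) for stem in candidate_stems(word))


# Stand-in for the server: a small set of base forms.
base_forms = {"bore", "box"}
print(is_known("bores", base_forms.__contains__))
```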

My goal is to have a complete 5-letter dictionary. I suppose the next task will be to flag the more common words, so that the program can ask you if you want a more common word or if you'll allow it to pick from its whole list. It needs to have the whole list, though, so that it can handle any guess from someone, since it checks to make sure that each guess is a valid word.
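Here is one way the common-word flag could work, as a Python sketch. The dict layout (word mapped to a True/False "common" flag) and the function name are assumptions made for the example. The point is that the secret word can come from the common subset while guesses are still checked against the whole list.

```python
import random


def pick_secret(words, common_only, rng=random):
    """Pick a secret word, optionally restricted to words flagged as common.

    `words` maps each word to True if it is flagged as common.
    """
    pool = [w for w, is_common in words.items() if is_common or not common_only]
    if not pool:
        raise ValueError("no words available for the requested setting")
    return rng.choice(pool)


# Stand-in list: one common word, one obscure one.
words = {"house": True, "xylem": False}

secret = pick_secret(words, common_only=True)
print(secret)

# Guesses are validated against the full list, common or not.
print("xylem" in words)
```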

So, filtering the dictionary is my current task, and then I'll move on to having the program guess my word. I have a feeling I won't start that for a while, though, since the current task will probably take me quite some time.
