tlhIngan-Hol Archive: Wed Aug 03 21:08:19 1994
Re: Scrabble Letter Frequencies
- From: [email protected] (Matt Whiteacre)
- Subject: Re: Scrabble Letter Frequencies
- Date: Thu, 4 Aug 94 06:04:41 PDT
>
>Do we really want letter frequencies in written text, or just
>in a word list? In scrabble, you aren't writing sentences. You
>are just writing words. If you take letter frequencies in any
>single person's text, you are getting the frequencies of words
>that person happens to think of most often. If you do it to the
>whole dictionary, then you get the frequencies of letters in
>the common vocabulary.
>
>charghwI'
>
>
I'm not sure which frequencies you would want. If you use just the word
list and omit the affixes, I think you would get a distorted
distribution, especially since the suffixes are much more common than the
words themselves. As for the problem that a distribution based on one
person's writing will be skewed by that person's personal bent: I used
writings by at least five different people, and my distribution is very
similar to Nick's (although I'll have to admit that probably 75% of the
text on the FTP server is his :)). Comparing my distribution with Mark
Reed's, with a column on the far right showing the percent difference,
you can see that there are some substantial differences (chart at the
end of the message).
Either is a workable distribution, but they are very different. How does
Scrabble get its distribution? Is it based on text or on word lists? I
don't know, but I would favor using text, simply because I feel it better
represents the frequency of words WITH affixes attached.
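The post does not say exactly how the letters were counted, so here is a
minimal Python sketch of one way to do it. It assumes greedy longest-match
splitting, with tlh, ch, gh and ng each counted as one letter, and a
case-sensitive letter inventory taken from the rows of the chart. It also
assumes every apostrophe is the glottal-stop letter, so apostrophes used as
quotation marks in the source text would be overcounted. The function names
(klingon_letters, letter_counts, distribution) are hypothetical. The same
counter works on running text (affixes included) or on a word list (one
entry per word), which is exactly the difference being debated here.

```python
from collections import Counter

# Multi-character letters; each is counted as a single letter.
MULTI = ("tlh", "ch", "gh", "ng")
# Single-character letters, taken from the chart rows. Case matters:
# q and Q are different letters, and D, H, I, S are always uppercase.
SINGLES = set("'abDeHIjlmnopqQrStuvwy")


def klingon_letters(text):
    """Split text into Klingon letters by greedy longest match.

    Characters that are not Klingon letters (spaces, punctuation,
    stray Latin letters such as c, g, h) are skipped.
    """
    letters = []
    i = 0
    while i < len(text):
        # A lone lowercase h is not a letter, so "ngh" must be n + gh.
        if text.startswith("ngh", i):
            letters.append("n")
            i += 1
            continue
        for m in MULTI:
            if text.startswith(m, i):
                letters.append(m)
                i += len(m)
                break
        else:  # no multi-character letter starts here
            if text[i] in SINGLES:
                letters.append(text[i])
            i += 1
    return letters


def letter_counts(items):
    """Count letters over an iterable of strings (words or running text)."""
    counts = Counter()
    for item in items:
        counts.update(klingon_letters(item))
    return counts


def distribution(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {letter: 100 * n / total for letter, n in counts.items()}


# Hypothetical sample: one phrase as running text vs. its dictionary forms.
text = ["tlhIngan Hol Dajatlh'a'"]
words = ["tlhIngan", "Hol", "jatlh"]
print(letter_counts(text)["a"], letter_counts(words)["a"])  # prints: 4 2
```

In the sample, the running text has two more a's than the bare word forms
because the prefix Da- and the suffix -'a' are counted, which is the effect
the text-based approach is meant to capture.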
Now for the table:
                Text-based     Word-list-based
Letter       Count Percent       Count Percent   % diff
'            15978  10.90%         411   2.06%  136.34%
a            16585  11.32%        1865   9.36%   18.89%
b             3985   2.72%         638   3.20%   16.34%
ch            3557   2.43%         237   1.19%   68.41%
D             5014   3.42%         201   1.01%  108.89%
e             9357   6.38%        2631  13.21%   69.66%
gh            4915   3.35%         245   1.23%   92.66%
H             8558   5.84%         294   1.48%  119.29%
I             8654   5.90%         351   1.76%  108.06%
j             7541   5.15%         253   1.27%  120.81%
l             4183   2.85%         976   4.90%   52.77%
m             5306   3.62%         728   3.65%    0.95%
n             3472   2.37%        2044  10.26%  124.98%
ng            1153   0.79%         232   1.16%   38.74%
o             9250   6.31%        1643   8.25%   26.61%
p             2951   2.01%         733   3.68%   58.54%
q             4089   2.79%         203   1.02%   92.98%
Q             2142   1.46%         158   0.79%   59.28%
r             2288   1.56%        1426   7.16%  128.39%
S             4260   2.91%         218   1.09%   90.59%
t             3758   2.56%        1378   6.92%   91.83%
tlh           1961   1.34%          94   0.47%   95.71%
u             6944   4.74%         966   4.85%    2.33%
v             5592   3.82%        1189   5.97%   44.02%
w             2660   1.81%         309   1.55%   15.67%
y             2408   1.64%         496   2.49%   40.99%
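The far-right column matches the absolute difference between the two
percentages divided by their mean; rows such as ', a, e and m agree with
it to two decimal places when the percentages are computed from the raw
counts. A minimal Python sketch that recomputes the percentages and that
difference from the counts in the chart (the names COUNTS,
percent_difference and row are hypothetical). Summing the count columns
gives 146,561 letters of text and 19,919 letters in the word list.

```python
# Raw counts from the chart: letter -> (text-based count, word-list count).
COUNTS = {
    "'": (15978, 411), "a": (16585, 1865), "b": (3985, 638),
    "ch": (3557, 237), "D": (5014, 201), "e": (9357, 2631),
    "gh": (4915, 245), "H": (8558, 294), "I": (8654, 351),
    "j": (7541, 253), "l": (4183, 976), "m": (5306, 728),
    "n": (3472, 2044), "ng": (1153, 232), "o": (9250, 1643),
    "p": (2951, 733), "q": (4089, 203), "Q": (2142, 158),
    "r": (2288, 1426), "S": (4260, 218), "t": (3758, 1378),
    "tlh": (1961, 94), "u": (6944, 966), "v": (5592, 1189),
    "w": (2660, 309), "y": (2408, 496),
}

TEXT_TOTAL = sum(t for t, _ in COUNTS.values())  # 146561
LIST_TOTAL = sum(w for _, w in COUNTS.values())  # 19919


def percent_difference(p1, p2):
    """Absolute difference relative to the mean of the two values, in percent."""
    return abs(p1 - p2) / ((p1 + p2) / 2) * 100


def row(letter):
    """Return (text %, word-list %, percent difference) for one letter."""
    t, w = COUNTS[letter]
    p_text = 100 * t / TEXT_TOTAL
    p_list = 100 * w / LIST_TOTAL
    return p_text, p_list, percent_difference(p_text, p_list)


for letter in COUNTS:
    p_text, p_list, diff = row(letter)
    print(f"{letter:<4}{p_text:7.2f}%{p_list:7.2f}%{diff:9.2f}%")
```

Because the difference is taken relative to the mean rather than to either
column, it is symmetric: swapping the two distributions gives the same value.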
Same table sorted by percent difference:
                Text-based     Word-list-based
Letter       Count Percent       Count Percent   % diff
'            15978  10.90%         411   2.06%  136.34%
r             2288   1.56%        1426   7.16%  128.39%
n             3472   2.37%        2044  10.26%  124.98%
j             7541   5.15%         253   1.27%  120.81%
H             8558   5.84%         294   1.48%  119.29%
D             5014   3.42%         201   1.01%  108.89%
I             8654   5.90%         351   1.76%  108.06%
tlh           1961   1.34%          94   0.47%   95.71%
q             4089   2.79%         203   1.02%   92.98%
gh            4915   3.35%         245   1.23%   92.66%
t             3758   2.56%        1378   6.92%   91.83%
S             4260   2.91%         218   1.09%   90.59%
e             9357   6.38%        2631  13.21%   69.66%
ch            3557   2.43%         237   1.19%   68.41%
Q             2142   1.46%         158   0.79%   59.28%
p             2951   2.01%         733   3.68%   58.54%
l             4183   2.85%         976   4.90%   52.77%
v             5592   3.82%        1189   5.97%   44.02%
y             2408   1.64%         496   2.49%   40.99%
ng            1153   0.79%         232   1.16%   38.74%
o             9250   6.31%        1643   8.25%   26.61%
a            16585  11.32%        1865   9.36%   18.89%
b             3985   2.72%         638   3.20%   16.34%
w             2660   1.81%         309   1.55%   15.67%
u             6944   4.74%         966   4.85%    2.33%
m             5306   3.62%         728   3.65%    0.95%
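The sorted chart orders the rows by the last column, largest first. A
minimal Python sketch of that step, using only four rows and their printed
(rounded) percentages as hypothetical input; recomputing from rounded
percentages shifts the differences slightly (for ' it gives about 136.4%
rather than 136.34%) but leaves the order of these four rows unchanged.
The names ROWS and sort_by_difference are hypothetical.

```python
def percent_difference(p1, p2):
    """Absolute difference relative to the mean of the two values, in percent."""
    return abs(p1 - p2) / ((p1 + p2) / 2) * 100


# Four rows from the chart: letter -> (text %, word-list %), as printed.
ROWS = {"a": (11.32, 9.36), "m": (3.62, 3.65), "'": (10.90, 2.06), "r": (1.56, 7.16)}


def sort_by_difference(rows):
    """Return the letters ordered by percent difference, largest first."""
    return sorted(rows, key=lambda letter: percent_difference(*rows[letter]), reverse=True)


print(" ".join(sort_by_difference(ROWS)))  # prints: ' r a m
```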
____
|INRI|
____| |____
| |
|____ ____|
| | Matt Whiteacre
| | [email protected]
| |
| |
|____|