tlhIngan-Hol Archive: Wed Aug 03 21:08:19 1994


Re: Scrabble Letter Frequencies



>
>Do we really want letter frequencies in written text, or just
>in a word list? In scrabble, you aren't writing sentences. You
>are just writing words. If you take letter frequencies in any
>single person's text, you are getting the frequencies of words
>that person happens to think of most often. If you do it to the
>whole dictionary, then you get the frequencies of letters in
>the common vocabulary.
>
>charghwI'
>
>

I'm not sure what frequencies you would want.  If you use just the word 
list and omit the affixes, then I think you would get a distorted 
distribution, especially since the suffixes are much more common than the 
words.  As to the problem that a distribution based on one person's writing 
will skew the entire distribution toward that person's personal bent, I used 
writings by at least 5 different people, and my distribution is very 
similar to Nick's (although I'll have to admit that probably 75% of the 
text on the FTP server is his:)).  Comparing the distributions I got and 
those of Mark Reed, with a column on the far right showing percent 
difference, you can see that there are some substantial differences (chart 
at the end of the message).

Either is a workable distribution, but they are very different.  How does 
Scrabble get its distribution?  Is it based on text or on word lists?  I 
don't know, but I would favor using text, simply because I feel it better 
represents the frequency of words WITH affixes attached.
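
For anyone who wants to try a text-based count themselves, here is a rough Python sketch. It is not the program used for the numbers in this message; the names `tokenize` and `letter_frequencies` are made up for illustration. It assumes the standard romanization, where tlh, ch, gh and ng each count as one letter, and it treats the sequence "ngh" as n followed by gh (as in nenghep), since there is no lowercase h on its own.

```python
from collections import Counter

# Multi-character letters, tried longest first.
MULTIGRAPHS = ("tlh", "ch", "gh", "ng")
# Single-character letters (case matters: q and Q are different letters).
SINGLE = set("abDeHIjlmnopqQrStuvwy'")


def tokenize(text):
    """Split romanized Klingon text into letters, skipping anything else."""
    letters = []
    i = 0
    while i < len(text):
        if text.startswith("ngh", i):
            # Assumption: "ngh" is always n + gh, never ng + h.
            letters.append("n")
            i += 1
            continue
        for m in MULTIGRAPHS:
            if text.startswith(m, i):
                letters.append(m)
                i += len(m)
                break
        else:
            # Not a multigraph: keep known single letters, skip spaces,
            # punctuation and anything unrecognized.
            if text[i] in SINGLE:
                letters.append(text[i])
            i += 1
    return letters


def letter_frequencies(text):
    """Map each letter to (count, percent of all letters counted)."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {letter: (n, 100 * n / total) for letter, n in counts.most_common()}


if __name__ == "__main__":
    for letter, (n, pct) in letter_frequencies("tlhIngan Hol").items():
        print(f"{letter}\t{n}\t{pct:.2f}%")
```

Running the same function over a bare word list instead of running text would give the other kind of distribution, so the two approaches differ only in what goes in.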

Now for the table:

Text based dist.        Word list based dist.
letter	count	share	letter	count	share	% diff.
'	15978	10.90%	'	411	2.06%	136.34%
a	16585	11.32%	a	1865	9.36%	18.89%
b	3985	2.72%	b	638	3.20%	16.34%
ch	3557	2.43%	ch	237	1.19%	68.41%
D	5014	3.42%	D	201	1.01%	108.89%
e	9357	6.38%	e	2631	13.21%	69.66%
gh	4915	3.35%	gh	245	1.23%	92.66%
H	8558	5.84%	H	294	1.48%	119.29%
I	8654	5.90%	I	351	1.76%	108.06%
j	7541	5.15%	j	253	1.27%	120.81%
l	4183	2.85%	l	976	4.90%	52.77%
m	5306	3.62%	m	728	3.65%	0.95%
n	3472	2.37%	n	2044	10.26%	124.98%
ng	1153	0.79%	ng	232	1.16%	38.74%
o	9250	6.31%	o	1643	8.25%	26.61%
p	2951	2.01%	p	733	3.68%	58.54%
q	4089	2.79%	q	203	1.02%	92.98%
Q	2142	1.46%	Q	158	0.79%	59.28%
r	2288	1.56%	r	1426	7.16%	128.39%
S	4260	2.91%	S	218	1.09%	90.59%
t	3758	2.56%	t	1378	6.92%	91.83%
tlh	1961	1.34%	tlh	94	0.47%	95.71%
u	6944	4.74%	u	966	4.85%	2.33%
v	5592	3.82%	v	1189	5.97%	44.02%
w	2660	1.81%	w	309	1.55%	15.67%
y	2408	1.64%	y	496	2.49%	40.99%
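
The message does not say how the far-right column was computed. The figures appear consistent with a symmetric percent difference: the absolute difference between the two shares, divided by their mean. Here is a small Python sketch under that assumption; the function name is hypothetical, and the two totals are simply the sums of the count columns above.

```python
# Column totals, obtained by summing the two count columns in the table.
TEXT_TOTAL = 146561
WORDLIST_TOTAL = 19919


def percent_difference(p, q):
    """Symmetric percent difference: |p - q| relative to the mean of p and q."""
    if p == q:
        return 0.0
    return abs(p - q) / ((p + q) / 2) * 100


# A few rows from the table: letter -> (text count, word-list count).
ROWS = {
    "'": (15978, 411),
    "r": (2288, 1426),
    "m": (5306, 728),
}

if __name__ == "__main__":
    for letter, (text_n, list_n) in ROWS.items():
        diff = percent_difference(text_n / TEXT_TOTAL, list_n / WORDLIST_TOTAL)
        print(f"{letter}\t{diff:.2f}%")
```

Because the measure divides by the mean of the two shares, it can never exceed 200%, and it does not say which of the two distributions is the larger one for a given letter.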

Same table sorted by percent difference:
'	15978	10.90%	'	411	2.06%	136.34%
r	2288	1.56%	r	1426	7.16%	128.39%
n	3472	2.37%	n	2044	10.26%	124.98%
j	7541	5.15%	j	253	1.27%	120.81%
H	8558	5.84%	H	294	1.48%	119.29%
D	5014	3.42%	D	201	1.01%	108.89%
I	8654	5.90%	I	351	1.76%	108.06%
tlh	1961	1.34%	tlh	94	0.47%	95.71%
q	4089	2.79%	q	203	1.02%	92.98%
gh	4915	3.35%	gh	245	1.23%	92.66%
t	3758	2.56%	t	1378	6.92%	91.83%
S	4260	2.91%	S	218	1.09%	90.59%
e	9357	6.38%	e	2631	13.21%	69.66%
ch	3557	2.43%	ch	237	1.19%	68.41%
Q	2142	1.46%	Q	158	0.79%	59.28%
p	2951	2.01%	p	733	3.68%	58.54%
l	4183	2.85%	l	976	4.90%	52.77%
v	5592	3.82%	v	1189	5.97%	44.02%
y	2408	1.64%	y	496	2.49%	40.99%
ng	1153	0.79%	ng	232	1.16%	38.74%
o	9250	6.31%	o	1643	8.25%	26.61%
a	16585	11.32%	a	1865	9.36%	18.89%
b	3985	2.72%	b	638	3.20%	16.34%
w	2660	1.81%	w	309	1.55%	15.67%
u	6944	4.74%	u	966	4.85%	2.33%
m	5306	3.62%	m	728	3.65%	0.95%
              ____
             |INRI|
         ____|    |____
        |              |
        |____      ____|
             |    |           Matt Whiteacre
             |    |           [email protected]
             |    |
             |    |
             |____|


