tlhIngan-Hol Archive: Thu Oct 03 14:56:45 1996



Re: An interesting Scrabble idea



[Re letter frequency]

Qob, I doubt that the letter distribution which we used at the qep'a' was
very good.  Actually, I had mentioned that the subject had come up before on
the mailing list.  I had saved two posts which counted letter distributions,
and they agree on most numbers.  I'll repost them here.  (Jeremy, note that
your Scrabble letters are NOT in these distributions!)

--------------------
from Matt Whiteacre:
--------------------
I have gone back to the FTP site and collected all the works in the 
AESOP, KBTP, and KSRP directories, and run a letter count on them.  
I have duplicated the results below.  The left side is sorted vaguely 
by alpha, while the right is by frequency.  

a	27487		a	27487
b	6152		'	25942
ch	5661		o	15185
D	8125		e	15129
e	15129		H	14497
ng	1776		I	13579
gh	7678		j	12043
H	14497		u	11487
I	13579		m	8986
j	12043		v	8826
q	6770		D	8125
l	7389		gh	7678
m	8986		l	7389
n	5424		S	7115
o	15185		q	6770
p	5045		t	6258
Q	3439		b	6152
r	3752		ch	5661
S	7115		n	5424
t	6258		p	5045
u	11487		w	4357
v	8826		y	4202
w	4357		r	3752
tlh	3268		Q	3439
y	4202		tlh	3268
'	25942		ng	1776

There were 239572 characters considered, out of a total file size of 
360K.  This includes Hamlet and Much Ado About Nothing.  If you 
compare this distribution to the one I previously used, you find that 
5 pairs of letters switched: b/t, l/S, v/m, H/I, and e/o.  The 
English frequency was based on the Boggle dice themselves, which 
matches the distribution in English well.  Thus my proposal for a 
conversion still stands.

P.S. for those interested there were 330 occurrences of the 
combination "rgh" which are included in the table above under both 
"r" and "gh".
--------------------
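
For anyone who wants to repeat a count like this, here is a rough sketch in
Python.  It is not Matt's program (his method isn't described in the post);
the names tokenize and count_letters are made up for illustration.  It assumes
the text is in the usual romanization with a plain ASCII apostrophe, and it
assumes "ngh" should be read as n + gh, since a lowercase h never stands
alone.  Note that under this reading "rgh" simply counts once under r and
once under gh.

```python
from collections import Counter

# Single-character letters of the romanization, as listed in the table.
SINGLE = set("abDeHIjlmnopqQrStuvwy'")
# Two-character letters; "tlh" is handled separately.
DIGRAPHS = ("ch", "gh", "ng")

def tokenize(text):
    """Split romanized text into letters, treating ch, gh, ng, tlh as units."""
    letters = []
    i = 0
    while i < len(text):
        if text.startswith("tlh", i):
            letters.append("tlh")
            i += 3
        elif text.startswith("ngh", i):
            # Assumption: lowercase h never stands alone, so read n + gh.
            letters.extend(["n", "gh"])
            i += 3
        elif text[i:i + 2] in DIGRAPHS:
            letters.append(text[i:i + 2])
            i += 2
        elif text[i] in SINGLE:
            letters.append(text[i])
            i += 1
        else:
            # Spaces, punctuation, and anything else are skipped.
            i += 1
    return letters

def count_letters(text):
    """Return a Counter mapping each letter to its number of occurrences."""
    return Counter(tokenize(text))

print(tokenize("tlhIngan Hol"))   # ['tlh', 'I', 'ng', 'a', 'n', 'H', 'o', 'l']
print(tokenize("burgh"))          # ['b', 'u', 'r', 'gh']
```

Running count_letters over each file and adding the Counters together would
give a table of the same shape as the one above, though the totals will
depend on how stray English text and punctuation in the files are handled.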

--------------------
From Daniel Noll (voqHa'wI'):
--------------------
I just searched through Hamlet myself, and got this distribution from that.
I scaled the most frequent, {a}, to 100 to allow a fair comparison.
I would not at all be surprised if they check out with the other
distributions.

a 100, ' 97, e 57, o 57, H 55, I 51, j 45, u 43, m 37, v 31, D 30, l 29,
q 26, gh 26, S 25, t 23, b 23, ch 20, n 20, p 19, w 17, y 16, r 15, Q 14,
tlh 11, ng 7.
--------------------
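
To put the two posts on the same footing, the raw counts from Matt's table
can be scaled the same way, with the most frequent letter set to 100.  This
is only a sketch: the function name scale_to_100 is made up, and rounding to
the nearest whole number is an assumption, since Daniel doesn't say how he
rounded.

```python
# Counts copied from Matt Whiteacre's table.
COUNTS = {
    "a": 27487, "b": 6152, "ch": 5661, "D": 8125, "e": 15129, "ng": 1776,
    "gh": 7678, "H": 14497, "I": 13579, "j": 12043, "q": 6770, "l": 7389,
    "m": 8986, "n": 5424, "o": 15185, "p": 5045, "Q": 3439, "r": 3752,
    "S": 7115, "t": 6258, "u": 11487, "v": 8826, "w": 4357, "tlh": 3268,
    "y": 4202, "'": 25942,
}

def scale_to_100(counts):
    """Scale counts so that the most frequent letter becomes 100."""
    top = max(counts.values())
    # Assumption: round to the nearest integer.
    return {letter: round(100 * n / top) for letter, n in counts.items()}

scaled = scale_to_100(COUNTS)

# Print in descending order of scaled frequency.
for letter, value in sorted(scaled.items(), key=lambda kv: -kv[1]):
    print(letter, value)
```

Scaled this way, the whole-corpus figures land close to the Hamlet ones: for
example ' comes out at 94 against 97, and ng at 6 against 7.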

SuStel
Stardate 96758.1

