tlhIngan-Hol Archive: Mon Sep 08 18:44:10 1997


RE: mu'ghomwIj



charghwI',

I have taken this off list because I doubt anyone else has any interest   
at this point.

 -----Original Message-----
From: William H. Martin [SMTP:[email protected]]
Sent: Sunday, September 07, 1997 9:18 PM
To: ANDEENRE; Multiple recipients of list
Subject: Re: mu'ghomwIj


> On Thu, 28 Aug 1997 18:34:31 -0700 (PDT)  "Andeen, Eric"
> <[email protected]> wrote:

>> I am currently in the process of creating a personal dictionary, and
>> would appreciate some (constructive) feedback on the design. The data
>> will be stored in an Access database, and the interface will be a VB
>> application. My primary focus now is the Klingon word list; if I get
>> ambitious later, I will add an English keyword list and a canon database,
>> complete with a parser to add references to the wordlist.
>>
>> My current wordlist table has all the basics:
>>
>> Word
>> Definition
>> Part of speech
>> Source
>> Qualifiers (regional, slang)
>> Notes

> This pretty much reflects my own Access database, though I use
> the Notes field primarily to quote canon use of verbs, primarily
> toward the end of understanding whether they are intransitive or
> if transitive, what sort of noun serves as direct object.

> I also have each word entered in three fields. First is a
> pseudo-sort where I spell {Qu'vatlh} as q2uzvat2 so it will sort
> properly. For this, the entries different from the normal
> alphabet are:

> c=ch
> g=gh
> n1=n
> n2=ng
> q1=q
> q2=Q
> t1=t
> t2=tlh
> z='

> Having this in a field and building a query sorting the
> database by this field makes things appear in the correct order.

> The other two entries use my modified version of Lawrence's
> pIqaD font and my own ligature font which uses the same
> keymapping as Lawrence's font, so when I type "x" it shows as
> "tlh". In other words, the "x" character in this font looks like
> the three letters "tlh".

This is a wonderful idea. I had a sort key field which coded the first
letter of each word, but this turned out to be insufficient. I decided to
do a full sort key a few days ago, and now I don't have to invent the
mapping ;-) The KLI pIqaD mapping is also a good idea. I'll write a bit
of code to generate these from the word later, but for now I'll probably
just hand code them.
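
A minimal sketch of generating the sort key from the word, using the
mapping quoted above. The names LETTER, SORT_CODES and sort_key are
hypothetical. Two things are assumptions of this sketch rather than part
of the original scheme: unlisted letters are simply lower-cased, and
"ngh" is read as n + gh (there is no lone lower-case h in the alphabet).

```python
import re

# Match multi-character letters before single ones.
# "ng(?!h)" makes "ngh" split as n + gh (assumption of this sketch).
LETTER = re.compile(r"tlh|ch|gh|ng(?!h)|[A-Za-z']")

# The entries that differ from the normal alphabet, as listed above.
SORT_CODES = {
    "ch": "c", "gh": "g",
    "n": "n1", "ng": "n2",
    "q": "q1", "Q": "q2",
    "t": "t1", "tlh": "t2",
    "'": "z",
}

def sort_key(word: str) -> str:
    """Build a plain-ASCII key that sorts in Klingon alphabetical order."""
    letters = LETTER.findall(word)
    # Letters without a special code are lower-cased (assumption).
    return "".join(SORT_CODES.get(letter, letter.lower()) for letter in letters)

words = ["tlhIngan", "targh", "Qapla'", "qagh", "ngop", "nuH"]
print(sorted(words, key=sort_key))
# ['nuH', 'ngop', 'qagh', "Qapla'", 'targh', 'tlhIngan']
```

The same function could fill the sort field for every row once, so the
field never has to be typed by hand.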

> Related tables contain the parts of speech and the sources. I
> started doing a canon table to relate to the rest of it, but
> that project never made it. Instead, I've noted canon usage in
> the notes field, which is a memo field so I don't have to worry
> about field length.
   

>> Both the part of speech and the source are coded into several fields so
>> that I can generate such things as jatlh: verb: transitive & speaking, or
>> maH: noun: quantitative: number forming element (or chuvmey: pronoun).

> I don't think I understand this. If you are saying you have
> multiple fields for the same thing, like multiple part of speech
> fields, there are more efficient ways to do this. You don't want
> a lot of empty fields per average record. Instead, it is better
> to have multiple entries of the same word, like {neH} with each
> having a different definition and part of speech, much as they
> appear in the paper dictionary. Similarly, you save space by
> pulling out fields with a few very repeated values, like part of
> speech and source, and making them related tables.

I don't mean multiple parts of speech for a single entry - yIn (n) and
yIn (v) are two different words, even if the meanings are similar.

I haven't worked much with databases, and this is where it shows.
I'm still exploring the best way to represent complex information like
part of speech and source.

For part of speech, every word is either a verb, a noun, or chuvmey,
but there is a bit more to it than that: verbs (as you have always
argued) can have specific objects they can take (including none at
all), and some may be used as verbs of speaking, which is a totally
different grammar. Most of this will probably be covered by the notes
for each word (<jatlh>, for example, will have the entire MSN post on
the subject pasted into the notes field), but I would also like to
encode this into a few fairly recognizable categories for my own
consumption. For verbs, I'd like to note transitivity, speaking verbs,
and maybe a few other things I can't think of right now. For nouns,
the list is larger: we have proper nouns, grammatically singular thingees
like <cha> and <ngop>, quantitative nouns like <Hoch> and <'op>,
numbers and number forming elements, etc.

The point is, I can draw out a hierarchy of these things in my head
(which probably does not closely match what the real linguists use, but
works for me), but I have not yet figured out how to represent that very
well in the bizarre world of relational databases. The approach I have
taken is to map this hierarchy into a few integer values and include
them as fields in the wordlist table. I'm aware that this is probably not
the best way to do this, but I have no idea what the better way is.
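
One common alternative, sketched here with Python's sqlite3 module
rather than Access, is a self-referencing category table: each category
row points at its parent, and each word row points at its most specific
category. The table names, column names and sample rows are all
hypothetical; only the "maH: noun: quantitative: number forming element"
path comes from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)  -- NULL for a top-level category
);
CREATE TABLE word (
    id          INTEGER PRIMARY KEY,
    word        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
INSERT INTO category VALUES
    (1, 'noun', NULL),
    (2, 'quantitative', 1),
    (3, 'number forming element', 2),
    (4, 'verb', NULL);
INSERT INTO word VALUES (1, 'maH', 3), (2, 'Hoch', 2);
""")

def category_path(conn, word):
    """Walk from the word's category up to the root and join the names."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT c.id, c.name, c.parent_id, 0
            FROM word w JOIN category c ON c.id = w.category_id
            WHERE w.word = ?
            UNION ALL
            SELECT p.id, p.name, p.parent_id, chain.depth + 1
            FROM category p JOIN chain ON p.id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth DESC
    """, (word,)).fetchall()
    return ": ".join(name for (name,) in rows)

print("maH:", category_path(conn, "maH"))
# maH: noun: quantitative: number forming element
```

A single parent chain does not cover independent properties such as
"transitive & speaking"; those would probably need separate flag columns
or another related table. Looking up by the word text is also a
shortcut, since yIn (n) and yIn (v) share a spelling.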

I may just give up and use a text field for the whole thing, so <jatlh>
winds up as "vst" for verb: speaking/transitive and <Hoch> winds up as
"nq" for noun: quantitative.

After thinking about this for a while, I believe anyone who tries to write
a program to parse a bibliography using the MLA (?) standard formats
(over 20 of them) into a standard relational database is destined to
become insane.

>> I am also considering adding related words. For example, <Qub> could have
>> links to <Sov>, <Har>, <Hon>, etc., thus adding something resembling a
>> thesaurus to the dictionary.

> The trick there is, again, how do you avoid multiple fields per
> entry, since that leads to a lot of empty fields in an average
> record, and some words will have too many synonyms to fit in
> your multiple fields.

> The trick is an intermediate related table. Relate a "synonym"
> table to the word list table so that you pair synonyms. One pair
> is one record. A word with multiple synonyms gets multiple
> records. You can build a sorting query based on the first field
> and build a report on that query so that it breaks on the first
> field, creating a thesaurus.

The way I'm doing this is indeed through a related words table,
where each entry is a pair of words. Since it is an ordered pair,
the (conceptual) relationship generally only goes one way.
I'm debating whether or not I should make the thesaurus query
more complicated and go both ways or just leave it. Of course,
this all assumes I can get all the relationships and the queries
right in the first place.
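
A rough sketch of the pair table and of the "both ways" query, again
using Python's sqlite3 module as a stand-in for Access. The table and
column names are hypothetical, and the sample pairs are just the
<Qub> example from above stored in one direction only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
CREATE TABLE related (
    from_id INTEGER NOT NULL REFERENCES word(id),
    to_id   INTEGER NOT NULL REFERENCES word(id),
    PRIMARY KEY (from_id, to_id)   -- one ordered pair per record
);
INSERT INTO word VALUES (1, 'Qub'), (2, 'Sov'), (3, 'Har'), (4, 'Hon');
INSERT INTO related VALUES (1, 2), (1, 3), (1, 4);
""")

def related_words(conn, word, both_ways=True):
    """List words linked to `word`, optionally following pairs in reverse too."""
    sql = """
        SELECT w2.word FROM word w1
        JOIN related r ON r.from_id = w1.id
        JOIN word w2 ON w2.id = r.to_id
        WHERE w1.word = ?
    """
    params = [word]
    if both_ways:
        # Also pick up pairs where `word` is the second member.
        sql += """
        UNION
        SELECT w2.word FROM word w1
        JOIN related r ON r.to_id = w1.id
        JOIN word w2 ON w2.id = r.from_id
        WHERE w1.word = ?
        """
        params.append(word)
    sql += " ORDER BY 1"
    return [w for (w,) in conn.execute(sql, params)]

print(related_words(conn, "Qub"))                   # ['Har', 'Hon', 'Sov']
print(related_words(conn, "Sov", both_ways=False))  # []
print(related_words(conn, "Sov"))                   # ['Qub']
```

The UNION keeps each pair stored once while still letting the thesaurus
read in either direction. The ORDER BY here is plain text order, not
Klingon alphabetical order.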

> In theory. I have not done it yet. I'm not finished entering the
> damned words yet.
   

>> pagh

> charghwI'

pagh

mI'QeD yIqel: paghvo' tagh Hoch  

