tlhIngan-Hol Archive: Sun Nov 22 09:01:47 2009
Re: The topic marker -'e'
- From: Tracy Canfield <[email protected]>
- Date: Sun, 22 Nov 2009 12:00:04 -0500
2009/11/22 Steven Lytle <[email protected]>:
> It seems that your (or any) MT program should at least attempt to translate
> even ungrammatical utterances.
I actually do take a pass at them after marking them as ungrammatical.
It's still important to distinguish the two: first, because you can
be much more confident about the intended overall meaning of the
grammatical ones, and second, because the grammatical ones are a lot
less ambiguous. For example, you don't have to consider the
possibility that a noun ending in -vaD or -Daq could be the subject.
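
To make the second point concrete, here is a minimal Python sketch of how subject candidates might be narrowed. It is not the program's actual code: the function name, the `grammatical` flag, and the naive string check on the two suffixes named above are all assumptions for illustration.

```python
# Hypothetical sketch, not the actual MT program's code.
# The two noun suffixes named above that rule a noun out as the subject
# of a grammatical sentence.
NON_SUBJECT_SUFFIXES = ("vaD", "Daq")


def subject_candidates(nouns, grammatical):
    """Return the nouns that still have to be considered as the subject."""
    if not grammatical:
        # In an ungrammatical sentence nothing can safely be ruled out.
        return list(nouns)
    # Naive check on the surface string; a real analyzer would work on
    # the parsed suffixes rather than on raw word endings.
    return [n for n in nouns if not n.endswith(NON_SUBJECT_SUFFIXES)]


print(subject_candidates(["ngemDaq", "Sor"], grammatical=True))
print(subject_candidates(["ngemDaq", "Sor"], grammatical=False))
```

With the grammatical flag set, only "Sor" remains a candidate; without it, both nouns stay in play.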
On the current build, if you take a sentence like
mapum Sor
which I think we can all agree is awful, you get
* fall tree
The * marks it as ungrammatical, but the program makes a try at the
individual words without trying to establish any relationship between
them.
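
The word-by-word fallback described above could look roughly like the following Python sketch. The tiny `GLOSSES` table and the function name are hypothetical; the two glosses simply mirror the output shown above, and a real system would analyze prefixes and suffixes rather than look up whole words.

```python
# Hypothetical sketch of the fallback path, not the actual program.
# Toy lexicon keyed on whole words; the glosses mirror the output above.
GLOSSES = {"mapum": "fall", "Sor": "tree"}


def gloss_ungrammatical(sentence):
    """Gloss each word independently and flag the result with '*'."""
    words = sentence.split()
    # Unknown words are passed through unchanged; word order is kept
    # because no relationship between the words is established.
    glosses = [GLOSSES.get(word, word) for word in words]
    return "* " + " ".join(glosses)


print(gloss_ungrammatical("mapum Sor"))
```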
In contrast,
ngemDaq pum Sor
returns
The tree falls in the forest
with re-ordering, insertion of appropriate articles and prepositions,
etc. (Plus a gentle reminder on a different line that there are other
legitimate parses because "ngem" and "Sor" could be plural.)
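
The "other legitimate parses" arise because a noun with no plural suffix can be read as singular or plural. A small Python sketch of enumerating those readings follows; the function name and the dictionary representation are assumptions for illustration, not how the program stores its parses.

```python
from itertools import product


def number_readings(nouns):
    """Yield every singular/plural assignment for nouns without a plural suffix."""
    for combo in product(("singular", "plural"), repeat=len(nouns)):
        yield dict(zip(nouns, combo))


# Two unmarked nouns give 2 * 2 = 4 readings in total.
for reading in number_readings(["ngem", "Sor"]):
    print(reading)
```

The first reading produced (both nouns singular) corresponds to the translation shown above; the remaining three are the alternatives the reminder line refers to.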
While it might well be worth doing more re-ordering of the
ungrammatical sentences, it's a lower priority than trying to ensure
that if a sentence *is* grammatical, the program can handle it.