tlhIngan-Hol Archive: Tue Apr 01 09:06:09 2014


Re: [Tlhingan-hol] Certification Test Woes

d'Armond Speers, Ph.D. ([email protected])



<div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote">On Tue, Apr 1, 2014 at 1:26 AM, Lieven <span dir="ltr">&lt;<a href="mailto:[email protected]" target="_blank">[email protected]</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Am 01.04.2014 04:39, schrieb d&#39;Armond Speers, Ph.D.:<div class=""><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
questions on any given test by weight/content wouldn&#39;t interfere with<br>
the randomization, making certain questions more or less likely to<br>
appear on a test.  I&#39;m open to suggestions on whether this is an issue<br>
</blockquote>
<br></div>
You have probably thought about it already, but can&#39;t the weight of a question be attached to the number of words or syllables in the answer?<br>
<br>
e.g.<br>
translate &quot;shoe&quot; - answer = 1 point<br>
translate &quot;where is the bathroom&quot; - answer = 6 points<br>
<br>
This may not always work, but it could make some difference.</blockquote><div><br></div><div>Well, if we were to go with an approach like this I wouldn&#39;t count syllables, but morphemes.  There&#39;s no reason that &quot;bathroom&quot; should have a higher value than &quot;loo&quot;.  The questions are (a) how do you reliably calculate this value; and (b) what do you do with it?</div>
<div><br></div><div>For (a) how do you reliably calculate the content value, the problem is that we&#39;re talking about the content value of the expected answer, not the question itself.  I didn&#39;t want to just have every question be a &quot;translate this sentence&quot; type of question; we also have &quot;fill in the blank&quot; and other types of direct questions (&quot;what is the subject and object indicated by this verb prefix?&quot;), which typically have low-content-value answers.  For the translation-type questions, the student is free to translate however they like, even though the possibilities are still pretty few in the Level 1 test.  Just because I think they may use 4 morphemes in their answer doesn&#39;t preclude them from using 8 (with twice as many opportunities for errors).  And each question isn&#39;t just testing a single grammar point; some are testing multiple topics at once.  We do define the expected answer (as a benefit to the one grading the test, not as a hard-and-fast right/wrong test), so we could just use that as an estimate and call it good enough.  Or should we also take into account the number of topics associated with each question?  See, this is complicated.</div>
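The "use the expected answer as an estimate" idea above could be sketched roughly as follows. This is purely a hypothetical Python sketch, not part of any actual test tooling: it assumes the test author hand-marks morpheme boundaries in the expected answer with hyphens, and the `Question` record and its fields are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    prompt: str
    # Hypothetical convention: expected answer with morpheme boundaries
    # hand-marked by "-", e.g. "Da-jatlh-'a'" = 3 morphemes.
    expected: str
    topics: list = field(default_factory=list)

    def content_value(self) -> int:
        # One point per morpheme in the hand-annotated expected answer.
        return sum(word.count("-") + 1 for word in self.expected.split())

q = Question("Translate: 'Do you speak Klingon?'",
             "tlhIngan-Hol Da-jatlh-'a'",
             topics=["verb prefixes", "interrogative suffix"])
print(q.content_value())  # 5
```

Hand annotation sidesteps the hard problem of automatic morpheme segmentation, but it puts the burden on whoever writes the question, and it only measures the one expected answer rather than whatever the student actually writes.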
<div><br></div><div>For (b) what do you do with it, there are two general ways to approach this.  (Well, three, if you count the way we did it, which is to allow randomization to take care of it.)  You can make all expected answers have the same amount of content (very hard, and doesn&#39;t permit direct questions), or you can measure the content of expected answers and establish some heuristic for how many questions with each content value to include on a test, ensuring that each test has the same total content value across its 20 questions.  You would probably accomplish this by ensuring that each topic listed in the guidelines for that level had the same number of questions for each content value, which would mean greatly expanding the size of the test bank.  Defining that heuristic is an empirical question of test design (I remember doing this stuff in college and it was tedious!), so it&#39;s not a task I&#39;d prefer to undertake.</div>
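The second approach described above (a fixed quota of questions per content value, with random choice within each bucket) amounts to stratified sampling. A minimal Python sketch, with an invented toy bank; the `quota` numbers and question names are assumptions, not anything from the real test:

```python
import random

def build_test(bank, quota):
    """Draw a test from `bank` (question -> content value) using a fixed
    quota of questions per content value, so every generated test has the
    same total content value while the question choice stays random."""
    by_value = {}
    for question, value in bank.items():
        by_value.setdefault(value, []).append(question)
    test = []
    for value, count in quota.items():
        test.extend(random.sample(by_value[value], count))
    random.shuffle(test)
    return test

# Toy bank: ten 1-point and ten 3-point questions.
bank = {f"easy{i}": 1 for i in range(10)}
bank.update({f"hard{i}": 3 for i in range(10)})
quota = {1: 4, 3: 2}  # 4 one-pointers + 2 three-pointers = 10 points
test = build_test(bank, quota)
print(len(test), sum(bank[q] for q in test))  # 6 10
```

The cost is exactly what the paragraph says: the bank must hold enough questions at every content value (and, ideally, per topic) to keep the draw genuinely random.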
<div><br></div><div>My preference was/is not to make all of the questions uniform.  I could easily come up with a test bank of 500 questions just asking to translate each vocabulary term, but that&#39;s (a) boring and (b) not really measuring their practical skill with the language.  Some questions are directly about grammar, some are translation (simultaneously evaluating their grasp of grammar and vocabulary), and some are on some specific topic, like the distinction between {ghobe&#39;} and {Qo&#39;}.  Honestly, when working out how to create questions that were meaningful, interesting, and sufficiently varied to account for the range of language use, while maintaining balance across the topics we identified in the guidelines, I didn&#39;t think it was practical to also take the content value of the expected answer into account, not without increasing the size of the test bank considerably.  It was already a huge effort as it was.<br>
</div><div><br></div><div>Ah, memories.  :)  veqlargh is always in the details.  Am I over-thinking this?  Is there a simpler way?  And if not, is the original problem serious enough to warrant this level of test re-design?</div>
<div><br></div><div>--Holtej</div></div></div></div>
_______________________________________________
Tlhingan-hol mailing list
[email protected]
http://mail.kli.org/mailman/listinfo/tlhingan-hol
