Multiple choice explained

Multiple choice (MC),^[1] objective response, or MCQ (for multiple choice question) is a form of objective assessment in which respondents are asked to select only the correct answer from a list of choices. The multiple choice format is most frequently used in educational testing, in market research, and in elections, when a person chooses between multiple candidates, parties, or policies.

Although E. L. Thorndike developed an early scientific approach to testing students, it was his assistant Benjamin D. Wood who developed the multiple-choice test.^[2] Multiple-choice testing increased in popularity in the mid-20th century when scanners and data-processing machines were developed to check the results. Christopher P. Sole created the first multiple-choice examinations for computers on a Sharp MZ 80 computer in 1982. They were developed to help people with dyslexia cope with agricultural subjects, as Latin plant names can be difficult to understand and write.

Nomenclature

Single Best Answer (SBA or One Best Answer) is a written examination form of MCQ used extensively in medical education. This form, from which the candidate must choose the best answer, has been distinguished from Single Correct Answer forms, which can produce confusion where more than one of the possible answers has some validity. The SBA form makes it explicit that more than one answer may have elements that are correct, but that one answer will be superior.

Structure

Multiple choice items consist of a stem and several alternative answers. The stem is the opening—a problem to be solved, a question asked, or an incomplete statement to be completed. The options are the possible answers that the examinee can choose from, with the correct answer called the key and the incorrect answers called distractors.^[3] Only one answer may be keyed as correct. This contrasts with multiple response items in which more than one answer may be keyed as correct.

Usually, a correct answer earns a set number of points toward the total mark, and an incorrect answer earns nothing. However, tests may also award partial credit for unanswered questions or penalize students for incorrect answers, to discourage guessing. For example, the SAT Subject tests remove a quarter point from the test taker's score for an incorrect answer.
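The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `score_responses`, the answer key, and the sample responses are hypothetical, and the default quarter-point penalty simply mirrors the example in the text rather than any official scoring procedure.

```python
from fractions import Fraction

def score_responses(responses, key,
                    correct_points=Fraction(1),
                    wrong_penalty=Fraction(1, 4)):
    """Return the total mark for one answer sheet.

    responses: chosen option per item, or None for an unanswered item.
    key: the keyed (correct) option per item.
    Point values are illustrative assumptions, not an official rule.
    """
    total = Fraction(0)
    for chosen, keyed in zip(responses, key):
        if chosen is None:
            continue                # unanswered: no credit, no penalty
        if chosen == keyed:
            total += correct_points  # correct answer earns the set points
        else:
            total -= wrong_penalty   # incorrect answer is penalized
    return total

# Hypothetical four-item test: two correct, one wrong, one left blank.
key = ["B", "C", "A", "D"]
responses = ["B", "A", None, "D"]
print(score_responses(responses, key))  # 7/4
```

Setting `wrong_penalty` to zero gives the more common rule in which an incorrect answer simply earns nothing.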

For advanced items, such as an applied knowledge item, the stem can consist of multiple parts. The stem can include extended or ancillary material such as a vignette, a case study, a graph, a table, or a detailed description which has multiple elements to it. Anything may be included as long as it is necessary to ensure the utmost validity and authenticity of the item. The stem ends with a lead-in question explaining how the respondent must answer. In medical multiple choice items, a lead-in question may ask "What is the most likely diagnosis?" or "What pathogen is the most likely cause?" in reference to a case study that was previously presented.

The items of a multiple choice test are often colloquially referred to as "questions," but this is a misnomer because many items are not phrased as questions. For example, they can be presented as incomplete statements, analogies, or mathematical equations. Thus, the more general term "item" is a more appropriate label. Items are stored in an item bank.

Examples

Ideally, the multiple choice question (MCQ) should be asked as a "stem", with plausible options, for example:

(The correct answers are B, C and A respectively.)

A well-written multiple-choice question avoids obviously wrong or implausible distractors (such as the non-Indian city of Detroit being included in the third example), so that the question makes sense when read with each of the distractors as well as with the correct answer.

A more difficult and well-written multiple choice question is as follows:

Advantages

There are several advantages to multiple choice tests. If item writers are well trained and items are quality assured, the format can be a very effective assessment technique.^[4] If students are instructed on how the item format works and myths surrounding the tests are corrected, they will perform better on the test.^[5] On many assessments, reliability has been shown to improve with larger numbers of items on a test, and with good sampling and care over case specificity, overall test reliability can be further increased.^[6]

Multiple choice tests often require less time to administer for a given amount of material than would tests requiring written responses.

Multiple choice questions lend themselves to the development of objective assessment items, but without author training, questions can be subjective in nature. Because this style of test does not require a teacher to interpret answers, test-takers are graded purely on their selections, creating a lower likelihood of teacher bias in the results.^[7] Factors irrelevant to the assessed material (such as handwriting and clarity of presentation) do not come into play in a multiple-choice assessment, and so the candidate is graded purely on their knowledge of the topic. Finally, if test-takers are aware of how to use answer sheets or online examination tick boxes, their responses can be relied upon with clarity. Overall, multiple choice tests are the strongest predictors of overall student performance compared with other forms of evaluations, such as in-class participation, case exams, written assignments, and simulation games.^[8]

Prior to the widespread introduction of SBAs into medical education, the typical form of examination was true-false questions. During the 2000s, however, educators found SBAs to be superior.^[9]

Disadvantages

The most serious disadvantage is the limited types of knowledge that can be assessed by multiple choice tests. Multiple choice tests are best adapted for testing well-defined or lower-order skills. Problem-solving and higher-order reasoning skills are better assessed through short-answer and essay tests. However, multiple choice tests are often chosen not because of the type of knowledge being assessed, but because they are more affordable for testing a large number of students. This is especially true in the United States, where multiple choice tests are the preferred form of high-stakes testing, and in India, where the number of test-takers is large.

Another disadvantage of multiple choice tests is possible ambiguity in the examinee's interpretation of the item. Failing to interpret information as the test maker intended can result in an "incorrect" response, even if the taker's response is potentially valid. The term "multiple guess" has been used to describe this scenario because test-takers may attempt to guess rather than determine the correct answer. A free response test allows the test taker to make an argument for their viewpoint and potentially receive credit.

In addition, even if students have some knowledge of a question, they receive no credit for knowing that information if they select the wrong answer and the item is scored dichotomously. Free response questions, by contrast, may allow an examinee to demonstrate partial understanding of the subject and receive partial credit. This drawback can be reduced by asking more questions on a particular subject area or topic: with a larger sample of items, the number of correct answers and the final result will statistically reflect the examinee's level of knowledge of that topic more accurately.

Another disadvantage of multiple choice examinations is that a student who is incapable of answering a particular question can simply select a random answer and still have a chance of receiving a mark for it. A random guess on a question with four answer choices has a 25 percent chance of being correct. It is common practice for students with no time left to give all remaining questions random answers in the hope that they will get at least some of them right. Many exams, such as the Australian Mathematics Competition and the SAT, have systems in place to negate this, in these cases by making it no more beneficial to choose a random answer than to give none.
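The arithmetic behind such systems can be illustrated with a short Python sketch. It assumes one point for a correct answer and a uniformly random guess among the options; the function name `expected_guess_score` and the penalty values are illustrative and are not taken from any particular exam's rules.

```python
from fractions import Fraction

def expected_guess_score(num_options, wrong_penalty=Fraction(0)):
    """Expected marks from one uniformly random guess.

    Assumes one point for a correct answer and `wrong_penalty`
    points deducted for an incorrect one (both are assumptions).
    """
    p_correct = Fraction(1, num_options)
    return p_correct * 1 - (1 - p_correct) * wrong_penalty

# No penalty: a blind guess on a four-option item is worth 1/4 point on average.
print(expected_guess_score(4))                  # 1/4

# A penalty of 1/(options - 1) makes a blind guess worth nothing on average,
# so guessing is no more beneficial than leaving the item blank.
print(expected_guess_score(4, Fraction(1, 3)))  # 0
print(expected_guess_score(5, Fraction(1, 4)))  # 0
```

Under these assumptions, a test taker who can rule out one or more distractors still gains on average from guessing among the remaining options, because the chance of being correct rises while the penalty stays the same.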

Another system of negating the effects of random selection is formula scoring, in which a score is proportionally reduced based on the number of incorrect responses and the number of possible choices. In this method, the score is reduced by w/(c - 1), where w is the number of wrong responses on the test and c is the average number of possible choices for all questions on the test.^[10] All exams scored with the three-parameter model of item response theory also account for guessing. Moreover, guessing is usually not a great issue, since the odds of a student receiving significant marks by guessing are very low when four or more selections are available.
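Both points can be sketched in Python. The first function applies the w/(c - 1) reduction to the number of right answers, which is assumed here to be the starting score; the second computes the binomial probability of reaching a given number of correct answers by blind guessing alone. The function names and the sample figures are hypothetical.

```python
from fractions import Fraction
from math import comb

def formula_score(num_right, num_wrong, avg_choices):
    """Number right minus w/(c - 1).

    Assumes the unreduced score is the count of right answers;
    avg_choices must be greater than 1.
    """
    return Fraction(num_right) - Fraction(num_wrong) / (Fraction(avg_choices) - 1)

def prob_at_least(k, n, num_options):
    """Probability of at least k correct out of n items by pure guessing,
    assuming independent, uniformly random guesses on every item."""
    p = Fraction(1, num_options)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 30 right and 8 wrong on a test with five choices per item: 30 - 8/4.
print(formula_score(30, 8, 5))   # 28

# 10 right and 30 wrong on a four-choice test, about what blind guessing
# on 40 items would produce: the correction removes all of the credit.
print(formula_score(10, 30, 4))  # 0

# Chance of guessing at least half of 40 four-choice items correctly.
print(float(prob_at_least(20, 40, 4)))
```

In this hypothetical 40-item case the final probability is well under one percent, which is consistent with the remark that significant marks are unlikely to be obtained by guessing alone.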

Ambiguously phrased questions may also confuse test-takers. It is generally accepted that multiple choice questions allow for only one answer, where that one answer may encapsulate a collection of previous options. However, some test creators are unaware of this and might expect the student to select multiple answers without giving explicit permission or providing such trailing encapsulation options.

Critics such as the philosopher and education proponent Jacques Derrida have said that, while the demand for dispensing and checking basic knowledge is valid, there are other means of responding to this need than resorting to crib sheets.^[11]

Despite all the shortcomings, the format remains popular because MCQs are easy to create, score and analyse.^[12]

Changing answers

The theory that students should trust their first instinct and stay with their initial answer on a multiple choice test is a myth worth dispelling. Researchers have found that although some people believe that changing answers is bad, it generally results in a higher test score. The data across twenty separate studies indicate that the percentage of "right to wrong" changes is 20.2%, whereas the percentage of "wrong to right" changes is 57.8%, nearly triple.^[13] Changing from "right to wrong" may be more painful and memorable (Von Restorff effect), but it is probably a good idea to change an answer after additional reflection indicates that a better choice could be made. In fact, a person's initial attraction to a particular answer choice could well derive from the surface plausibility that the test writer has intentionally built into a distractor (or incorrect answer choice). Test item writers are instructed to make their distractors plausible yet clearly incorrect. A test taker's first-instinct attraction to a distractor is thus often a reaction that probably should be revised in light of a careful consideration of each of the answer choices. Some test takers for some examination subjects might have accurate first instincts about a particular test item, but that does not mean that all test takers should trust their first instinct.

Notable multiple-choice examinations

ACT
AIEEE in India
AP
ASVAB
AMC
Australian Mathematics Competition
CFA
CISSP
CLEP
COMLEX
CLAT
Hong Kong Diploma of Secondary Education
F = ma, leading up to the United States Physics Olympiad
FE
GCE Ordinary Level
GED
GRE
GATE
IB Diploma Programme science subject exams
IIT-JEE
Indonesian National Exam
LSAT
MCAT
Multistate Bar Examination
NCLEX
PLAB for non-EEA medical graduates to practice in the UK
PSAT
SAT
Test of English as a Foreign Language
TOEIC
USMLE
NTSE
NEET (UG) in India
UGC NET in India
UPSC CSE Preliminary in India
UTME University Admission Exam in Nigeria

Notes and References

1. Park, Jooyong (2010). "Constructive multiple-choice testing system". British Journal of Educational Technology. 41 (6): 1054–1064. doi:10.1111/j.1467-8535.2010.01058.x.
2. "Alumni Notes". The Alcalde. 61 (5): 36. May 1973. ISSN 1535-993X. Retrieved 29 November 2020.
3. Kehoe, Jerard (1995). "Writing multiple-choice test items". Practical Assessment, Research & Evaluation. 4 (9).
4. Item Writing Manual. http://www.nbme.org/publications/item-writing-manual-download.html
5. Beckert, Lutz; Wilkinson, Tim J.; Sainsbury, Richard (2003). "A needs-based study and examination skills course improves students' performance". Medical Education. 37 (5): 424–428. doi:10.1046/j.1365-2923.2003.01499.x. PMID 12709183. S2CID 11096249.
6. Downing, Steven M. (2004). "Reliability: On the reproducibility of assessment data". Medical Education. 38 (9): 1006–1012. doi:10.1111/j.1365-2929.2004.01932.x. PMID 15327684. S2CID 1150035.
7. DePalma, Anthony (1 November 1990). "Revisions Adopted in College Entrance Tests". New York Times. Retrieved 22 August 2012.
8. Bontis, N.; Hardie, T.; Serenko, A. (2009). "Techniques for assessing skills and knowledge in a business strategy classroom". International Journal of Teaching and Case Studies. 2 (2): 162–180. doi:10.1504/IJTCS.2009.031060.
9. Tan, LT; McAleer, JJ (2008). "The introduction of single best answer questions as a test of knowledge in the final examination for the fellowship of the Royal College of Radiologists in Clinical Oncology". Clin Oncol (R Coll Radiol). 20 (8): 571–6. doi:10.1016/j.clon.2008.05.010. PMID 18585017.
10. "Formula Scoring of Multiple-Choice Tests (Correction for Guessing)". Archived from the original on 2011-07-21: https://web.archive.org/web/20110721041317/http://www.ncme.org/pubs/items/ITEMS_Mod_4.pdf. Retrieved 2011-05-20.
11. [Jacques Derrida]
12. "Multiple-Choice Tests: Revisiting the Pros and Cons". Faculty Focus Higher Ed Teaching & Learning. 2018-02-21. Retrieved 2019-03-22.
13. Benjamin, Ludy T.; Cavell, Timothy A.; Shallenberger, William R. (1984). "Staying with Initial Answers on Objective Tests: Is it a Myth?". Teaching of Psychology. 11 (3): 133–141. doi:10.1177/009862838401100303. S2CID 33889890.