Publication Date
| Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 220 |
| Since 2022 (last 5 years) | 1089 |
| Since 2017 (last 10 years) | 2599 |
| Since 2007 (last 20 years) | 4960 |
Audience
| Audience | Records |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Location | Records |
| --- | --- |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Liu, Ou Lydia; Lee, Hee-Sun; Linn, Marcia C. – Educational Assessment, 2011
Both multiple-choice and constructed-response items have known advantages and disadvantages in measuring scientific inquiry. In this article we explore the function of explanation multiple-choice (EMC) items and examine how EMC items differ from traditional multiple-choice and constructed-response items in measuring scientific reasoning. A group…
Descriptors: Science Tests, Multiple Choice Tests, Responses, Test Items
Leighton, Jacqueline P.; Heffernan, Colleen; Cor, M. Kenneth; Gokiert, Rebecca J.; Cui, Ying – Applied Measurement in Education, 2011
The "Standards for Educational and Psychological Testing" indicate that test instructions, and by extension item objectives, presented to examinees should be sufficiently clear and detailed to help ensure that they respond as developers intend them to respond (Standard 3.20; AERA, APA, & NCME, 1999). The present study investigates…
Descriptors: Test Construction, Validity, Evidence, Science Tests
Winke, Paula – Language Assessment Quarterly, 2011
In this study, I investigated the reliability of the U.S. Naturalization Test's civics component by asking 414 individuals to take a mock U.S. citizenship test comprising civics test questions. Using an incomplete block design of six forms with 16 nonoverlapping items and four anchor items on each form (the anchors connected the six subsets of…
Descriptors: Test Items, Citizenship, Civics, Test Validity
White, Harold B. – Biochemistry and Molecular Biology Education, 2011
The author and other teaching faculty take pride in their ability to write creative and challenging examination questions. Their self-assessment is based on experience and their knowledge of their subject and discipline. Although their judgment may be correct, it is done usually in the absence of deep knowledge of what is known about the…
Descriptors: Test Items, Community Colleges, Molecular Biology, College Faculty
Finch, Holmes – Applied Psychological Measurement, 2011
Estimation of multidimensional item response theory (MIRT) model parameters can be carried out using the normal ogive with unweighted least squares estimation with the normal-ogive harmonic analysis robust method (NOHARM) software. Previous simulation research has demonstrated that this approach does yield accurate and efficient estimates of item…
Descriptors: Item Response Theory, Computation, Test Items, Simulation
Maria Assunta Hardy – ProQuest LLC, 2011
Guidelines to screen and select common items for vertical scaling have been adopted from equating. Differences between vertical scaling and equating suggest that these guidelines may not apply to vertical scaling in the same way that they apply to equating. For example, in equating the examinee groups are assumed to be randomly equivalent, but in…
Descriptors: Elementary School Mathematics, Mathematics Tests, Test Construction, Test Items
Cihangir-Cankaya, Zeynep – Educational Sciences: Theory and Practice, 2012
This study has two main objectives: first, to reexamine the Listening Skill Scale; second, to compare the listening-skill levels of counseling and guidance students according to whether they had taken courses covering listening skills and according to gender. In accordance with these objectives, the data obtained…
Descriptors: Measures (Individuals), Psychology, Guidance, Listening Skills
Alsubait, Tahani; Parsia, Bijan; Sattler, Uli – Research in Learning Technology, 2012
Different computational models for generating analogies of the form "A is to B as C is to D" have been proposed over the past 35 years. However, analogy generation is a challenging problem that requires further research. In this article, we present a new approach for generating analogies in Multiple Choice Question (MCQ) format that can be used…
Descriptors: Computer Assisted Testing, Programming, Computer Software, Computer Software Evaluation
Shin, Chingwei David; Chien, Yuehmei; Way, Walter Denny – Pearson, 2012
Content balancing is one of the most important components of computerized adaptive testing (CAT), especially in K-12 large-scale tests, where a complex constraint structure is required to cover a broad spectrum of content. The purpose of this study is to compare the weighted penalty model (WPM) and the weighted deviation method (WDM) under…
Descriptors: Computer Assisted Testing, Elementary Secondary Education, Test Content, Models
College Board, 2012
Looking beyond the right or wrong answer is imperative to the development of effective educational environments conducive to Pre-AP work in math. This presentation explores a system of evaluation in math that provides a personalized, student-reflective model correlated to consortia-based assessment. Using examples of students' work that includes…
Descriptors: Student Evaluation, Mathematics Instruction, Correlation, Educational Assessment
Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J. – Journal of Visual Impairment & Blindness, 2012
Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
Descriptors: Test Bias, Test Items, Alternative Assessment, Visual Impairments
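The Wilcoxon signed-rank test named in this abstract compares paired scores by ranking the absolute differences. A minimal pure-Python sketch of the statistic (the score values below are invented for illustration, not from the PASA study):

```python
def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank statistic W+ (sum of the ranks of positive
    paired differences), dropping zero differences and assigning
    average ranks to ties."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return sum(r for d, r in zip(diffs, ranks) if d > 0)

# Paired scores for the same items under two conditions (toy data):
w_plus = wilcoxon_signed_rank([3, 5, 2, 6], [1, 4, 2, 2])
```

In practice a library routine (e.g. `scipy.stats.wilcoxon`) would also supply the p-value; this sketch only shows how the statistic itself is built.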
Wu, Johnny; King, Kevin M.; Witkiewitz, Katie; Racz, Sarah Jensen; McMahon, Robert J. – Psychological Assessment, 2012
Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item…
Descriptors: Item Analysis, Test Bias, Test Items, Item Response Theory
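The 2-parameter IRT model used in this study can be sketched as follows; the parameter values are illustrative, not taken from the study:

```python
import math

def irt_2pl(theta, a, b):
    """Probability of endorsing an item under the two-parameter
    logistic IRT model: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Scalar equivalence fails when, at the same trait level theta, an
# item's parameters differ across groups (differential item functioning).
p_group1 = irt_2pl(theta=0.0, a=1.2, b=-0.5)  # item easier for group 1
p_group2 = irt_2pl(theta=0.0, a=1.2, b=0.5)   # same item, harder for group 2
```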
Gierl, Mark J.; Lai, Hollis – International Journal of Testing, 2012
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Descriptors: Foreign Countries, Psychometrics, Test Construction, Test Items
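The item-model (template) step described in this abstract can be illustrated with a toy sketch: a stem with variable slots is instantiated over admissible slot values, with the key computed alongside each generated item. All names and content below are invented for illustration:

```python
import itertools

# Hypothetical item model: a stem with two variable slots.
ITEM_MODEL = "If a train travels {speed} km/h for {hours} hours, how far does it go?"

def generate_items(speeds, hours_options):
    """Instantiate the item model for every combination of slot values,
    computing the key (correct answer) for each generated item."""
    items = []
    for speed, hours in itertools.product(speeds, hours_options):
        stem = ITEM_MODEL.format(speed=speed, hours=hours)
        items.append({"stem": stem, "key": speed * hours})
    return items

bank = generate_items(speeds=[60, 80], hours_options=[2, 3])
```

Real systems layer cognitive models and distractor generation on top of this templating step; the sketch covers only the instantiation idea.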
Ruiz-Primo, Maria Araceli; Li, Min; Wills, Kellie; Giamellaro, Michael; Lan, Ming-Chih; Mason, Hillary; Sands, Deanna – Journal of Research in Science Teaching, 2012
The purpose of this article is to address a major gap in the instructional sensitivity literature on how to develop instructionally sensitive assessments. We propose an approach to developing and evaluating instructionally sensitive assessments in science and test this approach with one elementary life-science module. The assessment we developed…
Descriptors: Effect Size, Inferences, Student Centered Curriculum, Test Construction
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

