Publication Date
| In 2026 | 0 |
| Since 2025 | 200 |
| Since 2022 (last 5 years) | 1070 |
| Since 2017 (last 10 years) | 2580 |
| Since 2007 (last 20 years) | 4941 |
Audience
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Hendrickson, Amy; Huff, Kristen; Luecht, Ric – College Board, 2009
[Slides] presented at the Annual Meeting of the National Council on Measurement in Education (NCME) in San Diego, CA, in April 2009. This presentation describes how the vehicles for gathering student evidence (task models and test specifications) are developed.
Descriptors: Test Items, Test Construction, Evidence, Achievement
Suh, Youngsuk; Mroch, Andrew A.; Kane, Michael T.; Ripkey, Douglas R. – Measurement: Interdisciplinary Research and Perspectives, 2009
In this study, a database containing the responses of 40,000 candidates to 90 multiple-choice questions was used to mimic data sets for 50-item tests under the "nonequivalent groups with anchor test" (NEAT) design. Using these smaller data sets, we evaluated the performance of five linear equating methods for the NEAT design with five levels of…
Descriptors: Test Items, Equated Scores, Methods, Differences
Wells, Craig S.; Baldwin, Su; Hambleton, Ronald K.; Sireci, Stephen G.; Karatonis, Ana; Jirka, Stephen – Applied Measurement in Education, 2009
Score equity assessment is an important analysis to ensure inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study…
Descriptors: National Competency Tests, Test Bias, Equated Scores, Grade 8
Bobbio, Tatiana; Gabbard, Carl; Cacola, Priscila – Early Childhood Research & Practice, 2009
Motor development attains landmark significance during early childhood. Although early childhood educators may be familiar with the gross-motor skill category, the subcategory of interlimb coordination needs greater attention than it typically receives from teachers of young children. Interlimb coordination primarily involves movements requiring…
Descriptors: Test Items, Young Children, Psychomotor Skills, Motor Development
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien – Educational and Psychological Measurement, 2009
This study incorporates a scale purification procedure into the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
Descriptors: Test Items, Measures (Individuals), Test Bias, Evaluation Research
Suto, W. M. Irenka; Nadas, Rita – Research Papers in Education, 2009
It has long been established that marking accuracy in public examinations varies considerably among subjects and markers. This is unsurprising, given the diverse cognitive strategies that the marking process can entail, but what makes some questions harder to mark accurately than others? Are there distinct but subtle features of questions and…
Descriptors: National Curriculum, Physics, Interviews, Examiners
Petry, Katja; Maes, Bea; Vlaskamp, Carla – Research in Developmental Disabilities: A Multidisciplinary Journal, 2009
Because of a shortage of valid instruments to measure the QOL of people with profound multiple disabilities (PMD), the QOL-PMD was developed. In the present study, possibilities for item reduction as well as the psychometric properties of the questionnaire were examined. One hundred and forty-seven informants of people with PMD participated in the…
Descriptors: Multiple Disabilities, Quality of Life, Construct Validity, Questionnaires
van der Linden, Wim J. – Applied Psychological Measurement, 2009
An adaptive testing method is presented that controls the speededness of a test using predictions of the test takers' response times on the candidate items in the pool. Two different types of predictions are investigated: posterior predictions given the actual response times on the items already administered and posterior predictions that use the…
Descriptors: Simulation, Adaptive Testing, Vocational Aptitude, Bayesian Statistics
Klein Entink, R. H.; Fox, J. P.; van der Linden, W. J. – Psychometrika, 2009
Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel…
Descriptors: Test Items, Markov Processes, Item Response Theory, Measurement Techniques
Miyazaki, Kei; Hoshino, Takahiro; Mayekawa, Shin-ichi; Shigemasu, Kazuo – Psychometrika, 2009
This study proposes a new item parameter linking method for the common-item nonequivalent groups design in item response theory (IRT). Previous studies assumed that examinees are randomly assigned to either test form. However, examinees can frequently select their own test forms and tests often differ according to examinees' abilities. In such…
Descriptors: Test Format, Item Response Theory, Test Items, Test Bias
von Davier, Matthias – Measurement: Interdisciplinary Research and Perspectives, 2009
In this commentary, the author points out a few issues, one being that there are models mislabeled as diagnostic, which deal with linear decompositions of item difficulties rather than estimating multidimensional skill variables. The author discusses the issue that there are many new names for essentially well-known models for multiple simultaneous…
Descriptors: Test Items, Probability, Models, Diagnostic Tests
Long, Caroline; Dunne, Tim; Craig, Tracy S. – African Journal of Research in Mathematics, Science and Technology Education, 2010
In the transition years, Grades 7 to 9, the shift from natural numbers to rational numbers and the associated multiplicative concepts prove challenging for many learners. The new concepts, operations and notation must be mastered if the student is to thereafter rise to meet the challenges of algebra and more advanced and powerful mathematics. The…
Descriptors: Multiplication, Mathematics Skills, Item Response Theory, Mathematical Concepts
Lyon, Lucy Kay – ProQuest LLC, 2010
Extensive research has been conducted on improving student academic achievement and techniques to improve student learning. There has been little research that addresses the relationship between student achievement and teacher performance. The purpose of this study was to determine the relationship between performance-based teacher evaluation…
Descriptors: Teacher Evaluation, Evaluation Methods, Mathematics Tests, Program Effectiveness
Reshetar, Rosemary; Melican, Gerald J. – College Board, 2010
This paper discusses issues related to the design and psychometric work for mixed-format tests: tests containing both multiple-choice (MC) and constructed-response (CR) items. The issues of validity, fairness, reliability and score consistency can be addressed, but for mixed-format tests there are many decisions to be made and no examination or…
Descriptors: Psychometrics, Test Construction, Multiple Choice Tests, Test Items
Currie, Michael; Chiramanee, Thanyapa – Language Testing, 2010
Noting the widespread use of multiple-choice items in tests in English language education in Thailand, this study compared their effect against that of constructed-response items. One hundred and fifty-two university undergraduates took a test of English structure first in constructed-response format, and later in three, stem-equivalent…
Descriptors: Experimental Groups, Multiple Choice Tests, Foreign Countries, Language Tests

