Yanyan Fu – Educational Measurement: Issues and Practice, 2024
The template-based automated item-generation (TAIG) approach that involves template creation, item generation, item selection, field-testing, and evaluation has more steps than the traditional item development method. Consequently, there is more margin for error in this process, and any template errors can cascade to the generated items.…
Descriptors: Error Correction, Automation, Test Items, Test Construction
Fu, Yanyan; Choe, Edison M.; Lim, Hwanggyu; Choi, Jaehwa – Educational Measurement: Issues and Practice, 2022
This case study applied the "weak theory" of Automatic Item Generation (AIG) to generate isomorphic item instances (i.e., unique but psychometrically equivalent items) for a large-scale assessment. Three representative instances were selected from each item template (i.e., model) and pilot-tested. In addition, a new analytical framework,…
Descriptors: Test Items, Measurement, Psychometrics, Test Construction
Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020
In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously Engineered Cut Scores; Lewis, 2016), an approach to establish performance…
Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment
Gierl, Mark J.; Lai, Hollis – Educational Measurement: Issues and Practice, 2016
Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Descriptors: Test Items, Test Construction, Psychometrics, Models
Zenisky, April L.; Hambleton, Ronald K. – Educational Measurement: Issues and Practice, 2012
Test scores matter these days. Test-takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees' information or usability needs, but this is clearly changing…
Descriptors: Scores, Psychometrics, Test Results, Usability
Sinharay, Sandip; Dorans, Neil J.; Liang, Longjuan – Educational Measurement: Issues and Practice, 2011
Over the past few decades, those who take tests in the United States have exhibited increasing diversity with respect to native language. Standard psychometric procedures for ensuring item and test fairness that have existed for some time were developed when test-taking groups were predominantly native English speakers. A better understanding of…
Descriptors: Test Bias, Testing Programs, Psychometrics, Language Proficiency
Kolen, Michael J.; Lee, Won-Chan – Educational Measurement: Issues and Practice, 2011
This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
Descriptors: Test Use, Test Format, Error of Measurement, Raw Scores
Williamson, David M.; Xi, Xiaoming; Breyer, F. Jay – Educational Measurement: Issues and Practice, 2012
A framework for evaluation and use of automated scoring of constructed-response tasks is provided that entails both evaluation of automated scoring and guidelines for implementation and maintenance in the context of constantly evolving technologies. Consideration of validity issues and challenges associated with automated scoring is…
Descriptors: Automation, Scoring, Evaluation, Guidelines
Ercikan, Kadriye; Arim, Rubab; Law, Danielle; Domene, Jose; Gagnon, France; Lacroix, Serge – Educational Measurement: Issues and Practice, 2010
This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence…
Descriptors: Test Bias, Protocol Analysis, Cognitive Processes, Expertise
Hein, Serge F.; Skaggs, Gary – Educational Measurement: Issues and Practice, 2010
Increasingly, research has focused on the cognitive processes associated with various standard-setting activities. This qualitative study involved an examination of 16 third-grade reading teachers' experiences with the cognitive task of conceptualizing an entire classroom of hypothetical target students when the single-passage bookmark method or…
Descriptors: Focus Groups, Standard Setting, Interviews, Reading Teachers
Chapelle, Carol A.; Enright, Mary K.; Jamieson, Joan – Educational Measurement: Issues and Practice, 2010
Drawing on experience between 2000 and 2007 in developing a validity argument for the high-stakes Test of English as a Foreign Language™ (TOEFL®), this paper evaluates the differences between the argument-based approach to validity as presented by Kane (2006) and that described in the 1999 AERA/APA/NCME Standards for Educational and…
Descriptors: Psychological Testing, Validity, High Stakes Tests, English (Second Language)

Madaus, George F. – Educational Measurement: Issues and Practice, 1985
Since the 1970s, policymakers have learned that test results can be used as an administrative mechanism to implement policy. A national system of external certification tests would involve several problems: (1) absence of independent examination boards; (2) negative impact on teaching; (3) tendency to elitism; (4) competition for RFPs; and (5)…
Descriptors: Achievement Tests, Certification, Educational Assessment, Educational Change