Séverin Lions; María Paz Blanco; Pablo Dartnell; Carlos Monsalve; Gabriel Ortega; Julie Lemarié – Applied Measurement in Education, 2024
Multiple-choice items are universally used in formal education. Since they should assess learning, not test-wiseness or guesswork, they must be constructed following the highest possible standards. Hundreds of item-writing guides have provided guidelines to help test developers adopt appropriate strategies to define the distribution and sequence…
Descriptors: Test Construction, Multiple Choice Tests, Guidelines, Test Items
Anne Traynor; Sara C. Christopherson – Applied Measurement in Education, 2024
Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…
Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction
O'Dwyer, Eowyn P.; Sparks, Jesse R.; Nabors Oláh, Leslie – Applied Measurement in Education, 2023
A critical aspect of the development of culturally relevant classroom assessments is the design of tasks that affirm students' racial and ethnic identities and community cultural practices. This paper describes the process we followed to build a shared understanding of what culturally relevant assessments are, to pursue ways of bringing more…
Descriptors: Evaluation Methods, Culturally Relevant Education, Test Construction, Educational Research
Leighton, Jacqueline P. – Applied Measurement in Education, 2021
The objective of this paper is to comment on the think-aloud methods presented in the three papers included in this special issue. The commentary offered stems from the author's own psychological investigations of unobservable information processes and the conditions under which the most defensible claims can be advanced. The structure of this…
Descriptors: Protocol Analysis, Data Collection, Test Construction, Test Validity
Mo, Ya; Carney, Michele; Cavey, Laurie; Totorica, Tatia – Applied Measurement in Education, 2021
There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop…
Descriptors: Protocol Analysis, Test Items, Responses, Mathematics Teachers
Lions, Séverin; Monsalve, Carlos; Dartnell, Pablo; Blanco, María Paz; Ortega, Gabriel; Lemarié, Julie – Applied Measurement in Education, 2022
Multiple-choice tests are widely used in education, often for high-stakes assessment purposes. Consequently, these tests should be constructed following the highest standards. Many efforts have been undertaken to advance item-writing guidelines intended to improve tests. One important issue is the unwanted effects of the options' position on test…
Descriptors: Multiple Choice Tests, High Stakes Tests, Test Construction, Guidelines
Bostic, Jonathan David; Sondergeld, Toni A.; Matney, Gabriel; Stone, Gregory; Hicks, Tiara – Applied Measurement in Education, 2021
Response process validity evidence provides a window into a respondent's cognitive processing. The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work is performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades. Data from…
Descriptors: Data Collection, Protocol Analysis, Problem Solving, Cognitive Processes
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
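The abstract above treats a correct multiple-choice response as a mix of knowledge and guessing. As a minimal illustration (not the authors' model, which the truncated abstract does not specify), the classical correction-for-guessing estimator assumes non-knowers guess uniformly among the m options, so P(correct) = k + (1 − k)/m, and the knowledge proportion k can be recovered from the observed proportion correct:

```python
def estimate_knowledge(p_correct: float, n_options: int) -> float:
    """Estimate the 'known' proportion k under a simple guessing model.

    Model: P(correct) = k + (1 - k) / m, where examinees who do not
    know the answer guess uniformly among the m options. Solving for
    k gives the classical correction-for-guessing estimator.
    """
    m = n_options
    k = (p_correct - 1.0 / m) / (1.0 - 1.0 / m)
    return max(0.0, min(1.0, k))  # clamp to a valid proportion

# An examinee answering 70% of 4-option items correctly:
print(round(estimate_knowledge(0.7, 4), 2))  # 0.6
```

Note that this simple model has no blunder term; the abstract's point is precisely that confidently committed errors require richer probabilistic models than uniform guessing.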
Canivez, Gary L.; Youngstrom, Eric A. – Applied Measurement in Education, 2019
The Cattell-Horn-Carroll (CHC) taxonomy of cognitive abilities married John Horn and Raymond Cattell's Extended Gf-Gc theory with John Carroll's Three-Stratum Theory. While there are some similarities in arrangements or classifications of tasks (observed variables) within similar broad or narrow dimensions, other salient theoretical features and…
Descriptors: Taxonomy, Cognitive Ability, Intelligence, Cognitive Tests
Wyse, Adam E. – Applied Measurement in Education, 2018
An important consideration in standard setting is recruiting a group of panelists with different experiences and backgrounds to serve on the standard-setting panel. This study uses data from 14 different Angoff standard settings from a variety of medical imaging credentialing programs to examine whether people with different professional roles and…
Descriptors: Standard Setting (Scoring), Test Construction, Cutting Scores, Accuracy
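In an Angoff standard setting like the one referenced above, each panelist estimates, for every item, the probability that a minimally competent candidate answers it correctly; a raw cut score is typically the sum of the mean item ratings. A small sketch with hypothetical ratings (the study's actual data are not shown here):

```python
# Hypothetical Angoff ratings: rows = panelists, columns = items.
ratings = [
    [0.6, 0.8, 0.4, 0.7],   # panelist 1
    [0.5, 0.9, 0.5, 0.6],   # panelist 2
    [0.7, 0.7, 0.3, 0.8],   # panelist 3
]

def angoff_cut_score(ratings):
    """Raw cut score: sum over items of the mean panelist rating."""
    n_panelists = len(ratings)
    n_items = len(ratings[0])
    item_means = [
        sum(row[j] for row in ratings) / n_panelists
        for j in range(n_items)
    ]
    return sum(item_means)

print(angoff_cut_score(ratings))  # 2.5 points out of 4
```

Comparing cut scores computed from panelist subgroups (for example, by professional role, as in the study above) is one way to examine whether panel composition matters.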
Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019
Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive "and" argument-based to develop a…
Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation
Bonner, Sarah; Chen, Peggy; Jones, Kristi; Milonovich, Brandon – Applied Measurement in Education, 2021
We describe the use of think alouds to examine substantive processes involved in performance on a formative assessment of computational thinking (CT) designed to support self-regulated learning (SRL). Our task design model included three phases of work on a computational thinking problem: forethought, performance, and reflection. The cognitive…
Descriptors: Formative Evaluation, Thinking Skills, Metacognition, Computer Science Education
Yannakoudakis, Helen; Andersen, Øistein E.; Geranpayeh, Ardeshir; Briscoe, Ted; Nicholls, Diane – Applied Measurement in Education, 2018
There are quite a few challenges in the development of an automated writing placement model for non-native English learners, among them the fact that exams that encompass the full range of language proficiency exhibited at different stages of learning are hard to design. However, acquisition of appropriate training data that are relevant to the…
Descriptors: Automation, Data Processing, Student Placement, English Language Learners
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
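The "miniature form" requirement described above is usually checked by comparing the mean and spread of item difficulty in the anchor block against the total test. A minimal sketch with hypothetical IRT difficulty (b) parameters, illustrating a "mini" anchor that matches both moments versus a "midi" anchor that matches the mean but has a narrower spread (the relaxation Sinharay and Holland examined):

```python
import statistics

def anchor_summary(anchor_difficulties, total_difficulties):
    """Compare anchor-block item difficulties with the total test.

    A 'mini' anchor matches both the mean and the spread (SD) of
    difficulty on the total test; a 'midi' anchor matches only the mean.
    """
    return {
        "anchor_mean": statistics.mean(anchor_difficulties),
        "anchor_sd": statistics.stdev(anchor_difficulties),
        "total_mean": statistics.mean(total_difficulties),
        "total_sd": statistics.stdev(total_difficulties),
    }

# Hypothetical difficulty parameters:
total = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.6, 1.0, 1.4, 1.9]
mini = [-1.0, -0.2, 0.4, 1.6]   # similar mean and SD to the total test
midi = [0.0, 0.1, 0.3, 0.4]     # same mean, much narrower spread
print(anchor_summary(mini, total))
print(anchor_summary(midi, total))
```

Both anchors center near the test mean of 0.2, but only the first reproduces the test's difficulty spread.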