ERIC - Search Results

Showing all 13 results

Examining Gender Differences in TIMSS 2019 Using a Multiple-Group Hierarchical Speed-Accuracy-Revisits Model

Peer reviewed

Leng, Dihao; Bezirhan, Ummugul; Khorramdel, Lale; Fishbein, Bethany; von Davier, Matthias – Educational Measurement: Issues and Practice, 2024

This study capitalizes on response and process data from the computer-based TIMSS 2019 Problem Solving and Inquiry tasks to investigate gender differences in test-taking behaviors and their association with mathematics achievement at the eighth grade. Specifically, a recently proposed hierarchical speed-accuracy-revisits (SAR) model was adapted to…

Descriptors: Gender Differences, Test Wiseness, Achievement Tests, Mathematics Tests

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

What Are the Conditions Associated with Subscore Added Value Noninvariance? Implications for Improving Subscore Interpretation Fairness

Peer reviewed

Rios, Joseph A.; Miranda, Alejandra A. – Educational Measurement: Issues and Practice, 2021

Subscore added value analyses assume invariance across test taking populations; however, this assumption may be untenable in practice as differential subdomain relationships may be present among subgroups. The purpose of this simulation study was to understand the conditions associated with subscore added value noninvariance when manipulating: (1)…

Descriptors: Scores, Test Length, Ability, Correlation

A Special Case of Brennan's Index for Tests That Aim to Select a Limited Number of Students: A Monte Carlo Simulation Study

Peer reviewed

Arikan, Serkan; Aybek, Eren Can – Educational Measurement: Issues and Practice, 2022

Many scholars compared various item discrimination indices in real or simulated data. Item discrimination indices, such as item-total correlation, item-rest correlation, and IRT item discrimination parameter, provide information about individual differences among all participants. However, there are tests that aim to select a very limited number…

Descriptors: Monte Carlo Methods, Item Analysis, Correlation, Individual Differences

On the Choice of Anchor Tests in Equating

Peer reviewed

Sinharay, Sandip – Educational Measurement: Issues and Practice, 2018

The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested "miditests," which are anchor tests that are content-representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties.…

Descriptors: Test Content, Difficulty Level, Test Items, Test Construction

Can Item Response Times Provide Insight into Students' Motivation and Self-Efficacy in Math? An Initial Application of Test Metadata to Understand Students' Social-Emotional Needs

Peer reviewed

Soland, James – Educational Measurement: Issues and Practice, 2019

As computer-based tests become more common, there is a growing wealth of metadata related to examinees' response processes, which include solution strategies, concentration, and operating speed. One common type of metadata is item response time. While response times have been used extensively to improve estimates of achievement, little work…

Descriptors: Test Items, Item Response Theory, Metadata, Self Efficacy

A Review of Recent Research on Individual-Level Score Reports

Peer reviewed

Gotch, Chad M.; Roduta Roberts, Mary – Educational Measurement: Issues and Practice, 2018

As the primary interface between test developers and multiple educational stakeholders, score reports are a critical component to the success (or failure) of any assessment program. The purpose of this review is to document recent research on individual-level score reporting to advance the research and practice of score reporting. We conducted a…

Descriptors: Scores, Models, Correlation, Stakeholders

Synergy and Tension between Large-Scale and Classroom Assessment: International Trends

Peer reviewed

Volante, Louis; DeLuca, Christopher; Adie, Lenore; Baker, Eva; Harju-Luukkainen, Heidi; Heritage, Margaret; Schneider, Christoph; Stobart, Gordon; Tan, Kelvin; Wyatt-Smith, Claire – Educational Measurement: Issues and Practice, 2020

The synergy, or lack thereof, between large-scale and classroom assessment has been fiercely debated in both academic and policy spheres for decades around the world. This paper seeks to explicate how different countries are utilizing large-scale testing and test results at the classroom level. Through country profiles, this paper analyzes…

Descriptors: Educational Trends, Trend Analysis, Measurement, Teaching Methods

Studying the Relationships between the Number of APs, AP Performance, and College Outcomes

Peer reviewed

Beard, Jonathan J.; Hsu, Julian; Ewing, Maureen; Godfrey, Kelly E. – Educational Measurement: Issues and Practice, 2019

High school students enroll in Advanced Placement (AP) courses and take AP exams for a variety of reasons. However, a lack of information about the extent to which there are incremental benefits associated with taking multiple AP exams has fostered a perception that students must take many APs to be prepared for college. Conversely, many American…

Descriptors: Correlation, Advanced Placement, Tests, College Preparation

Components of Variance of Scales with a Bifactor Subscale Structure from Two Calculations of Alpha

Peer reviewed

Andrich, David – Educational Measurement: Issues and Practice, 2016

Since Cronbach's (1951) elaboration of α from its introduction by Guttman (1945), this coefficient has become ubiquitous in characterizing assessment instruments in education, psychology, and other social sciences. Also ubiquitous are caveats on the calculation and interpretation of this coefficient. This article summarizes a recent contribution…

Descriptors: Computation, Correlation, Test Theory, Measures (Individuals)

Quantifying Error and Uncertainty Reductions in Scaling Functions: An ITEMS Module

Peer reviewed

Moses, Tim – Educational Measurement: Issues and Practice, 2014

This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and…

Descriptors: Scaling, Regression (Statistics), Equated Scores, Comparative Analysis

Application of Latent Trait Models to Identifying Substantively Interesting Raters

Peer reviewed

Wolfe, Edward W.; McVay, Aaron – Educational Measurement: Issues and Practice, 2012

Historically, research focusing on rater characteristics and rating contexts that enable the assignment of accurate ratings and research focusing on statistical indicators of accurate ratings has been conducted by separate communities of researchers. This study demonstrates how existing latent trait modeling procedures can identify groups of…

Descriptors: Researchers, Research, Correlation, Test Bias

Building Validity Evidence for Scores on a State-Wide Alternate Assessment: A Contrasting Groups, Multimethod Approach

Peer reviewed

Elliott, Stephen N.; Compton, Elizabeth; Roach, Andrew T. – Educational Measurement: Issues and Practice, 2007

The relationships between ratings on the Idaho Alternate Assessment (IAA) for 116 students with significant disabilities and corresponding ratings for the same students on two norm-referenced teacher rating scales were examined to gain evidence about the validity of resulting IAA scores. To contextualize these findings, another group of 54…

Descriptors: Inferences, Disabilities, Rating Scales, Eligibility
