ERIC - Search Results

Showing 1 to 15 of 33 results

The Role of Response Style Adjustments in Cross-Country Comparisons--A Case Study Using Data from the PISA 2015 Questionnaire

Peer reviewed

Ulitzsch, Esther; Lüdtke, Oliver; Robitzsch, Alexander – Educational Measurement: Issues and Practice, 2023

Country differences in response styles (RS) may jeopardize cross-country comparability of Likert-type scales. When adjusting for rather than investigating RS is the primary goal, it seems advantageous to impose minimal assumptions on RS structures and leverage information from multiple scales for RS measurement. Using PISA 2015 background…

Descriptors: Response Style (Tests), Comparative Analysis, Achievement Tests, Foreign Countries

A Longitudinal Diagnostic Model with Hierarchical Learning Trajectories

Peer reviewed

Zhan, Peida; He, Keren – Educational Measurement: Issues and Practice, 2021

In learning diagnostic assessments, the attribute hierarchy specifies a sequential network of interrelated attribute mastery processes, which makes a test blueprint consistent with the cognitive theory. One of the most important functions of attribute hierarchy is to guide or limit the developmental direction of students and then form a…

Descriptors: Longitudinal Studies, Models, Comparative Analysis, Diagnostic Tests

On the Choice of Anchor Tests in Equating

Peer reviewed

Sinharay, Sandip – Educational Measurement: Issues and Practice, 2018

The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested "miditests," which are anchor tests that are content-representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties.…

Descriptors: Test Content, Difficulty Level, Test Items, Test Construction

Systematic Comparison of Decision Accuracy of Complex Compensatory Decision Rules Combining Multiple Tests in a Higher Education Context

Peer reviewed

Yocarini, Iris E.; Bouwmeester, Samantha; Smeets, Guus; Arends, Lidia R. – Educational Measurement: Issues and Practice, 2018

This real-data-guided simulation study systematically evaluated the decision accuracy of complex decision rules combining multiple tests within different realistic curricula. Specifically, complex decision rules combining conjunctive aspects and compensatory aspects were evaluated. A conjunctive aspect requires a minimum level of performance,…

Descriptors: Comparative Analysis, Decision Making, Accuracy, Higher Education

Reliably Assessing Growth with Longitudinal Diagnostic Classification Models

Peer reviewed

Madison, Matthew J. – Educational Measurement: Issues and Practice, 2019

Recent advances have enabled diagnostic classification models (DCMs) to accommodate longitudinal data. These longitudinal DCMs were developed to study how examinees change, or transition, between different attribute mastery statuses over time. This study examines using longitudinal DCMs as an approach to assessing growth and serves three purposes:…

Descriptors: Longitudinal Studies, Item Response Theory, Psychometrics, Criterion Referenced Tests

A Comparison of Two Alternate Scaling Approaches Employed for Task Analyses in Credentialing Examination Development

Peer reviewed

Fidler, James R.; Risk, Nicole M. – Educational Measurement: Issues and Practice, 2019

Credentialing examination developers rely on task (job) analyses for establishing inventories of task and knowledge areas in which competency is required for safe and successful practice in target occupations. There are many ways in which task-related information may be gathered from practitioner ratings, each with its own advantage and…

Descriptors: Job Analysis, Scaling, Licensing Examinations (Professions), Test Construction

Can a Two-Question Test Be Reliable and Valid for Predicting Academic Outcomes?

Peer reviewed

Bridgeman, Brent – Educational Measurement: Issues and Practice, 2016

Scores on essay-based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple-choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores…

Descriptors: Multiple Choice Tests, Scores, Standardized Tests, Comparative Analysis

Five Methods for Estimating Angoff Cut Scores with IRT

Peer reviewed

Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017

This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…

Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Predicting College Performance of Homeschooled versus Traditional Students

Peer reviewed

Yu, Martin C.; Sackett, Paul R.; Kuncel, Nathan R. – Educational Measurement: Issues and Practice, 2016

The prevalence of homeschooling in the United States is increasing. Yet little is known about how commonly used predictors of postsecondary academic performance (SAT, high school grade point average [HSGPA]) perform for homeschooled students. Postsecondary performance at 140 colleges and universities was analyzed comparing a sample of traditional…

Descriptors: Predictor Variables, Academic Achievement, College Students, Home Schooling

An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

Peer reviewed

Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol – Educational Measurement: Issues and Practice, 2016

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…

Descriptors: Test Bias, Research Methodology, Evaluation Methods, Models

Predicting Freshman Grade-Point Average from Test Scores: Effects of Variation within and between High Schools

Peer reviewed

Koretz, D.; Langi, M. – Educational Measurement: Issues and Practice, 2018

Most studies predicting college performance from high-school grade point average (HSGPA) and college admissions test scores use single-level regression models that conflate relationships within and between high schools. Because grading standards vary among high schools, these relationships are likely to differ within and between schools. We used…

Descriptors: Prediction, High School Students, Grade Point Average, Scores

Quantifying Error and Uncertainty Reductions in Scaling Functions: An ITEMS Module

Peer reviewed

Moses, Tim – Educational Measurement: Issues and Practice, 2014

This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and…

Descriptors: Scaling, Regression (Statistics), Equated Scores, Comparative Analysis

Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests

Peer reviewed

Kolen, Michael J.; Lee, Won-Chan – Educational Measurement: Issues and Practice, 2011

This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…

Descriptors: Test Use, Test Format, Error of Measurement, Raw Scores

Does an Argument-Based Approach to Validity Make a Difference?

Peer reviewed

Chapelle, Carol A.; Enright, Mary K.; Jamieson, Joan – Educational Measurement: Issues and Practice, 2010

Drawing on experience between 2000 and 2007 in developing a validity argument for the high-stakes Test of English as a Foreign Language™ (TOEFL®), this paper evaluates the differences between the argument-based approach to validity as presented by Kane (2006) and that described in the 1999 AERA/APA/NCME Standards for Educational and…

Descriptors: Psychological Testing, Validity, High Stakes Tests, English (Second Language)
