Showing 1 to 15 of 25 results
Peer reviewed
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
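For context, the "commonly used expression" the abstract refers to is the standard summed-score formula for coefficient alpha on a k-item test (a textbook identity, not a result of this article):

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_{X}}\right)

where \sigma^2_{Y_i} is the variance of item score Y_i and \sigma^2_X is the variance of the summed score X = \sum_i Y_i. As the abstract indicates, the article's argument is that this form is tied to summed scores and need not transfer directly to scaled scores.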
Peer reviewed
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022
Administrative problems such as computer malfunctions and power outages occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…
Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items
Peer reviewed
Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022
We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…
Descriptors: Science Tests, Test Validity, Test Items, Test Construction
Peer reviewed
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Peer reviewed
Steedle, Jeffrey T.; Cho, Young Woo; Wang, Shichao; Arthur, Ann M.; Li, Dongmei – Educational Measurement: Issues and Practice, 2022
As testing programs transition from paper to online testing, they must study mode comparability to support the exchangeability of scores from different testing modes. To that end, a series of three mode comparability studies was conducted during the 2019-2020 academic year with examinees randomly assigned to take the ACT college admissions exam on…
Descriptors: College Entrance Examinations, Computer Assisted Testing, Scores, Test Format
Peer reviewed
Leventhal, Brian; Ames, Allison – Educational Measurement: Issues and Practice, 2020
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of "Monte Carlo simulation studies" (MCSS) in "item response theory" (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because…
Descriptors: Item Response Theory, Monte Carlo Methods, Simulation, Test Items
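To illustrate the kind of study the module describes, the sketch below generates one replication of item-response data from a two-parameter logistic (2PL) model; the sample sizes and parameter distributions are arbitrary assumptions for illustration, not the module's own design.

    # One Monte Carlo replication of 2PL item-response data (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(2022)
    n_persons, n_items = 1000, 20

    # Assumed "true" generating parameters (arbitrary choices for this sketch)
    theta = rng.normal(0.0, 1.0, n_persons)      # person abilities
    a = rng.lognormal(0.0, 0.3, n_items)         # item discriminations
    b = rng.normal(0.0, 1.0, n_items)            # item difficulties

    # 2PL response probabilities and simulated 0/1 responses
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    responses = (rng.uniform(size=p.shape) < p).astype(int)

    # A full simulation study would fit an IRT model to `responses`,
    # compare the recovered parameters with theta, a, and b, and repeat
    # over many replications and design conditions.
    print(responses.shape, responses.mean())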
Peer reviewed
Goeman, J. J.; De Jong, N. H. – Educational Measurement: Issues and Practice, 2018
Many researchers use Cronbach's alpha to demonstrate internal consistency, even though it has been shown numerous times that Cronbach's alpha is not suitable for this. Because the intention of questionnaire and test constructors is to summarize the test by its overall sum score, we advocate summability, which we define as the proportion of total…
Descriptors: Tests, Scores, Questionnaires, Measurement
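For reference, Cronbach's alpha, the statistic the authors argue is routinely misread as evidence of internal consistency, can be computed from a persons-by-items score matrix using the summed-score formula shown under the Almehrizi entry above. A minimal sketch follows (this is generic alpha, not the authors' summability index, whose definition is truncated in the abstract):

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a persons-by-items matrix of item scores."""
        k = scores.shape[1]                         # number of items
        item_vars = scores.var(axis=0, ddof=1)      # per-item variances
        total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
        return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

    # Independent random items, so alpha should land near zero:
    rng = np.random.default_rng(0)
    print(cronbach_alpha(rng.integers(0, 2, size=(200, 10)).astype(float)))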
Peer reviewed
Joo, Seang-Hwane; Khorramdel, Lale; Yamamoto, Kentaro; Shin, Hyo Jeong; Robin, Frederic – Educational Measurement: Issues and Practice, 2021
In Programme for International Student Assessment (PISA), item response theory (IRT) scaling is used to examine the psychometric properties of items and scales and to provide comparable test scores across participating countries and over time. To balance the comparability of IRT item parameter estimations across countries with the best possible…
Descriptors: Foreign Countries, International Assessment, Achievement Tests, Secondary School Students
Peer reviewed
Klugman, Emma M.; Ho, Andrew D. – Educational Measurement: Issues and Practice, 2020
State testing programs regularly release previously administered test items to the public. We provide an open-source recipe for state, district, and school assessment coordinators to combine these items flexibly to produce scores linked to established state score scales. These would enable estimation of student score distributions and achievement…
Descriptors: Testing Programs, State Programs, Test Items, Scores
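To make the idea concrete, one ingredient of such a recipe is scoring a set of released items whose calibrated difficulties are already reported on the state's theta scale, so that the resulting ability estimate sits on that scale. The sketch below does this with an EAP estimator under a Rasch model; the item difficulties and responses are hypothetical, and the article's actual recipe may differ in model, estimator, and the final conversion to reported scale scores.

    import numpy as np

    def eap_theta(responses, b, quad=np.linspace(-4, 4, 61)):
        """EAP ability estimate under a Rasch model with known difficulties b.

        responses: 0/1 vector over the released items a student took
        b:         published difficulties for those items (state theta scale)
        """
        # Likelihood of the response pattern at each quadrature point
        p = 1.0 / (1.0 + np.exp(-(quad[:, None] - b[None, :])))
        like = np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)
        prior = np.exp(-0.5 * quad**2)              # standard normal prior
        post = like * prior
        return np.sum(quad * post) / np.sum(post)

    # Hypothetical example: five released items with known difficulties
    b = np.array([-1.0, -0.5, 0.0, 0.5, 1.2])
    x = np.array([1, 1, 1, 0, 0])
    print(eap_theta(x, b))   # theta expressed on the operational scale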
Peer reviewed
Rubright, Jonathan D. – Educational Measurement: Issues and Practice, 2018
Performance assessments, scenario-based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this…
Descriptors: Performance Based Assessment, Item Response Theory, Models, Test Reliability
Peer reviewed
Moon, Jung Aa; Keehner, Madeleine; Katz, Irvin R. – Educational Measurement: Issues and Practice, 2019
The current study investigated how item formats and their inherent affordances influence test-takers' cognition under uncertainty. Adult participants solved content-equivalent math items in multiple-selection multiple-choice and four alternative grid formats. The results indicated that participants' affirmative response tendency (i.e., judge the…
Descriptors: Affordances, Test Items, Test Format, Test Wiseness
Peer reviewed
Feinberg, Richard A.; Wainer, Howard – Educational Measurement: Issues and Practice, 2014
Subscores can be of diagnostic value for tests that cover multiple underlying traits. Some items require knowledge or ability that spans more than a single trait. It is thus natural for such items to be included on more than a single subscore. Subscores only have value if they are reliable enough to justify conclusions drawn from them and if they…
Descriptors: Scores, Test Items, Reliability
Peer reviewed
Sheehan, Kathleen M. – Educational Measurement: Issues and Practice, 2017
Automated text complexity measurement tools (also called readability metrics) have been proposed as a way to help teachers, textbook publishers, and assessment developers select texts that are closely aligned with the new, more demanding text complexity expectations specified in the Common Core State Standards. This article examines a critical…
Descriptors: Reading Material Selection, Difficulty Level, Common Core State Standards, Validity
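As a familiar example of the kind of automated metric at issue (not necessarily one the article evaluates), the Flesch-Kincaid grade level is computed from three simple text counts:

    def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
        """Flesch-Kincaid grade level from basic text counts (illustrative only)."""
        return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    # e.g., a 120-word, 8-sentence passage with 180 syllables
    print(round(flesch_kincaid_grade(120, 8, 180), 1))   # about grade 8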
Peer reviewed
Jerrim, John; Parker, Philip; Choi, Alvaro; Chmielewski, Anna Katyn; Sälzer, Christine; Shure, Nikki – Educational Measurement: Issues and Practice, 2018
The Programme for International Student Assessment (PISA) is an important international study of 15-olds' knowledge and skills. New results are released every 3 years, and have a substantial impact upon education policy. Yet, despite its influence, the methodology underpinning PISA has received significant criticism. Much of this criticism has…
Descriptors: Educational Assessment, Comparative Education, Achievement Tests, Foreign Countries
Peer reviewed
Feinberg, Richard A.; Raymond, Mark R.; Haist, Steven A. – Educational Measurement: Issues and Practice, 2015
To mitigate security concerns and unfair score gains, credentialing programs routinely administer new test material to examinees retesting after an initial failing attempt. Counterintuitively, a small but growing body of recent research suggests that repeating the identical form does not create an unfair advantage. This study builds upon and…
Descriptors: Licensing Examinations (Professions), Repetition, Testing, Responses