ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	8
Since 2006 (last 20 years)	12

Source

Educational Measurement:…

Publication Type

Journal Articles	14
Reports - Research	8
Opinion Papers	3
Reports - Descriptive	3
Reports - Evaluative	3

Education Level

Higher Education	2
Adult Education	1
Elementary Secondary Education	1
Postsecondary Education	1
Secondary Education	1

Location

Germany

Showing all 14 results

Revisiting the Usage of Alpha in Scale Evaluation: Effects of Scale Length and Sample Size

Peer reviewed

Xiao, Leifeng; Hau, Kit-Tai; Wang, Melissa Dan – Educational Measurement: Issues and Practice, 2024

Short scales are time-efficient for participants and cost-effective in research. However, researchers often mistakenly expect short scales to have the same reliability as long ones without considering the effect of scale length. We argue that applying a universal benchmark for alpha is problematic as the impact of low-quality items is greater on…

Descriptors: Measurement, Benchmarking, Item Sampling, Sample Size

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Evaluating Population Invariance of Test Equating during the COVID-19 Pandemic

Peer reviewed

Li, Dongmei; Kapoor, Shalini – Educational Measurement: Issues and Practice, 2022

Population invariance is a desirable property of test equating which might not hold when significant changes occur in the test population, such as those brought about by the COVID-19 pandemic. This research aims to investigate whether equating functions are reasonably invariant when the test population is impacted by the pandemic. Based on…

Descriptors: Test Items, Equated Scores, COVID-19, Pandemics

Adjusting for Ability Differences of Equating Samples When Randomization Is Suboptimal

Peer reviewed

Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022

Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…

Descriptors: Ability, Tests, Equated Scores, Testing Problems

Boolean Analysis of Interobserver Agreement: Formal and Functional Evidence Sampling in Complex Coding Endeavors

Peer reviewed

Solano-Flores, Guillermo – Educational Measurement: Issues and Practice, 2021

This article proposes a Boolean approach to representing and analyzing interobserver agreement in dichotomous coding. Building on the notion that observations are samples of a universe of observations, it submits that coding can be viewed as a process in which observers sample pieces of evidence on constructs. It distinguishes between formal and…

Descriptors: Online Searching, Coding, Interrater Reliability, Evidence

Machine Learning and Small Data

Peer reviewed

Cui, Zhongmin – Educational Measurement: Issues and Practice, 2021

Commonly used machine learning applications seem to relate to big data. This article provides a gentle review of machine learning and shows why machine learning can be applied to small data too. An example of applying machine learning to screen irregularity reports is presented. In the example, the support vector machine and multinomial naïve…

Descriptors: Artificial Intelligence, Man Machine Systems, Data, Bayesian Statistics

Measuring Textbook Content Coverage: Efficient Content Analysis with Lesson Sampling

Peer reviewed

Zhang, Jiahui; Cogan, Leland S.; Schmidt, William H. – Educational Measurement: Issues and Practice, 2020

This study addresses measurement issues around a standards-based content analysis of mathematics textbooks' coverage of standards for use in large-scale monitoring of standards implementation as proposed in a 2013 report by the National Research Council. An earlier study produced an exhaustive content analysis of textbooks using the 2012 Common…

Descriptors: Textbook Content, Academic Standards, Mathematics Curriculum, Content Analysis

Gender-Based Differential Prediction by Curriculum Samples for College Admissions

Peer reviewed

Niessen, A. Susan M.; Meijer, Rob R.; Tendeiro, Jorge N. – Educational Measurement: Issues and Practice, 2019

A longstanding concern about admissions to higher education is the underprediction of female academic performance by admission test scores. One explanation for these findings is selection system bias, that is, not all relevant KSAOs that are related to academic performance and gender are included in the prediction model. One solution to this…

Descriptors: College Admission, High Stakes Tests, Gender Differences, Sampling

Setting Standards for English Foreign Language Assessment: Methodology, Validation, and a Degree of Arbitrariness

Peer reviewed

Tiffin-Richards, Simon P.; Pant, Hans Anand; Koller, Olaf – Educational Measurement: Issues and Practice, 2013

Cut-scores were set by expert judges on assessments of reading and listening comprehension of English as a foreign language (EFL), using the bookmark standard-setting method to differentiate proficiency levels defined by the Common European Framework of Reference (CEFR). Assessments contained stratified item samples drawn from extensive item…

Descriptors: Foreign Countries, English (Second Language), Language Tests, Standard Setting (Scoring)

NCME 2008 Presidential Address: The Impact of Anchor Test Configuration on Student Proficiency Rates

Peer reviewed

Fitzpatrick, Anne R. – Educational Measurement: Issues and Practice, 2008

Examined in this study were the effects of reducing anchor test length on student proficiency rates for 12 multiple-choice tests administered in an annual, large-scale, high-stakes assessment. The anchor tests contained 15 items, 10 items, or five items. Five content representative samples of items were drawn at each anchor test length from a…

Descriptors: Test Length, Multiple Choice Tests, Item Sampling, Student Evaluation

Measurement, Sampling, and Equating Errors in Large-Scale Assessments

Peer reviewed

Wu, Margaret – Educational Measurement: Issues and Practice, 2010

In large-scale assessments, such as state-wide testing programs, national sample-based assessments, and international comparative studies, there are many steps involved in the measurement and reporting of student achievement. There are always sources of inaccuracies in each of the steps. It is of interest to identify the source and magnitude of…

Descriptors: Testing Programs, Educational Assessment, Measures (Individuals), Program Effectiveness

An NCME Instructional Module on Estimating Item Response Theory Models Using Markov Chain Monte Carlo Methods

Peer reviewed

Kim, Jee-Seon; Bolt, Daniel M. – Educational Measurement: Issues and Practice, 2007

The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain…

Descriptors: Placement, Monte Carlo Methods, Markov Processes, Measurement

Facts about Samples, Fantasies about Domains

Peer reviewed

Mehrens, William A. – Educational Measurement: Issues and Practice, 1991

Cohen and Hyman's response contains several misunderstandings of the original article by Mehrens and Kaminski. One frequently wishes to make inferences to a domain from a test, but teaching a specific performance and testing for that performance does not allow for a domain inference. (SLD)

Descriptors: Cheating, Criterion Referenced Tests, Educational Assessment, Inferences

Selection of Judges for Standard-Setting

Peer reviewed

Jaeger, Richard M. – Educational Measurement: Issues and Practice, 1991

Issues concerning the selection of judges for standard setting are discussed. Determining the consistency of judges' recommendations, or their congruity with other expert recommendations, would help in selection. Enough judges must be chosen to allow estimation of recommendations by an entire population of judges. (SLD)

Descriptors: Cutting Scores, Evaluation Methods, Evaluators, Examiners

Descriptor

Sampling	12
Equated Scores	4
Evaluation Methods	4
Test Items	4
Cutting Scores	3
Measurement	3
Academic Achievement	2
Accuracy	2
Bayesian Statistics	2
Coding	2
Educational Assessment	2
Identification	2
Inferences	2
Interrater Reliability	2
Item Response Theory	2
Item Sampling	2
Measurement Techniques	2
Program Effectiveness	2
Sample Size	2
Selection	2
Standard Setting (Scoring)	2
Test Interpretation	2
Testing Problems	2
Ability	1
Academic Standards	1

Author

Bolt, Daniel M.	1
Cogan, Leland S.	1
Cui, Zhongmin	1
Fitzpatrick, Anne R.	1
Hau, Kit-Tai	1
Jaeger, Richard M.	1
Kapoor, Shalini	1
Kim, Jee-Seon	1
Kim, Sooyeon	1
Koller, Olaf	1
Li, Dongmei	1
Mehrens, William A.	1
Meijer, Rob R.	1
Meng, Yu	1
Muckle, Timothy J.	1
Niessen, A. Susan M.	1
Pant, Hans Anand	1
Peabody, Michael R.	1
Schmidt, William H.	1
Solano-Flores, Guillermo	1
Tendeiro, Jorge N.	1
Tiffin-Richards, Simon P.	1
Walker, Michael E.	1
Wang, Melissa Dan	1
Wu, Margaret	1
Xiao, Leifeng	1
Zhang, Jiahui	1