Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
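A minimal sketch of the embedding-plus-regressor pipeline this abstract describes, assuming the sentence-transformers and scikit-learn packages; the LaBSE checkpoint name is the public one, but the toy essays, scores, and ridge regressor are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: score essays in several languages by embedding them
# with LaBSE and fitting a regressor on human-scored examples.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Toy data: (essay text, human score); a real system uses thousands of essays.
essays = [
    ("Der Aufsatz behandelt das Thema ausfuehrlich ...", 4.0),   # German
    ("Il saggio presenta argomenti chiari e coerenti ...", 5.0),  # Italian
    ("Esej se tematu venuje jen povrchne ...", 2.0),              # Czech
] * 20  # repeated only so this toy example has enough rows to split

texts, scores = zip(*essays)

# LaBSE maps sentences from 100+ languages into one shared vector space,
# which is what makes a single multilingual scoring model possible.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
X = encoder.encode(list(texts))

X_tr, X_te, y_tr, y_te = train_test_split(X, scores, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("Predicted scores:", model.predict(X_te[:3]))
```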
Sinharay, Sandip; Holland, Paul W. – Journal of Educational Measurement, 2007
It is a widely held belief that anchor tests should be miniature versions (i.e., "minitests"), with respect to content and statistical characteristics, of the tests being equated. This article examines the foundations for this belief regarding statistical characteristics. It examines the requirement of statistical representativeness of…
Descriptors: Test Items, Comparative Testing
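The statistical-representativeness question can be made concrete with a small simulation: compare an anchor whose item difficulties span the full test's range against one restricted to middle difficulties, and check how well each tracks the total score. This is only an illustration of the issue the article examines, with invented Rasch-style data, not its analysis.

```python
# Illustrative check: does restricting an anchor to middle-difficulty items
# actually hurt its correlation with the total test score?
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_total = 5000, 60
theta = rng.normal(0, 1, n_examinees)

# Full test: widely spread difficulties, Rasch response probabilities.
b_total = rng.uniform(-2, 2, n_total)
p = 1 / (1 + np.exp(-(theta[:, None] - b_total[None, :])))
responses = (rng.random(p.shape) < p).astype(int)
total = responses.sum(axis=1)

def anchor_corr(item_idx):
    """Correlation between an anchor built from item_idx and the total score."""
    return np.corrcoef(responses[:, item_idx].sum(axis=1), total)[0, 1]

spread_idx = np.argsort(b_total)[::3][:20]      # difficulties span the range
middle_idx = np.argsort(np.abs(b_total))[:20]   # middle difficulties only

print("spread anchor r =", round(anchor_corr(spread_idx), 3))
print("middle anchor r =", round(anchor_corr(middle_idx), 3))
```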
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Monahan, Patrick O.; Lee, Won-Chan; Ankenmann, Robert D. – Journal of Educational Measurement, 2007
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta…
Descriptors: True Scores, Psychometrics, Monte Carlo Methods, Correlation
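A hedged sketch of the strong true-score approach this abstract points to: draw true proportion-correct scores from a four-parameter beta distribution (a standard beta rescaled onto an interval) and generate dichotomous item scores from them. This is a simplified version in which every item shares the examinee's true proportion correct; the article's technique additionally allows item-specific, nonparametrically shaped ICCs. All parameter values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)
n_examinees, n_items = 2000, 40
a, b, lo, hi = 2.5, 1.8, 0.15, 0.95  # shape and range parameters (assumed)

# Four-parameter beta: a standard beta stretched onto [lo, hi].
tau = lo + (hi - lo) * rng.beta(a, b, n_examinees)

# Each item score is a Bernoulli draw with the examinee's true proportion
# correct; unlike traditional parametric IRT generators, no logistic ICC
# is imposed at this step.
scores = (rng.random((n_examinees, n_items)) < tau[:, None]).astype(int)

print("mean proportion correct:", scores.mean())
print("observed-score variance:", scores.sum(axis=1).var(ddof=1))
```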
von Davier, Alina A.; Holland, Paul W.; Thayer, Dorothy T. – Journal of Educational Measurement, 2004
The Non-Equivalent-groups Anchor Test (NEAT) design has been in wide use since at least the early 1940s. It involves two populations of test takers, P and Q, and makes use of an anchor test to link them. Two linking methods used for NEAT designs are (a) those based on chain equating and (b) those that use the anchor test to post-stratify the…
Descriptors: Equated Scores, Evaluation Research, Comparative Testing, Population Groups
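A minimal sketch of the chain-equating side of this comparison under the NEAT design: map an X score to the anchor scale using statistics from population P, then map that anchor score to the Y scale using statistics from population Q. The linear (mean-sigma) links and all summary statistics below are invented for illustration.

```python
def linear_link(x, mean_from, sd_from, mean_to, sd_to):
    """Linear linking: match means and standard deviations."""
    return mean_to + sd_to / sd_from * (x - mean_from)

# Assumed moments: (mean, sd) of X and anchor A in P; of Y and A in Q.
mX_P, sX_P, mA_P, sA_P = 50.0, 10.0, 20.0, 5.0
mY_Q, sY_Q, mA_Q, sA_Q = 48.0, 9.0, 19.0, 5.5

def chain_equate(x):
    a = linear_link(x, mX_P, sX_P, mA_P, sA_P)    # X -> A, estimated in P
    return linear_link(a, mA_Q, sA_Q, mY_Q, sY_Q)  # A -> Y, estimated in Q

print(chain_equate(55.0))  # Y-scale equivalent of an X score of 55
```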

Trentham, Landa L. – Journal of Educational Measurement, 1975
Descriptors: Comparative Testing, Educational Testing, Elementary Education, Grade 6

Gruijter, Dato N. M. – Journal of Educational Measurement, 1985
To improve on cutoff scores based on absolute standards, which may produce an unacceptable number of failures, a compromise is suggested. The compromise draws on the information in the observed score distribution to adjust the standard. Three compromise models developed by Hofstee, Beuk, and De Gruijter are compared. (Author/GDC)
Descriptors: Academic Standards, Comparative Testing, Cutting Scores, Mastery Tests
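A sketch of the Hofstee compromise, one of the three models compared: judges supply an acceptable cutoff range [k_min, k_max] and an acceptable failure-rate range [f_min, f_max], and the line from (k_min, f_max) to (k_max, f_min) is intersected with the observed failure-rate curve. The score distribution and judge values below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(60, 12, 1000).clip(0, 100)  # observed score distribution

k_min, k_max = 50, 70        # judges' lowest / highest acceptable cutoffs
f_min, f_max = 0.05, 0.40    # judges' lowest / highest acceptable fail rates

cutoffs = np.arange(k_min, k_max + 1)
observed_fail = np.array([(scores < c).mean() for c in cutoffs])
# The judges' line falls from f_max at k_min to f_min at k_max.
judged_fail = f_max + (f_min - f_max) * (cutoffs - k_min) / (k_max - k_min)

# Compromise cutoff: the point where the two curves come closest (cross).
cross = np.argmin(np.abs(observed_fail - judged_fail))
print("Hofstee cutoff:", cutoffs[cross],
      "fail rate:", round(observed_fail[cross], 3))
```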

Gressard, Risa P.; Loyd, Brenda H. – Journal of Educational Measurement, 1991
A Monte Carlo study, which simulated 10,000 examinees' responses to four tests, investigated the effect of item stratification on parameter estimation in multiple matrix sampling of achievement data. Practical multiple matrix sampling is based on item stratification by item discrimination and a sampling plan with a moderate number of subtests. (SLD)
Descriptors: Achievement Tests, Comparative Testing, Computer Simulation, Estimation (Mathematics)
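An illustrative version of the multiple matrix sampling setup: the item pool is split into subtests, each examinee answers only one subtest, and pool-level statistics are estimated from the combined fragments. Stratifying the split by item discrimination, as in the study, keeps subtests comparable; all parameter values here are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n_examinees, n_items, n_subtests = 10000, 40, 4
theta = rng.normal(0, 1, n_examinees)
a = rng.uniform(0.5, 2.0, n_items)   # item discriminations
b = rng.normal(0, 1, n_items)        # item difficulties

p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
responses = (rng.random(p.shape) < p).astype(int)

# Stratify items by discrimination, then deal them round-robin to subtests
# so each subtest gets a similar mix of low- and high-discrimination items.
order = np.argsort(a)
subtest_items = [order[s::n_subtests] for s in range(n_subtests)]
assignment = rng.integers(0, n_subtests, n_examinees)  # one subtest each

# Estimate the pool-level mean proportion correct from only the answered
# fragments (each examinee's assigned subtest), and compare with the truth.
est = np.mean([responses[assignment == s][:, subtest_items[s]].mean()
               for s in range(n_subtests)])
print("matrix-sample estimate:", round(est, 4),
      "truth:", round(responses.mean(), 4))
```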

Qualls-Payne, Audrey L. – Journal of Educational Measurement, 1992
Six methods for estimating the standard error of measurement (SEM) at specific score levels are compared by contrasting score-level SEM estimates from a single test administration with estimates from two test administrations, using Iowa Tests of Basic Skills data for 2,138 examinees. L. S. Feldt's method is preferred. (SLD)
Descriptors: Comparative Testing, Elementary Education, Elementary School Students, Error of Measurement
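One of the simpler single-administration estimators in this family is Lord's binomial-error model, which gives a score-level SEM from the raw score and test length alone; it is shown here only to make the idea of a conditional SEM concrete, not as the Feldt variant the study prefers.

```python
import math

def binomial_sem(x, n):
    """Conditional SEM at raw score x on an n-item test (Lord's binomial model)."""
    return math.sqrt(x * (n - x) / (n - 1))

n_items = 40
for x in (5, 20, 35):
    # SEM is largest near the middle of the score range, smallest at the ends.
    print(f"raw score {x:2d}: SEM = {binomial_sem(x, n_items):.2f}")
```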
Wang, Wen-Chung; Wilson, Mark; Shih, Ching-Lin – Journal of Educational Measurement, 2006
This study presents the random-effects rating scale model (RE-RSM), which takes randomness in the thresholds over persons into account by treating them as random effects and adding a random variable for each threshold in the rating scale model (RSM) (Andrich, 1978). The RE-RSM turns out to be a special case of the multidimensional random…
Descriptors: Item Analysis, Rating Scales, Item Response Theory, Monte Carlo Methods
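A sketch of the idea behind the RE-RSM: start from the rating scale model's category probabilities and let each person's thresholds vary randomly around the fixed threshold values. Parameter values below are illustrative, and the estimation machinery of the article is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)

def rsm_probs(theta, delta, taus):
    """Category probabilities for the rating scale model (Andrich, 1978)."""
    # Cumulative sum of (theta - (delta + tau_j)) over the steps taken.
    steps = theta - (delta + np.asarray(taus))
    logits = np.concatenate(([0.0], np.cumsum(steps)))  # category 0 logit = 0
    expl = np.exp(logits - logits.max())
    return expl / expl.sum()

delta = 0.2                        # item location
taus = np.array([-1.0, 0.0, 1.0])  # fixed thresholds (four categories)
sigma = 0.5                        # SD of person-specific threshold noise

theta = 0.8
fixed = rsm_probs(theta, delta, taus)
random_thr = rsm_probs(theta, delta, taus + rng.normal(0, sigma, taus.size))
print("RSM probabilities:     ", np.round(fixed, 3))
print("with random thresholds:", np.round(random_thr, 3))
```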
Sofroniou, Nick; Kellaghan, Thomas – Journal of Educational Measurement, 2004
To examine the predictive utility of three scales provided in the released database of the Third International Mathematics and Science Study (TIMSS) (international plausible values, standardized percent correct score, and national Rasch score), information was obtained on the performance in state examinations in mathematics and science in 1996…
Descriptors: Foreign Countries, Predictive Validity, National Competency Tests, Mathematics Tests
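A sketch of the predictive-utility comparison: relate a later criterion (state examination performance) to each TIMSS scale and compare the strength of the relationship. All data below are simulated and the scale names merely label the three released scores; nothing here reproduces the study's results.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 500
ability = rng.normal(0, 1, n)
exam = 50 + 8 * ability + rng.normal(0, 6, n)  # criterion: state exam score

# Three noisy proxies for ability, standing in for the three TIMSS scales.
scales = {
    "plausible_value": ability + rng.normal(0, 0.6, n),
    "pct_correct":     ability + rng.normal(0, 0.5, n),
    "national_rasch":  ability + rng.normal(0, 0.55, n),
}

for name, x in scales.items():
    r = np.corrcoef(x, exam)[0, 1]
    print(f"{name:15s} r = {r:.3f}  R^2 = {r**2:.3f}")
```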
Meyer, J. Patrick; Huynh, Huynh; Seaman, Michael A. – Journal of Educational Measurement, 2004
Exact nonparametric procedures have been used to identify the level of differential item functioning (DIF) in binary items. This study explored the use of exact DIF procedures with items scored on a Likert scale. The results from an attitude survey suggest that the large-sample Cochran-Mantel-Haenszel (CMH) procedure identifies more items as…
Descriptors: Test Bias, Attitude Measures, Surveys, Predictive Validity
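A sketch of the large-sample Mantel-Haenszel check the abstract contrasts with exact procedures, applied to a binary item: stratify examinees by a matching score and test whether the correct/incorrect odds differ across groups within strata. The data are simulated with a small amount of built-in DIF, and statsmodels' stratified-table test stands in for the CMH statistic; the article's exact procedures are permutation-based.

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

rng = np.random.default_rng(5)
n = 2000
group = rng.integers(0, 2, n)        # reference vs focal group
matching = rng.integers(0, 5, n)     # coarse total-score strata
p_correct = 0.4 + 0.1 * matching - 0.05 * group  # small uniform DIF built in
item = (rng.random(n) < p_correct).astype(int)

# One 2x2 table (group x correct/incorrect) per matching stratum.
tables = []
for s in range(5):
    m = matching == s
    t = np.array([[np.sum((group == g) & m & (item == c)) for c in (1, 0)]
                  for g in (0, 1)])
    tables.append(t)

result = StratifiedTable(tables).test_null_odds(correction=True)
print("CMH statistic:", round(result.statistic, 2),
      "p =", round(result.pvalue, 4))
```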
Meijer, Rob R. – Journal of Educational Measurement, 2004
Two new methods have been proposed to determine unexpected sum scores on subtests (testlets) for both paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
Descriptors: Probability, Adaptive Testing, Item Response Theory, Scores
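A sketch of the hypergeometric idea: if an examinee answered k of n items correctly and items were exchangeable, the number correct within an m-item testlet follows a hypergeometric distribution, so a surprisingly low testlet score can be flagged by its lower-tail probability. The article's conservative bound is more refined; the numbers here are illustrative.

```python
from scipy.stats import hypergeom

n_total, k_correct = 60, 45    # test length and examinee's total correct
m_testlet, s_observed = 10, 2  # testlet length and its observed sum score

# hypergeom(M, n, N): population size M, successes n, sample size N.
p_low = hypergeom(n_total, k_correct, m_testlet).cdf(s_observed)
print(f"P(testlet score <= {s_observed}) = {p_low:.5f}")
```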

Gitomer, Drew H.; Yamamoto, Kentaro – Journal of Educational Measurement, 1991
A model integrating latent trait and latent class theories in characterizing individual performance on the basis of qualitative understanding is presented. This HYBRID model is illustrated through experiments with 119 Air Force technicians taking a paper-and-pencil test and 136 Air Force technicians taking a computerized test. (SLD)
Descriptors: Comparative Testing, Computer Assisted Testing, Educational Assessment, Item Response Theory

Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
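An illustrative version of the simulation logic: compare how well a linear testlet (fixed medium-difficulty items) and an adaptive testlet (items matched to the examinee) recover ability when proficiency is peaked around its mean. All settings are invented; with most examinees near the peak, the medium-difficulty linear items are already close to optimal, which is the mechanism behind the modest adaptive gain the abstract reports.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 20000, 7                  # examinees, testlet length
theta = rng.normal(0, 1, n)      # peaked proficiency distribution

def score(b):
    """Number-correct score on items with difficulties b (Rasch model)."""
    p = 1 / (1 + np.exp(-(theta[:, None] - b)))
    return (rng.random(p.shape) < p).sum(axis=1)

linear = score(np.zeros((1, m)))                                # items at b = 0
adaptive = score(theta[:, None] + rng.normal(0, 0.3, (n, m)))   # b near theta

print("linear   validity r =", round(np.corrcoef(linear, theta)[0, 1], 3))
print("adaptive validity r =", round(np.corrcoef(adaptive, theta)[0, 1], 3))
```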