Showing all 13 results
Peer reviewed
PDF on ERIC
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
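The kernel equating methods in the study above smooth and link full score distributions, which is beyond a short sketch. As a minimal illustration of the underlying equating idea, the snippet below performs simple linear equating, mapping form-X scores onto form Y's scale by matching means and standard deviations. All data and group parameters are hypothetical, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw scores on two test forms (illustrative only).
scores_x = rng.normal(50, 10, size=2000)  # examinees who took form X
scores_y = rng.normal(52, 11, size=2000)  # examinees who took form Y

def linear_equate(x, y):
    """Map form-X scores onto the form-Y scale by matching mean and SD."""
    return (x - x.mean()) / x.std() * y.std() + y.mean()

equated = linear_equate(scores_x, scores_y)
# The equated scores now share form Y's first two moments.
```

Kernel equating generalizes this by equating the full (Gaussian-kernel-smoothed) cumulative distributions rather than just the first two moments.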
Peer reviewed
PDF on ERIC
Ozsoy, Seyma Nur; Kilmen, Sevilay – International Journal of Assessment Tools in Education, 2023
In this study, kernel test equating methods were compared under the NEAT and NEC designs. In the NEAT design, kernel post-stratification and chained equating methods with optimal and large bandwidths were compared. In the NEC design, gender and/or computer/tablet use was considered as a covariate, and kernel test equating methods were…
Descriptors: Equated Scores, Testing, Test Items, Statistical Analysis
Peer reviewed
PDF on ERIC
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
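The Mantel-Haenszel procedure named in the abstract above pools 2x2 tables across matched total-score levels into a common odds ratio. A minimal sketch follows; the counts and the three-stratum setup are hypothetical, for illustration only.

```python
import math

def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: one 2x2 table per matched total-score level, given as
    ([ref_correct, ref_incorrect], [focal_correct, focal_incorrect]).
    """
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    return num / den

# Hypothetical counts at three score levels; the reference group
# answers correctly slightly more often at each level.
tables = [([30, 10], [25, 15]),
          ([40, 20], [35, 25]),
          ([50, 30], [45, 35])]
alpha_mh = mh_common_odds_ratio(tables)
delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta scale; negative favors reference
```

An odds ratio near 1 (delta near 0) indicates negligible DIF; ETS conventionally classifies items by the magnitude of delta.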
Peer reviewed
Direct link
Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016
An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…
Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory
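The abstract above describes sequential statistical monitoring of items during CAT administration. The sketch below is not the authors' procedure; it illustrates the general idea with a one-sided CUSUM on standardized Rasch residuals, flagging an item whose observed performance runs persistently above what its calibrated difficulty predicts (as after item exposure). The model choice, threshold, and data are all illustrative assumptions.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def flags_drift(responses, abilities, b, threshold=4.0):
    """One-sided CUSUM on standardized residuals (u - p) / sqrt(p(1-p)).
    Returns True as soon as cumulative positive drift exceeds threshold."""
    s = 0.0
    for u, theta in zip(responses, abilities):
        p = rasch_p(theta, b)
        s = max(0.0, s + (u - p) / math.sqrt(p * (1.0 - p)))
        if s > threshold:
            return True
    return False

# An item of difficulty b=0 answered correctly by 20 low-ability
# examinees in a row should trip the monitor.
drifted = flags_drift([1] * 20, [-1.0] * 20, b=0.0)  # True
```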
Peer reviewed
Direct link
Kim, Jihye; Oshima, T. C. – Educational and Psychological Measurement, 2013
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
Descriptors: Test Bias, Test Items, Statistical Analysis, Error of Measurement
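Multiplicity adjustments of the kind the study above investigates can be sketched compactly. The snippet implements two standard corrections, Bonferroni (family-wise error) and Benjamini-Hochberg (false discovery rate); the p-values and alpha level are hypothetical, and these two procedures are illustrative rather than necessarily the ones the authors compared.

```python
def bonferroni(pvals, alpha=0.05):
    """Flag tests that survive the Bonferroni correction."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its step-up threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    flagged = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            flagged[i] = True
    return flagged

# Hypothetical per-item p-values from five DIF significance tests.
p = [0.001, 0.012, 0.030, 0.20, 0.65]
print(bonferroni(p))          # [True, False, False, False, False]
print(benjamini_hochberg(p))  # [True, True, True, False, False]
```

Note the trade-off the abstract alludes to: Bonferroni keeps the Type I error rate low but flags fewer items, while BH retains more power at the cost of controlling only the false discovery rate.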
Peer reviewed
Direct link
McAllister, Daniel; Guidice, Rebecca M. – Teaching in Higher Education, 2012
The primary goal of teaching is to successfully facilitate learning. Testing can help accomplish this goal in two ways. First, testing can provide a powerful motivation for students to prepare when they perceive that the effort involved leads to valued outcomes. Second, testing can provide instructors with valuable feedback on whether their…
Descriptors: Testing, Role, Student Motivation, Feedback (Response)
Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2011
This technical report describes the process of development and piloting of reading comprehension measures that are appropriate for seventh-grade students as part of an online progress screening and monitoring assessment system, http://easycbm.com. Each measure consists of an original fictional story of approximately 1,600 to 1,900 words with 20…
Descriptors: Reading Comprehension, Reading Tests, Grade 7, Test Construction
Peer reviewed
Direct link
Raymond, Mark R.; Neustel, Sandra; Anderson, Dan – Educational Measurement: Issues and Practice, 2009
Examinees who take high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on their initial attempt. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign a different test form to repeat examinees. The use of multiple…
Descriptors: Test Results, Test Items, Testing, Aptitude Tests
Peer reviewed
Direct link
Solano-Flores, Guillermo; Li, Min – Educational Assessment, 2009
We investigated language variation and score variation in the testing of English language learners, native Spanish speakers. We gave students the same set of National Assessment of Educational Progress mathematics items in both their first language and their second language. We examined the amount of score variation due to the main and interaction…
Descriptors: Scores, Testing, Second Language Learning, English (Second Language)
Peer reviewed
Direct link
Solano-Flores, Guillermo – Educational Researcher, 2008
The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment…
Descriptors: Generalizability Theory, Testing, Second Language Learning, Error of Measurement
Karkee, Thakur B.; Wright, Karen R. – Online Submission, 2004
Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…
Descriptors: Measures (Individuals), Evaluation Criteria, Testing, Integrity
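Linking item parameter estimates across calibrations, as the report above discusses, is commonly done with moment-based transformations. The sketch below shows the classic mean-sigma method for placing new difficulty estimates on the old scale; the common-item difficulties are hypothetical, and this is one standard technique rather than necessarily the report's method.

```python
import numpy as np

def mean_sigma_link(b_old, b_new):
    """Mean-sigma linking: slope A and intercept B such that
    A * b_new + B has the same mean and SD as the old calibration."""
    b_old, b_new = np.asarray(b_old), np.asarray(b_new)
    A = b_old.std() / b_new.std()
    B = b_old.mean() - A * b_new.mean()
    return A, B

# Hypothetical common-item difficulties from two calibrations.
b_old = [-1.2, -0.4, 0.1, 0.8, 1.5]
b_new = [-1.0, -0.2, 0.3, 1.0, 1.7]
A, B = mean_sigma_link(b_old, b_new)
b_new_on_old_scale = A * np.asarray(b_new) + B
```

In this toy example the new estimates are a constant shift of the old ones, so the transformation recovers the old values exactly; with real data, residual differences after linking reflect estimation error and model change.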
Peer reviewed
Altepeter, Tom – School Psychology Review, 1983
A critical review of the Expressive One-Word Picture Vocabulary Test (Gardner) is offered. The reviewer concludes that the instrument cannot be recommended in its present form and strongly recommends further research concerning the manual and theoretical issues, particularly test-retest stability. (Author/PN)
Descriptors: Error of Measurement, Intelligence Tests, Item Analysis, Pictorial Stimuli
Brennan, Robert L. – 1974
The first four chapters of this report primarily provide an extensive, critical review of the literature with regard to selected aspects of the criterion-referenced and mastery testing fields. Major topics treated include: (a) definitions, distinctions, and background, (b) the relevance of classical test theory, (c) validity and procedures for…
Descriptors: Computer Programs, Confidence Testing, Criterion Referenced Tests, Error of Measurement