ERIC - Search Results

Showing 1 to 15 of 69 results

Interval Estimation of Item Response Probabilities along Studied Latent Dimensions

Peer reviewed

Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021

An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may contain also polytomous items,…

Descriptors: Item Response Theory, Computation, Probability, Test Items

Detecting Compromised Items Using Information from Secure Items

Peer reviewed

Wang, Xi; Liu, Yang – Journal of Educational and Behavioral Statistics, 2020

In continuous testing programs, some items are repeatedly used across test administrations, and statistical methods are often used to evaluate whether items become compromised due to examinees' preknowledge. In this study, we proposed a residual method to detect compromised items when a test can be partitioned into two subsets of items: secure…

Descriptors: Test Items, Information Security, Error of Measurement, Cheating

Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

Peer reviewed

de la Torre, Jimmy; Lee, Young-Sun – Journal of Educational Measurement, 2013

This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…

Descriptors: Statistical Analysis, Test Items, Goodness of Fit, Error of Measurement

Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

Peer reviewed

Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012

In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…

Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Applying Rasch Model and Generalizability Theory to Study Modified-Angoff Cut Scores

Peer reviewed

Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012

The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…

Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling

Limits on the Accuracy of Linking. Research Report. ETS RR-10-22

Download full text

Haberman, Shelby J. – Educational Testing Service, 2010

Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…

Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy

Analyzing the Reliability of the easyCBM Reading Comprehension Measures: Grade 7. Technical Report #1206

Download full text

Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Park, Bitnara Jasmine; Tindal, Gerald – Behavioral Research and Teaching, 2012

In this technical report, we present the results of a reliability study of the seventh-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…

Descriptors: Reading Comprehension, Testing Programs, Statistical Analysis, Grade 7

A Control Systems Concept Inventory Test Design and Assessment

Peer reviewed

Bristow, M.; Erkorkmaz, K.; Huissoon, J. P.; Jeon, Soo; Owen, W. S.; Waslander, S. L.; Stubley, G. D. – IEEE Transactions on Education, 2012

Any meaningful initiative to improve the teaching and learning in introductory control systems courses needs a clear test of student conceptual understanding to determine the effectiveness of proposed methods and activities. The authors propose a control systems concept inventory. Development of the inventory was collaborative and iterative. The…

Descriptors: Diagnostic Tests, Concept Formation, Undergraduate Students, Engineering Education

The Effects of Small Sample Size on Identifying Polytomous DIF Using the Liu-Agresti Estimator of the Cumulative Common Odds Ratio

Peer reviewed

Carvajal, Jorge; Skorupski, William P. – Educational and Psychological Measurement, 2010

This study is an evaluation of the behavior of the Liu-Agresti estimator of the cumulative common odds ratio when identifying differential item functioning (DIF) with polytomously scored test items using small samples. The Liu-Agresti estimator has been proposed by Penfield and Algina as a promising approach for the study of polytomous DIF but no…

Descriptors: Test Bias, Sample Size, Test Items, Computation

DIF Trees: Using Classification Trees to Detect Differential Item Functioning

Peer reviewed

Vaughn, Brandon K.; Wang, Qiu – Educational and Psychological Measurement, 2010

A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

Descriptors: Test Bias, Classification, Nonparametric Statistics, Regression (Statistics)

Online Calibration via Variable Length Computerized Adaptive Testing

Peer reviewed

Chang, Yuan-chin Ivan; Lu, Hung-Yi – Psychometrika, 2010

Item calibration is an essential issue in modern item response theory based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than that in the time when paper and pencil test administration is the norm. There are many calibration…

Descriptors: Test Items, Educational Testing, Adaptive Testing, Measurement

The Development and Technical Adequacy of Seventh-Grade Reading Comprehension Measures in a Progress Monitoring Assessment System. Technical Report #1102

Download full text

Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2011

This technical report describes the process of development and piloting of reading comprehension measures that are appropriate for seventh-grade students as part of an online progress screening and monitoring assessment system, http://easycbm.com. Each measure consists of an original fictional story of approximately 1,600 to 1,900 words with 20…

Descriptors: Reading Comprehension, Reading Tests, Grade 7, Test Construction

Estimating Standard Errors of Cut Scores for Item Rating and Mapmark Procedures: A Generalizability Theory Approach

Peer reviewed

Yin, Ping; Sconing, James – Educational and Psychological Measurement, 2008

Standard-setting methods are widely used to determine cut scores on a test that examinees must meet for a certain performance standard. Because standard setting is a measurement procedure, it is important to evaluate variability of cut scores resulting from the standard-setting process. Generalizability theory is used in this study to estimate…

Descriptors: Generalizability Theory, Standard Setting, Cutting Scores, Test Items

A Note on Using Stratified Alpha to Estimate the Composite Reliability of a Test Composed of Interrelated Nonhomogeneous Items

Peer reviewed

Rae, Gordon – Psychological Methods, 2007

The relationship between stratified alpha (α_s) and the reliability of a test composed of interrelated nonhomogeneous items is examined. It is mathematically demonstrated that when there is congeneric equivalence within the strata or subtests, the difference between the coefficients is a function of the variances of the loadings within…

Descriptors: Test Reliability, Test Items, Computation, Error of Measurement
