Showing 1 to 15 of 19 results
Peer reviewed
Daria Gerasimova – Journal of Educational Measurement, 2024
I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document…
Descriptors: Validity, Documentation, Methods, Research Reports
Peer reviewed
Cui, Zhongmin; Liu, Chunyan; He, Yong; Chen, Hanwei – Journal of Educational Measurement, 2018
Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees' abilities. The practice of item review in…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Test Wiseness
Peer reviewed
Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2017
In equating, when common items are internal and scoring is conducted in terms of the number of correct items, some pairs of total scores ("X") and common-item scores ("V") can never be observed in a bivariate distribution of "X" and "V"; these pairs are called "structural zeros." This simulation…
Descriptors: Test Items, Equated Scores, Comparative Analysis, Methods
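The structural-zero constraint described in this abstract is easy to enumerate. A reader's sketch (not the authors' code), with hypothetical sizes: a 10-item total test X containing 3 internal common items scored in V. A pair (X, V) is impossible if the common-item score exceeds the total score, or if the non-common part of X would have to exceed its maximum.

```python
# Hypothetical sizes for illustration only.
N_TOTAL = 10   # items on the total test X
N_COMMON = 3   # internal common items scored in V
N_UNIQUE = N_TOTAL - N_COMMON

def is_structural_zero(x, v):
    """A (X, V) pair can never be observed if V > X, or if the
    score on the non-common items (X - V) exceeds its maximum."""
    return v > x or (x - v) > N_UNIQUE

structural_zeros = [(x, v)
                    for x in range(N_TOTAL + 1)
                    for v in range(N_COMMON + 1)
                    if is_structural_zero(x, v)]
```

With these sizes, 12 of the 44 possible (X, V) cells are structural zeros.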
Peer reviewed
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis
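The TIF-based comparison mentioned in this abstract can be sketched under the 2PL model: a candidate form is statistically parallel to a reference form when its test information function tracks the reference TIF across the ability range. This is a reader's illustration; the item parameters below are made up.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def tif(theta, items):
    """Test information function: sum of item informations."""
    return sum(item_info(theta, a, b) for a, b in items)

# Hypothetical (a, b) parameters for a reference and a candidate form.
reference_form = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.8)]
candidate_form = [(1.1, -0.4), (1.0, 0.1), (1.4, 0.7)]

# Largest TIF gap over a grid of ability points.
gap = max(abs(tif(t, reference_form) - tif(t, candidate_form))
          for t in [-2, -1, 0, 1, 2])
```

A small maximum gap suggests the candidate form is informationally close to the reference; the TCC could be compared the same way using expected scores.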
Peer reviewed
French, Brian F.; Finch, W. Holmes – Journal of Educational Measurement, 2015
SIBTEST is a differential item functioning (DIF) detection method that is accurate and effective with small samples, in the presence of group mean differences, and for assessment of both uniform and nonuniform DIF. The presence of multilevel data with DIF detection has received increased attention. Ignoring such structure can inflate Type I error.…
Descriptors: Test Bias, Data, Simulation, Accuracy
Peer reviewed
Haberman, Shelby; Yao, Lili – Journal of Educational Measurement, 2015
Admission decisions frequently rely on multiple assessments. As a consequence, it is important to explore rational approaches to combine the information from different educational tests. For example, U.S. graduate schools usually receive both TOEFL iBT® scores and GRE® General scores of foreign applicants for admission; however, little guidance…
Descriptors: College Entrance Examinations, Repetition, Methods, Error of Measurement
Peer reviewed
Häggström, Jenny; Wiberg, Marie – Journal of Educational Measurement, 2014
The selection of bandwidth in kernel equating is important because it has a direct impact on the equated test scores. The aim of this article is to examine the use of double smoothing when selecting bandwidths in kernel equating and to compare double smoothing with the commonly used penalty method. This comparison was made using both an equivalent…
Descriptors: Equated Scores, Data Analysis, Comparative Analysis, Simulation
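The bandwidth selection problem in this abstract can be illustrated with a minimal sketch: continuize a discrete score distribution with a Gaussian kernel, then pick the bandwidth h minimizing a squared-error penalty between the observed score probabilities and the continuized density at the score points. This is a simplification (the variance-preserving rescaling of full kernel equating is omitted, and double smoothing instead evaluates h against a pre-smoothed density); the data are illustrative.

```python
import math

def kernel_density(x, scores, probs, h):
    """Gaussian-kernel continuization: a normal mixture centered at the
    score points. The variance-preserving rescaling of full kernel
    equating is omitted for brevity."""
    return sum(p * math.exp(-0.5 * ((x - s) / h) ** 2)
               / (h * math.sqrt(2 * math.pi))
               for s, p in zip(scores, probs))

def penalty(scores, probs, h):
    """PEN1-style criterion: squared gaps between the discrete score
    probabilities and the continuized density at each score point."""
    return sum((p - kernel_density(s, scores, probs, h)) ** 2
               for s, p in zip(scores, probs))

# Hypothetical 5-point score distribution.
scores = [0, 1, 2, 3, 4]
probs = [0.10, 0.25, 0.30, 0.25, 0.10]

best_h = min([0.3, 0.5, 0.7, 1.0], key=lambda h: penalty(scores, probs, h))
```

Too small an h leaves the density spiky at the score points; too large an h oversmooths, so the penalty is minimized at an intermediate bandwidth.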
Peer reviewed
Powers, Sonya; Kolen, Michael J. – Journal of Educational Measurement, 2014
Accurate equating results are essential when comparing examinee scores across exam forms. Previous research indicates that equating results may not be accurate when group differences are large. This study compared the equating results of frequency estimation, chained equipercentile, item response theory (IRT) true-score, and IRT observed-score…
Descriptors: Accuracy, Equated Scores, Differences, Groups
Peer reviewed
Yao, Lihua – Journal of Educational Measurement, 2014
The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Peer reviewed
Rutkowski, Leslie; Zhou, Yan – Journal of Educational Measurement, 2015
Given the importance of large-scale assessments to educational policy conversations, it is critical that subpopulation achievement is estimated reliably and with sufficient precision. Despite this importance, biased subpopulation estimates have been found to occur when variables in the conditioning model side of a latent regression model contain…
Descriptors: Error of Measurement, Error Correction, Regression (Statistics), Computation
Peer reviewed
Deng, Hui; Ansley, Timothy; Chang, Hua-Hua – Journal of Educational Measurement, 2010
In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with…
Descriptors: Computer Assisted Testing, Adaptive Testing, Selection, Methods
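The maximum Fisher information procedure (F) named in this abstract has a simple core: at the current ability estimate, administer the unused pool item with the largest information. A reader's sketch under the 2PL model; the pool parameters are illustrative.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_max_info(theta_hat, pool, administered):
    """Maximum Fisher information rule: index of the most informative
    item not yet administered."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *pool[i]))

# Hypothetical (a, b) item pool.
pool = [(0.8, -1.0), (1.6, 0.0), (1.2, 1.5), (2.0, 0.2)]
first = select_max_info(0.0, pool, administered=set())
```

Because high-a items dominate this criterion, they are overexposed early in the test, which is what a-stratified designs such as STR and USTR are meant to counteract.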
Peer reviewed
Omar, M. Hafidz – Journal of Educational Measurement, 2010
Methods of statistical process control were briefly investigated in the field of educational measurement as early as 1999. However, only the use of a cumulative sum chart was explored. In this article other methods of statistical quality control are introduced and explored. In particular, methods in the form of Shewhart mean and standard deviation…
Descriptors: Charts, Quality Control, Measurement, Test Items
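The Shewhart mean chart this abstract introduces can be sketched briefly: estimate a center line and control limits from reference subgroups assumed to be in control, then flag new subgroups whose means fall outside the limits. This is a reader's illustration (the small-sample c4 bias correction is omitted, and the data, e.g. batches of item p-values, are made up).

```python
import math
import statistics

def xbar_chart(reference, k=3.0):
    """Center line +- k sigma limits for subgroup means, with the
    process sigma estimated from average within-subgroup SDs."""
    n = len(reference[0])
    center = statistics.mean(statistics.mean(g) for g in reference)
    s_bar = statistics.mean(statistics.stdev(g) for g in reference)
    half_width = k * s_bar / math.sqrt(n)
    return center - half_width, center + half_width

# Hypothetical in-control reference batches.
reference = [[0.71, 0.68, 0.74], [0.69, 0.72, 0.70], [0.73, 0.70, 0.68]]
lcl, ucl = xbar_chart(reference)

# Monitor new batches; the second has drifted downward.
new_batches = [[0.70, 0.71, 0.69], [0.55, 0.52, 0.58]]
flagged = [i for i, g in enumerate(new_batches)
           if not lcl <= statistics.mean(g) <= ucl]
```

A standard-deviation (s) chart follows the same pattern with subgroup SDs in place of means.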
Peer reviewed
Cui, Zhongmin; Kolen, Michael J. – Journal of Educational Measurement, 2009
This article considers two new smoothing methods in equipercentile equating, the cubic B-spline presmoothing method and the direct presmoothing method. Using a simulation study, these two methods are compared with established methods, the beta-4 method, the polynomial loglinear method, and the cubic spline postsmoothing method, under three sample…
Descriptors: Equated Scores, Methods, Sample Size, Test Content
Peer reviewed
de La Torre, Jimmy; Deng, Weiling – Journal of Educational Measurement, 2008
The standardized log-likelihood of a response vector (l_z) is a popular IRT-based person-fit test statistic for identifying model-misfitting response patterns. Traditional use of l_z is overly conservative in detecting aberrance due to its incorrect assumption regarding its theoretical null distribution. This study proposes a…
Descriptors: Goodness of Fit, Measures (Individuals), Test Reliability, Responses
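The l_z statistic in this abstract standardizes the log-likelihood of a response vector by its model-implied mean and variance, and is traditionally referred to a standard normal null, the assumption the article argues is too conservative. A reader's sketch under the 2PL model with illustrative parameters:

```python
import math

def lz_statistic(theta, responses, items):
    """Standardized log-likelihood person-fit statistic l_z:
    (l0 - E[l0]) / sqrt(Var[l0]) under the 2PL model."""
    l0 = expected = variance = 0.0
    for u, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (l0 - expected) / math.sqrt(variance)

# Hypothetical (a, b) parameters, items ordered easy to hard.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

aberrant = lz_statistic(0.0, [0, 0, 1, 1], items)  # misses easy, hits hard
typical = lz_statistic(0.0, [1, 1, 0, 0], items)   # Guttman-like pattern
```

Large negative values of l_z flag aberrant response patterns; the aberrant vector above scores well below the typical one.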
Peer reviewed
Davis, Susan L.; Buckendahl, Chad W.; Plake, Barbara S. – Journal of Educational Measurement, 2008
As an alternative to adaptation, tests may also be developed simultaneously in multiple languages. Although the items on such tests could vary substantially, scores from these tests may be used to make the same types of decisions about different groups of examinees. The ability to make such decisions is contingent upon setting performance…
Descriptors: Test Results, Testing Programs, Multilingualism, Standard Setting