Xue, Kang; Huggins-Manley, Anne Corinne; Leite, Walter – Educational and Psychological Measurement, 2022
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of…
Descriptors: Virtual Classrooms, Artificial Intelligence, Item Response Theory, Item Analysis
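A minimal sketch of the kind of IRT-based ability measurement the abstract describes, assuming a two-parameter logistic (2PL) model with item parameters that have already been calibrated; the parameter values and response pattern below are hypothetical, not taken from the study.

import numpy as np
from scipy.optimize import minimize_scalar

def p_correct(theta, a, b):
    # 2PL probability of a correct response to an item with
    # discrimination a and difficulty b
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_theta(responses, a, b):
    # Maximum-likelihood ability estimate for one examinee,
    # searched over a bounded theta range
    def neg_log_lik(theta):
        p = p_correct(theta, a, b)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # hypothetical discriminations
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])  # hypothetical difficulties
resp = np.array([1, 1, 1, 0, 0])           # one examinee's response pattern
print(round(estimate_theta(resp, a, b), 2))

Biased a and b estimates feed directly into estimate_theta, which is why unpiloted item parameters are the concern the abstract raises.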
Wyse, Adam E.; Babcock, Ben – Educational and Psychological Measurement, 2016
Continuously administered examination programs, particularly credentialing programs that require graduation from educational programs, often experience seasonality where distributions of examinee ability may differ over time. Such seasonality may affect the quality of important statistical processes, such as item response theory (IRT) item…
Descriptors: Test Items, Item Response Theory, Computation, Licensing Examinations (Professions)
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu – Educational and Psychological Measurement, 2015
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effects in mixed-format scales and used bi-factor item response theory (IRT) models to…
Descriptors: Item Response Theory, Test Format, Language Usage, Test Items
Keller, Lisa A.; Keller, Robert R. – Educational and Psychological Measurement, 2011
This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests…
Descriptors: Item Response Theory, Scaling, Sustainability, Classification
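The abstract does not name the scaling techniques compared, but a common representative is mean-sigma linking, sketched below under invented anchor-item difficulties from two administrations.

import numpy as np

def mean_sigma_link(b_new, b_old):
    # Linear transformation placing the new administration's theta
    # scale onto the old one: theta_old = A * theta_new + B
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

b_new = np.array([-0.9, -0.2, 0.4, 1.1])   # hypothetical anchor difficulties, new form
b_old = np.array([-1.0, -0.25, 0.5, 1.3])  # the same anchors on the old scale
A, B = mean_sigma_link(b_new, b_old)
print(f"theta_old = {A:.3f} * theta_new + {B:.3f}")

Chaining such links across six administrations lets small estimation errors accumulate, which is presumably why the number of administrations matters to classification accuracy here.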
Wyse, Adam E. – Educational and Psychological Measurement, 2011
Standard setting is a method used to set cut scores on large-scale assessments. One of the most popular standard setting methods is the Bookmark method. In the Bookmark method, panelists are asked to envision a response probability (RP) criterion and move through a booklet of ordered items on the basis of that criterion. This study investigates whether…
Descriptors: Testing Programs, Standard Setting (Scoring), Cutting Scores, Probability
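A small illustration of the RP criterion in play, assuming a 2PL model (the abstract does not specify one): each item's "RP location" is the ability at which the probability of success equals the RP value, and the Bookmark booklet orders items by that location. The items and the common RP = 0.67 default are hypothetical choices.

import math

def rp_location(a, b, rp=0.67):
    # Solve rp = 1 / (1 + exp(-a * (theta - b))) for theta
    return b + math.log(rp / (1 - rp)) / a

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.3)]  # hypothetical (a, b) pairs
booklet = sorted(items, key=lambda ab: rp_location(*ab))
for a, b in booklet:
    print(f"a={a:.1f}  b={b:.1f}  theta_RP={rp_location(a, b):.2f}")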
Skorupski, William P.; Carvajal, Jorge – Educational and Psychological Measurement, 2010
This study is an evaluation of the psychometric issues associated with estimating objective level scores, often referred to as "subscores." The article begins by introducing the concepts of reliability and validity for subscores from statewide achievement tests. These issues are discussed with reference to popular scaling techniques, classical…
Descriptors: Testing Programs, Test Validity, Achievement Tests, Scores
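One reliability issue the article raises can be shown in a few lines: a short subscore scale is typically much less reliable than the full test. The sketch below uses simulated responses and Cronbach's alpha; it illustrates the general point, not the article's analysis.

import numpy as np

def cronbach_alpha(scores):
    # scores: examinees x items matrix of item scores
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
theta = rng.normal(size=(2000, 1))   # simulated abilities
diff = rng.normal(size=40)           # simulated item difficulties
p = 1.0 / (1.0 + np.exp(-(theta - diff)))
responses = (rng.random((2000, 40)) < p).astype(float)

print(cronbach_alpha(responses))          # full 40-item test
print(cronbach_alpha(responses[:, :8]))   # one 8-item "objective" subscore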
Lee, Guemin; Lewis, Daniel M. – Educational and Psychological Measurement, 2008
The bookmark standard-setting procedure is an item response theory-based method that is widely implemented in state testing programs. This study estimates standard errors for cut scores resulting from bookmark standard settings under a generalizability theory model and investigates the effects of different universes of generalization and error…
Descriptors: Generalizability Theory, Testing Programs, Error of Measurement, Cutting Scores
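A deliberately simplified, one-facet version of the idea (the study's generalizability-theory model involves richer universes of generalization, and these cut scores are invented): treat panelists as the random facet and propagate their variance component into a standard error for the panel-mean cut score.

import numpy as np

panelist_cuts = np.array([151.0, 148.0, 155.0, 150.0, 149.0, 153.0, 147.0, 152.0])
var_panelists = panelist_cuts.var(ddof=1)             # panelist variance component
se_cut = np.sqrt(var_panelists / len(panelist_cuts))  # error of the mean cut score
print(f"cut = {panelist_cuts.mean():.1f}, SE = {se_cut:.2f}")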
Wang, Shudong; Jiao, Hong – Educational and Psychological Measurement, 2009
Vertical scales have long been used in practice to measure students' achievement progress across several grade levels, yet constructing them is considered a very challenging psychometric procedure. Recently, such practices have drawn many criticisms. The major criticisms focus on the dimensionality and construct equivalence of the latent trait or…
Descriptors: Reading Comprehension, Elementary Secondary Education, Measures (Individuals), Psychometrics
Breithaupt, Krista; Hare, Donovan R. – Educational and Psychological Measurement, 2007
Many challenges exist for high-stakes testing programs offering continuous computerized administration. The automated assembly of test questions to exactly meet content and other requirements, provide uniformity, and control item exposure can be modeled and solved by mixed-integer programming (MIP) methods. A case study of the computerized…
Descriptors: Testing Programs, Psychometrics, Certification, Accounting
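A hedged sketch of what such a MIP model can look like, using the open-source PuLP/CBC solver rather than whatever the study used; the item pool, content areas, and constraint values are all invented for illustration.

from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

pool = [  # (item id, content area, information at the cut score)
    (1, "audit", 0.42), (2, "audit", 0.35), (3, "tax", 0.50),
    (4, "tax", 0.28), (5, "law", 0.46), (6, "law", 0.31),
]
prob = LpProblem("test_assembly", LpMaximize)
x = {i: LpVariable(f"x_{i}", cat=LpBinary) for i, _, _ in pool}

prob += lpSum(info * x[i] for i, _, info in pool)      # maximize information
prob += lpSum(x.values()) == 4                         # fixed test length
for area in {"audit", "tax", "law"}:
    prob += lpSum(x[i] for i, c, _ in pool if c == area) >= 1  # content coverage

prob.solve()
print([i for i in x if value(x[i]) == 1])

Operational models add many more constraint families (enemy items, exposure controls, statistical targets at several theta points), but they keep this same binary-selection structure.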
Riedel, James A.; Dodson, Janet D. – Educational and Psychological Measurement, 1977
GURU is a computer program developed to analyze data generated by open-ended question techniques such as ECHO or other semistructured data collection techniques in which data are categorized. The program provides extensive descriptive statistics and considerable flexibility in comparing data. (Author/JKS)
Descriptors: Computer Programs, Data Analysis, Essay Tests, Test Interpretation
Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1988
Three methods of estimating point-biserial correlation coefficient standard errors were compared: (1) assuming normality; (2) not assuming normality; and (3) bootstrapping. Although errors estimated assuming normality were biased, such estimates were less variable and easier to compute, suggesting that this might be the method of choice in some…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Analysis, Statistical Analysis
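The estimators being compared are easy to line up on simulated data. The sketch below contrasts the normal-theory approximation with a simple nonparametric bootstrap for a point-biserial item-criterion correlation; the data-generating choices are illustrative, not the article's design.

import numpy as np

def point_biserial(x, y):
    # Pearson correlation of a binary item score with a continuous score,
    # which is exactly the point-biserial coefficient
    return np.corrcoef(x, y)[0, 1]

rng = np.random.default_rng(42)
n = 200
total = rng.normal(50, 10, n)                 # continuous criterion score
p = 1.0 / (1.0 + np.exp(-(total - 50) / 10))
item = (rng.random(n) < p).astype(float)      # binary item score

r = point_biserial(item, total)
se_normal = (1 - r**2) / np.sqrt(n - 1)       # normal-theory approximation

boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)               # resample cases with replacement
    boots.append(point_biserial(item[idx], total[idx]))
se_boot = np.std(boots, ddof=1)

print(f"r = {r:.3f}  SE(normal) = {se_normal:.4f}  SE(bootstrap) = {se_boot:.4f}")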
Cudeck, Robert A.; And Others – Educational and Psychological Measurement, 1977
TAILOR, a FORTRAN computer program for tailored testing, is described. The tailored test is based on a joint ordering of persons and items that requires no pretesting; a brief discussion of the computer program is included. (Author/JKS)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Test Construction
McCormick, Douglas J.; Cliff, Norman – Educational and Psychological Measurement, 1977
An interactive computer program for tailored testing, called TAILOR, is presented. The program runs on the APL system. A cumulative file for each examinee is established and tests are then tailored to each examinee; extensive pretesting is not necessary. (JKS)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Test Construction
Fan, Xitao – Educational and Psychological Measurement, 1998
This study empirically examined the behavior of item and person statistics derived from item response theory and classical test theory, using a large-scale statewide assessment. Findings show that the person and item statistics from the two measurement frameworks are quite comparable. (SLD)
Descriptors: Item Response Theory, State Programs, Statistical Analysis, Test Items
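For the classical-test-theory side of such a comparison, the usual statistics take only a few lines; the simulated data below stand in for the statewide assessment, and the IRT calibration itself is omitted.

import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=(1000, 1))   # simulated examinee abilities
b = np.linspace(-1.5, 1.5, 10)       # simulated item difficulties
resp = (rng.random((1000, 10)) < 1.0 / (1.0 + np.exp(-(theta - b)))).astype(float)

p_values = resp.mean(axis=0)         # CTT difficulty (proportion correct)
total = resp.sum(axis=1)
for j in range(resp.shape[1]):
    # corrected item-total correlation: the CTT analogue of IRT discrimination
    r_it = np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]
    print(f"item {j+1:2d}: p = {p_values[j]:.2f}, r_it = {r_it:.2f}")

Under the study's finding, these p-values and item-total correlations should rank items much as the corresponding IRT b and a estimates do.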
Bligh, Thomas J.; Noe, Michael J. – Educational and Psychological Measurement, 1977
A computer program for scoring written simulation tests provides individual scores and basic item analysis data. The program is written in Fortran IV and can accommodate up to thirty-five hundred options and up to ten thousand examinees. (Author/JKS)
Descriptors: Computer Oriented Programs, Item Analysis, Medical Education, Problem Solving