ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	6

Source

Applied Psychological…	2
Assessment & Evaluation in…	1
Educational Research and…	1
Educational and Psychological…	1
International Journal of…	1
Journal of Educational…	1
Psychometrika	1

Author

Eggen, Theo J. H. M.	2
Burton, Richard F.	1
Cosyn, Eric	1
Finch, Holmes	1
Kahraman, Nilufer	1
Liu, Chen-Wei	1
Matayoshi, Jeffrey	1
Song, Hao	1
Thompson, Tony	1
Uzun, Hasan	1
Verelst, Norman D.	1
Wainer, Howard	1
Wang, Wen-Chung	1
de la Torre, Jimmy	1
More ▼

Publication Type

Reports - Evaluative	9
Journal Articles	8

Education Level

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

What Works Clearinghouse Rating

Showing all 9 results Save | Export

Are We There Yet? Evaluating the Effectiveness of a Recurrent Neural Network-Based Stopping Algorithm for an Adaptive Assessment

Peer reviewed

Direct link

Matayoshi, Jeffrey; Cosyn, Eric; Uzun, Hasan – International Journal of Artificial Intelligence in Education, 2021

Many recent studies have looked at the viability of applying recurrent neural networks (RNNs) to educational data. In most cases, this is done by comparing their performance to existing models in the artificial intelligence in education (AIED) and educational data mining (EDM) fields. While there is increasing evidence that, in many situations,…

Descriptors: Artificial Intelligence, Data Analysis, Student Evaluation, Adaptive Testing

Relating Unidimensional IRT Parameters to a Multidimensional Response Space: A Review of Two Alternative Projection IRT Models for Scoring Subscales

Peer reviewed

Direct link

Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011

A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…

Descriptors: Test Length, Test Items, Alignment (Education), Models

Computerized Classification Testing with the Rasch Model

Peer reviewed

Direct link

Eggen, Theo J. H. M. – Educational Research and Evaluation, 2011

If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…

Descriptors: Test Length, Adaptive Testing, Classification, Item Analysis

Computerized Classification Testing under the Generalized Graded Unfolding Model

Peer reviewed

Direct link

Wang, Wen-Chung; Liu, Chen-Wei – Educational and Psychological Measurement, 2011

The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…

Descriptors: Computer Assisted Testing, Adaptive Testing, Classification, Item Response Theory

Simultaneous Estimation of Overall and Domain Abilities: A Higher-Order IRT Model Approach

Peer reviewed

Direct link

de la Torre, Jimmy; Song, Hao – Applied Psychological Measurement, 2009

Assessments consisting of different domains (e.g., content areas, objectives) are typically multidimensional in nature but are commonly assumed to be unidimensional for estimation purposes. The different domains of these assessments are further treated as multi-unidimensional tests for the purpose of obtaining diagnostic information. However, when…

Descriptors: Ability, Tests, Item Response Theory, Data Analysis

Loss of Information in Estimating Item Parameters in Incomplete Designs

Peer reviewed

Direct link

Eggen, Theo J. H. M.; Verelst, Norman D. – Psychometrika, 2006

In this paper, the efficiency of conditional maximum likelihood (CML) and marginal maximum likelihood (MML) estimation of the item parameters of the Rasch model in incomplete designs is investigated. The use of the concept of F-information (Eggen, 2000) is generalized to incomplete testing designs. The scaled determinant of the F-information…

Descriptors: Test Length, Computation, Maximum Likelihood Statistics, Models

Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking

Peer reviewed

Direct link

Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004

The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…

Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items

The MIMIC Model as a Method for Detecting DIF: Comparison With Mantel-Haenszel, SIBTEST, and the IRT Likelihood Ratio

Peer reviewed

Direct link

Finch, Holmes – Applied Psychological Measurement, 2005

This study compares the ability of the multiple indicators, multiple causes (MIMIC) confirmatory factor analysis model to correctly identify cases of differential item functioning (DIF) with more established methods. Although the MIMIC model might have application in identifying DIF for multiple grouping variables, there has been little…

Descriptors: Identification, Factor Analysis, Test Bias, Models

An Adaptive Algebra Test: A Testlet-Based, Hierarchically-Structured Test with Validity-Based Scoring. Technical Report No. 90-92.

Download full text

Wainer, Howard; And Others – 1990

The initial development of a testlet-based algebra test was previously reported (Wainer and Lewis, 1990). This account provides the details of this excursion into the use of hierarchical testlets and validity-based scoring. A pretest of two 15-item hierarchical testlets was carried out in which examinees' performance on a 4-item subset of each…

Descriptors: Adaptive Testing, Algebra, Comparative Analysis, Computer Assisted Testing

Models	9
Test Length	9
Item Response Theory	6
Adaptive Testing	4
Test Items	4
Comparative Analysis	3
Computation	3
Computer Assisted Testing	3
Data Analysis	3
Classification	2
Error of Measurement	2
Item Analysis	2
Monte Carlo Methods	2
Test Reliability	2
Test Validity	2
Ability	1
Accuracy	1
Algebra	1
Alignment (Education)	1
Artificial Intelligence	1
Bayesian Statistics	1
Correlation	1
Cutting Scores	1
Educational Assessment	1
Factor Analysis	1
More ▼