ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	36

Descriptor

Test Length	76
Item Response Theory	34
Test Items	20
Simulation	16
Computer Assisted Testing	15
Scores	15
Computation	14
Sample Size	14
Adaptive Testing	13
Test Construction	13
Test Reliability	13
Error of Measurement	12
Monte Carlo Methods	11
Test Validity	11
Comparative Analysis	10
Item Analysis	10
Maximum Likelihood Statistics	10
Classification	9
Correlation	9
Estimation (Mathematics)	9
Testing Problems	9
Models	8
Reliability	8
Bayesian Statistics	7
Mathematical Models	7

Publication Type

Journal Articles	76
Reports - Evaluative	76
Reports - Research	2
Speeches/Meeting Papers	2
Collected Works - General	1
Information Analyses	1

Education Level

Elementary Secondary Education	1
Higher Education	1
Postsecondary Education	1
Secondary Education	1

Audience

Practitioners

Location

Japan	1
Netherlands	1
New York	1
Taiwan	1

Assessments and Surveys

ACTFL Oral Proficiency…	1
Armed Forces Qualification…	1
Developmental Indicators for…	1
International English…	1
Peabody Picture Vocabulary…	1
Raven Advanced Progressive…	1
Test of English as a Foreign…	1
Wechsler Adult Intelligence…	1

Showing 1 to 15 of 76 results

Are We There Yet? Evaluating the Effectiveness of a Recurrent Neural Network-Based Stopping Algorithm for an Adaptive Assessment

Peer reviewed

Matayoshi, Jeffrey; Cosyn, Eric; Uzun, Hasan – International Journal of Artificial Intelligence in Education, 2021

Many recent studies have looked at the viability of applying recurrent neural networks (RNNs) to educational data. In most cases, this is done by comparing their performance to existing models in the artificial intelligence in education (AIED) and educational data mining (EDM) fields. While there is increasing evidence that, in many situations,…

Descriptors: Artificial Intelligence, Data Analysis, Student Evaluation, Adaptive Testing

Test Review: Current Options in At-Home Language Proficiency Tests for Making High-Stakes Decisions

Peer reviewed

Isbell, Daniel R.; Kremmel, Benjamin – Language Testing, 2020

Administration of high-stakes language proficiency tests has been disrupted in many parts of the world as a result of the 2019 novel coronavirus pandemic. Institutions that rely on test scores have been forced to adapt, and in many cases this means using scores from a different test, or a new online version of an existing test, that can be taken…

Descriptors: Language Tests, High Stakes Tests, Language Proficiency, Second Language Learning

Test Review: TestDaF

Peer reviewed

Norris, John; Drackert, Anastasia – Language Testing, 2018

The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…

Descriptors: German, Second Language Learning, Language Tests, Language Proficiency

Profile Analyses as Feedback by Evaluating the Balance in Exam Scores

Peer reviewed

Vaheoja, Monika; Verhelst, N. D.; Eggen, T.J.H.M. – European Journal of Science and Mathematics Education, 2019

In this article, the authors applied profile analysis to Maths exam data to demonstrate how different exam forms, differing in difficulty and length, can be reported and easily interpreted. The results were presented for different groups of participants and for different institutions in different Maths domains by evaluating the balance. Some…

Descriptors: Feedback (Response), Foreign Countries, Statistical Analysis, Scores

ACTFL Oral Proficiency Interview -- Computer (OPIc)

Peer reviewed

Isbell, Dan; Winke, Paula – Language Testing, 2019

The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…

Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning

The Incremental Validity of a Short Form of the Ideational Behavior Scale and Usefulness of Distractor, Contraindicative, and Lie Scales

Peer reviewed

Runco, Mark A.; Walczyk, Jeffrey John; Acar, Selcuk; Cowger, Ernest L.; Simundson, Melissa; Tripp, Sunny – Journal of Creative Behavior, 2014

This article describes an empirical refinement of the "Runco Ideational Behavior Scale" (RIBS). The RIBS seems to be associated with divergent thinking, and the potential for creative thinking, but it was possible that its validity could be improved. With this in mind, three new scales were developed and the unique benefit (or…

Descriptors: Behavior Rating Scales, Creative Thinking, Test Validity, Psychometrics

A Nonparametric Approach to Estimate Classification Accuracy and Consistency

Peer reviewed

Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014

When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…

Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics

Test Review: C. Mardell & D. S. Goldenberg. "Speed Developmental Indicators for the Assessment of Learning-Fourth Edition" ("Speed DIAL-4")

Peer reviewed

Doskey, Elena M.; Lagunas, Brenda; SooHoo, Michelle; Lomax, Amanda; Bullick, Stephanie – Journal of Psychoeducational Assessment, 2013

The Speed DIAL-4 was developed from the Developmental Indicators for the Assessment of Learning, Fourth Edition (DIAL-4), a screening designed to identify children between the ages of 2 years, 6 months through 5 years, 11 months "who are in need of intervention or diagnostic assessment in the following areas: motor, concepts, language,…

Descriptors: Screening Tests, Young Children, Test Length, Scoring

Using Logistic Approximations of Marginal Trace Lines to Develop Short Assessments

Peer reviewed

Stucky, Brian D.; Thissen, David; Edelen, Maria Orlando – Applied Psychological Measurement, 2013

Test developers often need to create unidimensional scales from multidimensional data. For item analysis, "marginal trace lines" capture the relation with the general dimension while accounting for nuisance dimensions and may prove to be a useful technique for creating short-form tests. This article describes the computations needed to obtain…

Descriptors: Test Construction, Test Length, Item Analysis, Item Response Theory

Identification of Differential Item Functioning in Assessment Booklet Designs with Structurally Missing Data

Peer reviewed

Goodman, Joshua T.; Willse, John T.; Allen, Nancy L.; Klaric, John S. – Educational and Psychological Measurement, 2011

The Mantel-Haenszel procedure is a popular technique for determining items that may exhibit differential item functioning (DIF). Numerous studies have focused on the strengths and weaknesses of this procedure, but few have focused on the performance of the Mantel-Haenszel method when structurally missing data are present as a result of test booklet…

Descriptors: Test Bias, Identification, Tests, Test Length

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

A Comparison of Four Methods of IRT Subscoring

Peer reviewed

de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011

Lack of sufficient reliability is the primary impediment to generating and reporting subtest scores. Several current methods of subscore estimation do so either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article conducted a systematic comparison of four…

Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis

Relating Unidimensional IRT Parameters to a Multidimensional Response Space: A Review of Two Alternative Projection IRT Models for Scoring Subscales

Peer reviewed

Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011

A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…

Descriptors: Test Length, Test Items, Alignment (Education), Models

Computerized Classification Testing with the Rasch Model

Peer reviewed

Eggen, Theo J. H. M. – Educational Research and Evaluation, 2011

If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…

Descriptors: Test Length, Adaptive Testing, Classification, Item Analysis

Checking Dimensionality in Item Response Models with Principal Component Analysis on Standardized Residuals

Peer reviewed

Chou, Yeh-Tai; Wang, Wen-Chung – Educational and Psychological Measurement, 2010

Dimensionality is an important assumption in item response theory (IRT). Principal component analysis on standardized residuals has been used to check dimensionality, especially under the family of Rasch models. It has been suggested that an eigenvalue greater than 1.5 for the first eigenvalue signifies a violation of unidimensionality when there…

Descriptors: Test Length, Sample Size, Correlation, Item Response Theory

Source

Applied Psychological…	21
Educational and Psychological…	12
Journal of Educational…	8
Applied Measurement in…	7
Language Testing	4
Educational Research and…	2
Psychological Assessment	2
Psychometrika	2
Academic Medicine	1
Assessment & Evaluation in…	1
Educational Measurement:…	1
European Journal of Science…	1
Evaluation in Education:…	1
Intelligence	1
International Journal of…	1
International Journal of…	1
Journal of Creative Behavior	1
Journal of Educational…	1
Journal of Psychoeducational…	1
Journal of Visual Impairment…	1
Machine-Mediated Learning	1
Perceptual and Motor Skills	1
Popular Measurement	1
Psychological Methods	1
Research in the Schools	1

Author

Wang, Wen-Chung	5
Meijer, Rob R.	3
Allen, Nancy L.	2
De Ayala, R. J.	2
Eggen, Theo J. H. M.	2
Finch, Holmes	2
Fitzpatrick, Anne R.	2
Hambleton, Ronald K.	2
Sijtsma, Klaas	2
Song, Hao	2
Wollack, James A.	2
de la Torre, Jimmy	2
Acar, Selcuk	1
Alsawalmeh, Yousef M.	1
Ankenman, Robert D.	1
Arthur, Winfred, Jr.	1
Axelrod, Bradley N.	1
Bergstrom, Betty	1
Bullick, Stephanie	1
Burton, Richard F.	1
Camilli, Gregory	1
Chang, Hua-Hua	1
Chen, Cheng-Te	1
Chen, Shu-Ying	1