Showing 1 to 15 of 57 results
Peer reviewed
Chia-Ying Chu; Pei-Hua Chen; Yi-Shin Tsai; Chieh-An Chen; Yi-Chih Chan; Yan-Jhe Ciou – Journal of Deaf Studies and Deaf Education, 2024
This study investigated the impact of language sample length on mean length of utterance (MLU) and aimed to determine the minimum number of utterances required for a reliable MLU. Conversations were collected from Mandarin-speaking, hard-of-hearing and typical-hearing children aged 16-81 months. The MLUs were calculated using sample sizes ranging…
Descriptors: Foreign Countries, Mandarin Chinese, Young Children, Language Acquisition
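
For orientation, MLU itself is a simple ratio, and the reliability question in this study is how large the denominator must be before the estimate stabilizes. The textbook definition (not specific to this paper):

$$ \mathrm{MLU} = \frac{\text{total morphemes in the sample}}{\text{total utterances in the sample}} $$
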
Peer reviewed
Fellinghauer, Carolina; Debelak, Rudolf; Strobl, Carolin – Educational and Psychological Measurement, 2023
This simulation study investigated to what extent departures from construct similarity as well as differences in the difficulty and targeting of scales impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation…
Descriptors: True Scores, Equated Scores, Test Items, Sample Size
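
Concurrent calibration here rests on the partial credit model; its standard form (a textbook formula, not a detail of the simulation) gives the probability of scoring in category $x$ of item $i$ with step parameters $\delta_{ij}$:

$$ P(X_i = x \mid \theta) = \frac{\exp\!\left[\sum_{j=0}^{x} (\theta - \delta_{ij})\right]}{\sum_{h=0}^{m_i} \exp\!\left[\sum_{j=0}^{h} (\theta - \delta_{ij})\right]}, \qquad \sum_{j=0}^{0} (\theta - \delta_{ij}) \equiv 0 $$
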
Peer reviewed
Lang, Joseph B. – Journal of Educational and Behavioral Statistics, 2023
This article is concerned with the statistical detection of copying on multiple-choice exams. As an alternative to existing permutation- and model-based copy-detection approaches, a simple randomization p-value (RP) test is proposed. The RP test, which is based on an intuitive match-score statistic, makes no assumptions about the distribution of…
Descriptors: Identification, Cheating, Multiple Choice Tests, Item Response Theory
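
The abstract is truncated before the test's details, so the Python sketch below shows only the generic shape of a randomization test on a match-score statistic: the observed agreement of a suspected pair is referred to agreement scores of randomly drawn pairs. The function names and the choice of reference pairs are illustrative assumptions, not Lang's published RP procedure.

```python
import numpy as np

def match_score(x, y):
    """Number of items on which two response vectors agree."""
    return int(np.sum(np.asarray(x) == np.asarray(y)))

def randomization_p_value(responses, i, j, n_rep=10_000, seed=None):
    """Illustrative randomization p-value for the match score of pair (i, j).

    responses: (n_examinees, n_items) array of selected answer options.
    The reference distribution is built from match scores of randomly
    drawn pairs of other examinees -- a generic sketch, not the exact
    RP test proposed in the article.
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses)
    observed = match_score(responses[i], responses[j])
    others = [k for k in range(len(responses)) if k not in (i, j)]
    null = np.empty(n_rep)
    for r in range(n_rep):
        a, b = rng.choice(others, size=2, replace=False)
        null[r] = match_score(responses[a], responses[b])
    # Monte Carlo p-value with the usual +1 correction
    return (np.sum(null >= observed) + 1) / (n_rep + 1)
```
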
Peer reviewed
Yu, Albert; Douglas, Jeffrey A. – Journal of Educational and Behavioral Statistics, 2023
We propose a new item response theory growth model with item-specific learning parameters, or ISLP, and two variations of this model. In the ISLP model, either items or blocks of items have their own learning parameters. This model may be used to improve the efficiency of learning in a formative assessment. We show ways that the ISLP model's…
Descriptors: Item Response Theory, Learning, Markov Processes, Monte Carlo Methods
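
The abstract names the ISLP model without giving its form. Purely as a hypothetical illustration of what "item-specific learning parameters" can mean (not the authors' specification), ability could grow by an increment $\gamma_{j_t}$ tied to the item practiced at step $t$:

$$ \theta^{(t+1)} = \theta^{(t)} + \gamma_{j_t} $$
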
Derek Sauder – ProQuest LLC, 2020
The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters.…
Descriptors: Item Response Theory, Sample Size, Computation, Test Length
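
The estimation target here is the standard Rasch item response function, in which the only item parameter is the difficulty $b_i$:

$$ P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)} $$
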
Peer reviewed
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents, and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of solution p = 1/2) to determine the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
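
The paper's closed formula sits behind the abstract's ellipsis; as orientation only, the standard Rasch result it builds on is that an item answered correctly with probability $p$ contributes information $p(1-p)$, which peaks at $p = 1/2$. For $n$ such well-targeted items:

$$ I_i(\theta) = p_i(1 - p_i) = \tfrac{1}{4} \;\text{ at } p_i = \tfrac{1}{2}, \qquad \mathrm{SE}(\hat\theta) \approx \frac{1}{\sqrt{n/4}} = \frac{2}{\sqrt{n}} $$
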
Peer reviewed
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
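
Under the random groups design, the simplest candidate method is linear equating, which matches the first two moments of the two forms (a textbook formula; the abstract does not enumerate which methods were compared):

$$ l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) $$
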
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student answers to that student's level of ability. Most commonly, this benefit is used to justify shortening tests while retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
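
The non-adaptive baseline for this length-reliability trade-off is the Spearman-Brown prophecy formula: changing a test's length by a factor $k$ changes reliability $\rho$ to

$$ \rho_k = \frac{k\rho}{1 + (k - 1)\rho} $$

so halving a conventional test with $\rho = 0.90$ ($k = 0.5$) drops reliability to about $0.82$; an adaptive test tries to avoid that loss by raising the information of each administered item.
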
Peer reviewed
Arikan, Serkan; Aybek, Eren Can – Educational Measurement: Issues and Practice, 2022
Many scholars compared various item discrimination indices in real or simulated data. Item discrimination indices, such as item-total correlation, item-rest correlation, and IRT item discrimination parameter, provide information about individual differences among all participants. However, there are tests that aim to select a very limited number…
Descriptors: Monte Carlo Methods, Item Analysis, Correlation, Individual Differences
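
Of the indices named in the abstract, the item-rest (corrected item-total) correlation is the easiest to state in code. A minimal NumPy sketch, assuming a complete examinee-by-item score matrix:

```python
import numpy as np

def item_rest_correlations(scores):
    """Item-rest (corrected item-total) correlation for each item.

    scores: (n_examinees, n_items) array of 0/1 or polytomous item scores.
    The 'rest' score excludes the item itself, avoiding the inflation
    that the plain item-total correlation suffers from.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    r = np.empty(scores.shape[1])
    for i in range(scores.shape[1]):
        rest = total - scores[:, i]
        r[i] = np.corrcoef(scores[:, i], rest)[0, 1]
    return r
```
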
Peer reviewed
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
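
On the observed-score side, the MH procedure pools 2 x 2 tables across matched score strata $k$ (reference group: $A_k$ right, $B_k$ wrong; focal group: $C_k$ right, $D_k$ wrong; $N_k$ examinees in stratum $k$). The standard common odds ratio and its ETS delta transform are:

$$ \hat\alpha_{\mathrm{MH}} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k}, \qquad \mathrm{MH\ D\text{-}DIF} = -2.35 \,\ln \hat\alpha_{\mathrm{MH}} $$
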
Peer reviewed
Sunbul, Onder; Yormaz, Seha – International Journal of Evaluation and Research in Education, 2018
In this study, the Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels, for 40- and 80-item test lengths, and with a 10,000-examinee sample size under several test-level restrictions. As a result, the Type I error rates of both indices were found to be below the acceptable…
Descriptors: Difficulty Level, Cheating, Duplication, Test Length
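
The distributional core of a GBT-style index is the generalized (Poisson) binomial: given per-item probabilities $p_i$ that a pair of examinees match by chance, the tail probability of the observed match count can be computed exactly. The sketch below shows that computation only; how the $p_i$ are estimated (typically from a response model) is outside it, and it is illustrative rather than the published GBT implementation.

```python
import numpy as np

def poisson_binomial_tail(p, m):
    """P(M >= m) for M = sum of independent Bernoulli(p_i) match events.

    p: per-item chance-match probabilities for a pair of examinees
       (assumed given here; in practice estimated from a response model).
    Exact dynamic programming over the match-count distribution.
    """
    p = np.asarray(p, dtype=float)
    dist = np.zeros(len(p) + 1)   # dist[k] = P(exactly k matches so far)
    dist[0] = 1.0
    for pi in p:
        # fold in one more item: either it matches (prob pi) or not
        dist[1:] = dist[1:] * (1.0 - pi) + dist[:-1] * pi
        dist[0] *= (1.0 - pi)
    return float(dist[m:].sum())
```

For example, poisson_binomial_tail([0.2] * 40, 20) returns the chance of 20 or more matches on a 40-item test when each item matches with probability 0.2.
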
Peer reviewed
Alessio, Helaine M.; Malay, Nancy; Maurer, Karsten; Bailer, A. John; Rubin, Beth – International Review of Research in Open and Distributed Learning, 2018
Traditional and online university courses share expectations for quality content and rigor. Student and faculty concerns about compromised academic integrity and actual instances of academic dishonesty in assessments, especially with online testing, are increasingly troublesome. Recent research suggests that in the absence of proctoring, the time…
Descriptors: Supervision, Majors (Students), Computer Assisted Testing, Scores
Peer reviewed
Hamby, Tyler – Journal of Psychoeducational Assessment, 2018
In this study, the author examined potential mediators of the negative relationship between the absolute difference in items' lengths and their inter-item correlation size. Fifty-two randomly ordered items from five personality scales were administered to 622 university students, and 46 respondents from a survey website rated the items'…
Descriptors: Correlation, Personality Traits, Undergraduate Students, Difficulty Level
Peer reviewed
Lin, Peng; Dorans, Neil; Weeks, Jonathan – ETS Research Report Series, 2016
The nonequivalent groups with anchor test (NEAT) design is frequently used in test score equating or linking. One important assumption of the NEAT design is that the anchor test is a miniversion of the 2 tests to be equated/linked. When the content of the 2 tests is different, it is not possible for the anchor test to be adequately representative…
Descriptors: Equated Scores, Test Length, Test Content, Difficulty Level
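
Why anchor representativeness matters is visible in, for example, chained linear equating, where every converted score passes through the anchor scale: $X$ is linked to the anchor $V$ in group 1, then $V$ to $Y$ in group 2 (a textbook form, not the specific procedures studied in the report):

$$ l_Y(x) = \mu_{Y} + \frac{\sigma_{Y}}{\sigma_{V,2}} \left( \mu_{V,1} + \frac{\sigma_{V,1}}{\sigma_X}\,(x - \mu_X) - \mu_{V,2} \right) $$

where the subscripts 1 and 2 index the groups taking forms X and Y.
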
Peer reviewed
Bazaldua, Diego A. Luna; Lee, Young-Sun; Keller, Bryan; Fellers, Lauren – Asia Pacific Education Review, 2017
The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, resulting in mixed results regarding the preference of some discrimination estimators over others. This study analyzes the performance of various item discrimination estimators in CTT:…
Descriptors: Test Items, Monte Carlo Methods, Item Response Theory, Correlation
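
The most familiar estimator in this CTT family is the point-biserial correlation between an item score and the total score. With $\bar{X}_1$ and $\bar{X}_0$ the mean totals of examinees answering the item right and wrong, $s_X$ the total-score standard deviation, and $p = 1 - q$ the proportion correct:

$$ r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\,\sqrt{pq} $$
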