Showing 1 to 15 of 639 results
Peer reviewed
Tim Moses; YoungKoung Kim – Journal of Educational Measurement, 2025
This study considers the estimation of marginal reliability and conditional accuracy measures using a generalized recursion procedure with several IRT-based ability and score estimators. The estimators include MLE, TCC, and EAP abilities, and corresponding test scores obtained with different weightings of the item scores. We consider reliability…
Descriptors: Item Response Theory, Scoring, Reliability, Accuracy
Peer reviewed
Kathryn R. Glodowski; Yusuke Hayashi – Journal of Applied Behavior Analysis, 2025
The testing effect is a well-established phenomenon in cognitive psychology that refers to enhanced long-term retention of information due to active recalling through testing. Following a cross-disciplinary translation of the testing effect into behavioral principles, we systematically replicated the previous findings in a behavior-analytic…
Descriptors: Testing, Replication (Evaluation), Tests, Test Length
James Riddlesperger – ACT Education Corp., 2025
ACT announced a series of enhancements designed to modernize the ACT test and offer students more choice and flexibility in demonstrating their readiness for life after high school. The enhancements provide students more flexibility by allowing them to choose whether to take the science assessment, thereby reducing the test length by up to…
Descriptors: College Entrance Examinations, Testing, Change, Test Length
Peer reviewed
Abdulla Alzarouni; R. J. De Ayala – Practical Assessment, Research & Evaluation, 2025
The assessment of model fit in latent trait modeling is an integral part of correctly applying the model. Still, the assessment of model fit has been less utilized for ideal point models such as the Generalized Graded Unfolding Models (GGUM). The current study assesses the performance of the relative fit indices "AIC" and "BIC,"…
Descriptors: Goodness of Fit, Models, Statistical Analysis, Sample Size
Peer reviewed
Jing Huang; Yuxiao Zhang; Jason W. Morphew; Jayson M. Nissen; Ben Van Dusen; Hua Hua Chang – Journal of Educational Measurement, 2025
Online calibration estimates new item parameters alongside previously calibrated items, supporting efficient item replenishment. However, most existing online calibration procedures for Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) lack mechanisms to ensure content balance during live testing. This limitation can lead to uneven…
Descriptors: Adaptive Testing, Computer Assisted Testing, Cognitive Measurement, Test Items
Peer reviewed
Dubravka Svetina Valdivia; Shenghai Dai – Journal of Experimental Education, 2024
Applications of polytomous IRT models in applied fields (e.g., health, education, psychology) abound. However, little is known about the impact of the number of categories and sample size requirements for precise parameter recovery. In a simulation study, we investigated the impact of the number of response categories and required sample size…
Descriptors: Item Response Theory, Sample Size, Models, Classification
Peer reviewed
Jiang, Zhehan; Han, Yuting; Xu, Lingling; Shi, Dexin; Liu, Ren; Ouyang, Jinying; Cai, Fen – Educational and Psychological Measurement, 2023
The part of responses that is absent in the nonequivalent groups with anchor test (NEAT) design can be managed to a planned missing scenario. In the context of small sample sizes, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating tasks within the NEAT design. Specifically, seven…
Descriptors: Test Items, Equated Scores, Sample Size, Artificial Intelligence
Peer reviewed
Félix González-Carrasco; Felipe Espinosa Parra; Izaskun Álvarez-Aguado; Sebastián Ponce Olguín; Vanessa Vega Córdova; Miguel Roselló-Peñaloza – British Journal of Learning Disabilities, 2025
Background: The study focuses on the need to optimise assessment scales for support needs in individuals with intellectual and developmental disabilities. Current scales are often lengthy and redundant, leading to exhaustion and response burden. The goal is to use machine learning techniques, specifically item-reduction methods and selection…
Descriptors: Artificial Intelligence, Intellectual Disability, Developmental Disabilities, Individual Needs
Peer reviewed
Hasibe Yahsi Sari; Hulya Kelecioglu – International Journal of Assessment Tools in Education, 2025
The aim of the study is to examine the effect of polytomous item ratio on ability estimation in different conditions in multistage tests (MST) using mixed tests. The study is simulation-based research. In the PISA 2018 application, the ability parameters of the individuals and the item pool were created by using the item parameters estimated from…
Descriptors: Test Items, Test Format, Accuracy, Test Length
Peer reviewed
Kilmen, Sevilay – Journal of Psychoeducational Assessment, 2022
The present study has two main purposes. The first is to create a short form of the BTPS and to evaluate the psychometric properties of the short form. The second is to evaluate the performance of the ant colony optimization procedure and discuss the applicability of the ant colony optimization procedure in creating a short form. Results revealed…
Descriptors: Personality Measures, Test Length, Psychometrics, Undergraduate Students
Peer reviewed
Chia-Ying Chu; Pei-Hua Chen; Yi-Shin Tsai; Chieh-An Chen; Yi-Chih Chan; Yan-Jhe Ciou – Journal of Deaf Studies and Deaf Education, 2024
This study investigated the impact of language sample length on mean length of utterance (MLU) and aimed to determine the minimum number of utterances required for a reliable MLU. Conversations were collected from Mandarin-speaking, hard-of-hearing and typical-hearing children aged 16-81 months. The MLUs were calculated using sample sizes ranging…
Descriptors: Foreign Countries, Mandarin Chinese, Young Children, Language Acquisition
Peer reviewed
María Vicent; Andrea Fuster; María Pérez-Marco; María del Pilar Aparicio-Flores – Journal of Psychoeducational Assessment, 2025
Although the original long version of the Hewitt Multidimensional Perfectionism Scale (HMPS) has been translated and validated in a Spanish population, no study to date has examined the psychometric properties of a short version of the HMPS with a Spanish-speaking sample. For this reason, the aim of this study is to analyze the psychometric…
Descriptors: Personality Measures, Personality Traits, Spanish, Psychometrics
Peer reviewed
Jun-ichiro Yasuda; Michael M. Hull; Naohiro Mae; Kentaro Kojima – Physical Review Physics Education Research, 2025
Although conceptual assessment tests are commonly administered at the beginning and end of a semester, this pre-post approach has inherent limitations. Specifically, education researchers and instructors have limited ability to observe the progression of students' conceptual understanding throughout the course. Furthermore, instructors are limited…
Descriptors: Computer Assisted Testing, Adaptive Testing, Science Tests, Scientific Concepts
Peer reviewed
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
Peer reviewed
Sun, Ting; Kim, Stella Yun – Measurement: Interdisciplinary Research and Perspectives, 2021
In many large testing programs, equipercentile equating has been widely used under a random groups design to adjust test difficulty between forms. However, one thorny issue occurs with equipercentile equating when a particular score has no observed frequency. The purpose of this study is to suggest and evaluate six potential methods in…
Descriptors: Equated Scores, Test Length, Sample Size, Methods