| Publication Date | Records |
|---|---|
| In 2025 | 0 |
| Since 2024 | 1 |
| Since 2021 (last 5 years) | 5 |
| Since 2016 (last 10 years) | 270 |
| Since 2006 (last 20 years) | 541 |
| Descriptor | Records |
|---|---|
| Difficulty Level | 598 |
| Statistical Analysis | 598 |
| Foreign Countries | 287 |
| Cognitive Processes | 141 |
| Test Items | 141 |
| Comparative Analysis | 116 |
| Questionnaires | 115 |
| Correlation | 107 |
| College Students | 104 |
| English (Second Language) | 94 |
| Teaching Methods | 92 |
| Author | Records |
|---|---|
| Paas, Fred | 6 |
| Costley, Jamie | 4 |
| Kalyuga, Slava | 4 |
| Rahimpour, Massoud | 4 |
| Sarabi, M. K. | 4 |
| Sheehan, Kathleen M. | 4 |
| Elen, Jan | 3 |
| Gafoor, K. Abdul | 3 |
| Gilabert, Roger | 3 |
| Lange, Christopher | 3 |
| Liu, Tzu-Chien | 3 |
| Location | Records |
|---|---|
| Taiwan | 27 |
| Germany | 21 |
| Iran | 18 |
| Australia | 17 |
| Turkey | 17 |
| China | 16 |
| Canada | 12 |
| South Africa | 10 |
| California | 9 |
| Belgium | 8 |
| India | 8 |
| What Works Clearinghouse Rating | Records |
|---|---|
| Does not meet standards | 1 |
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine the effects of differences in group ability and of features of the anchor test form on equating bias and the standard error of equating (SEE), using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
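The abstract names kernel equating methods without showing their mechanics. As a rough sketch, assuming the simpler equivalent-groups design rather than the anchor-test designs used in the study, kernel equating continuizes each discrete score distribution with a Gaussian kernel and maps a form-X score x to e_Y(x) = G_h^{-1}(F_h(x)). The function names, the bandwidth default h = 0.6, and the example distributions are hypothetical; the chained and poststratification variants, bandwidth selection, and SEE computation are not shown.

```python
import math

def _phi_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def continuize(scores, probs, h):
    """Gaussian-kernel continuized CDF of a discrete score distribution.

    F_h(x) = sum_j r_j * Phi((x - a*x_j - (1 - a)*mu) / (a*h)),
    with a = sqrt(var / (var + h^2)) so the mean and variance are preserved.
    """
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))

    def cdf(x):
        return sum(p * _phi_cdf((x - a * s - (1 - a) * mu) / (a * h))
                   for s, p in zip(scores, probs))
    return cdf

def kernel_equate(x, scores_x, r, scores_y, s, hx=0.6, hy=0.6):
    """Hypothetical helper: equate form-X score x to the form-Y scale."""
    F = continuize(scores_x, r, hx)
    G = continuize(scores_y, s, hy)
    target = F(x)
    lo, hi = min(scores_y) - 10.0, max(scores_y) + 10.0
    for _ in range(100):  # bisection works because G is strictly increasing
        mid = 0.5 * (lo + hi)
        if G(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With identical score distributions on both forms the mapping is the identity; shifting every form-Y score up by one point shifts the equated score up by one point.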
Tang, Xiaodan; Karabatsos, George; Chen, Haiqin – Applied Measurement in Education, 2020
In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses…
Descriptors: Item Response Theory, Test Items, Models, Computation
Akin-Arikan, Çigdem; Gelbal, Selahattin – Eurasian Journal of Educational Research, 2021
Purpose: This study aims to compare the performances of Item Response Theory (IRT) equating and kernel equating (KE) methods based on equating errors (RMSD) and standard error of equating (SEE) using the anchor item nonequivalent groups design. Method: Within this scope, a set of conditions, including ability distribution, type of anchor items…
Descriptors: Equated Scores, Item Response Theory, Test Items, Statistical Analysis
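In simulation studies, equating error criteria such as RMSD and SEE are commonly computed from the equated scores obtained across replications. This minimal sketch, for a single raw-score point, assumes empirical definitions: bias is the mean estimate minus the criterion equated score, SEE is the standard deviation across replications, and RMSD is the root mean squared deviation from the criterion. The function name is hypothetical, and the study's exact definitions (for example, averaging or weighting over score points) may differ.

```python
import math

def equating_error_summary(estimates, true_value):
    """Return (bias, see, rmsd) for one raw score across replications.

    Uses population (ddof=0) moments, so rmsd**2 == bias**2 + see**2.
    """
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    see = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    rmsd = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, see, rmsd
```

The identity RMSD² = bias² + SEE² is why RMSD is often read as a summary of both systematic and random equating error.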
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
The Pearson product-moment correlation coefficient between item g and test score X, known as the item-test or item-total correlation ("Rit"), and the item-rest correlation ("Rir") are two of the most widely used classical estimators of item discrimination power (IDP). Both "Rit" and "Rir" underestimate IDP caused by the…
Descriptors: Correlation, Test Items, Scores, Difficulty Level
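To make the two estimators concrete: Rit correlates item g with the total score X, which includes g itself, while Rir correlates g with the rest score X − g. This is a minimal sketch, assuming a small hypothetical persons-by-items matrix of 0/1 scores; it does not address the underestimation of IDP that the article discusses.

```python
import math

def pearson(x, y):
    # Plain Pearson correlation; undefined (division by zero) if either variable is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_discrimination(responses, item):
    """Return (Rit, Rir) for one column of a persons-by-items 0/1 matrix."""
    g = [row[item] for row in responses]
    total = [sum(row) for row in responses]      # test score X
    rest = [t - gi for t, gi in zip(total, g)]   # X with item g removed
    return pearson(g, total), pearson(g, rest)

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
rit, rir = item_discrimination(responses, 1)
```

For this toy matrix, Rit = 2/√5 and Rir = 1/√2, so removing the item from the total lowers the coefficient.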
Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021
The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…
Descriptors: Bayesian Statistics, Computation, Learning, Testing
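For orientation, the base linear logistic test model (LLTM) that this study extends decomposes each item's Rasch difficulty into weighted contributions of the operations the item requires: β_i = Σ_k q_ik η_k + c. This is a minimal sketch of that base model only, with a hypothetical Q-matrix and operation parameters. The operation-specific learning extension (which, per the abstract, captures learning from repeated use of operations during the test) and the Bayesian estimation are not implemented here.

```python
import math

def lltm_difficulties(q_matrix, eta, c=0.0):
    """Item difficulties under the LLTM: beta_i = sum_k q_ik * eta_k + c."""
    return [sum(q * e for q, e in zip(row, eta)) + c for row in q_matrix]

def p_correct(theta, beta):
    """Rasch probability of a correct response: 1 / (1 + exp(-(theta - beta)))."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

# Hypothetical example: 3 items, 2 operations; item 3 uses both operations.
q_matrix = [[1, 0], [0, 1], [1, 1]]
eta = [0.5, 1.0]
betas = lltm_difficulties(q_matrix, eta)
```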
Traci Kutaka; Pavel Chernyavskiy; Carson Keeter; Julie Sarama; Douglas Clements – Society for Research on Educational Effectiveness, 2021
Background: Data on children's ability to answer assessment questions correctly paints an incomplete portrait of what they know and can do mathematically; yet, it remains a common basis for program evaluation. Indeed, pre-post-assessment correctness is necessary but insufficient evidence for making inferences about learning and program…
Descriptors: Kindergarten, Learning Trajectories, Learning Strategies, Thinking Skills
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
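As one illustration of equating under the random groups design with number-correct scores, classical linear equating maps a form-X score so that standardized scores match across forms: e_Y(x) = μ_Y + (σ_Y / σ_X)(x − μ_X). This sketch is not necessarily one of the methods the study evaluated; the function name and score data are hypothetical.

```python
import statistics

def linear_equate(x, form_x_scores, form_y_scores):
    """Linear equating for randomly equivalent groups taking forms X and Y."""
    mu_x = statistics.mean(form_x_scores)
    mu_y = statistics.mean(form_y_scores)
    sd_x = statistics.pstdev(form_x_scores)
    sd_y = statistics.pstdev(form_y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical subscore data from two randomly equivalent groups.
x_scores = [10, 12, 14, 16]
y_scores = [20, 24, 28, 32]
```

For subscores, the same function would be applied to each subtest's number-correct scores separately; short subtests tend to have sparse score distributions, which is one reason the choice of method matters.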
Luke G. Eglington; Philip I. Pavlik – Grantee Submission, 2020
Decades of research has shown that spacing practice trials over time can improve later memory, but there are few concrete recommendations concerning how to optimally space practice. We show that existing recommendations are inherently suboptimal due to their insensitivity to time costs and individual- and item-level differences. We introduce an…
Descriptors: Scheduling, Drills (Practice), Memory, Testing
Luke G. Eglington; Philip I. Pavlik Jr. – npj Science of Learning, 2020
Decades of research has shown that spacing practice trials over time can improve later memory, but there are few concrete recommendations concerning how to optimally space practice. We show that existing recommendations are inherently suboptimal due to their insensitivity to time costs and individual- and item-level differences. We introduce an…
Descriptors: Scheduling, Drills (Practice), Memory, Testing
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Tullis, Jonathan G.; Fiechter, Joshua L.; Benjamin, Aaron S. – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2018
Practice tests provide large mnemonic benefits over restudying, but learners judge practice tests as less effective than restudying. Consequently, learners infrequently utilize testing when controlling their study and often choose to be tested only on well-learned items. In 5 experiments, we examined whether learners' choices about testing and…
Descriptors: Testing, Review (Reexamination), Selection, Memory
Lenhard, Wolfgang; Lenhard, Alexandra – Educational and Psychological Measurement, 2021
The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random…
Descriptors: Test Norms, Scores, Regression (Statistics), Test Items
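For contrast with continuous norming, here is a sketch of one common conventional approach, assumed for illustration: within a single reference group (for example, one age band), each raw score's midpoint percentile rank is converted to a normal quantile z and reported as a T-score, 50 + 10z. The function name is hypothetical, and SPCN itself, which models norm scores across the explanatory variable with regression, is not shown.

```python
from statistics import NormalDist

def t_scores(raw_scores):
    """Map each distinct raw score in one norm group to a T-score."""
    n = len(raw_scores)
    norms = {}
    for x in set(raw_scores):
        below = sum(1 for r in raw_scores if r < x)
        equal = sum(1 for r in raw_scores if r == x)
        pr = (below + 0.5 * equal) / n  # midpoint percentile rank, strictly in (0, 1)
        norms[x] = 50 + 10 * NormalDist().inv_cdf(pr)
    return norms

norms = t_scores([1, 2, 3, 4, 5])
```

Because each group is normed separately, this approach yields step changes between adjacent age bands, which is the kind of limitation continuous norming aims to reduce.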
Sunbul, Onder; Yormaz, Seha – International Journal of Evaluation and Research in Education, 2018
In this study, the Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths, with a sample of 10,000 examinees, under several test-level restrictions. The Type I error rates of both indices were found to be below the acceptable…
Descriptors: Difficulty Level, Cheating, Duplication, Test Length
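Both indices compare the observed number of matching responses h between a suspected copier and a source with what is expected by chance. This is a minimal sketch, assuming the per-item match probabilities are already known; in practice they come from an IRT model (ω is usually built on the nominal response model). ω standardizes h with a normal approximation, whereas the GBT computes the exact upper-tail probability of a sum of independent, non-identical Bernoulli matches. The function names and inputs are hypothetical.

```python
import math

def omega_index(match_probs, observed_matches):
    """Standardized match count: (h - E[h]) / SD[h]."""
    expected = sum(match_probs)
    var = sum(p * (1 - p) for p in match_probs)
    return (observed_matches - expected) / math.sqrt(var)

def gbt_upper_tail(match_probs, observed_matches):
    """Exact P(H >= h) for H = sum of independent Bernoulli(p_i) matches."""
    dist = [1.0]  # dist[k] = probability of k matches among items processed so far
    for p in match_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # no match on this item
            new[k + 1] += prob * p    # match on this item
        dist = new
    return sum(dist[observed_matches:])
```

A flag is raised when ω exceeds the normal critical value for the chosen alpha, or when the GBT tail probability falls below alpha.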
Sunbul, Onder; Yormaz, Seha – Eurasian Journal of Educational Research, 2018
Purpose: Several studies in the literature investigate the performance of ω under various conditions. However, no study of the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω under the conditions given below.…
Descriptors: Test Items, Difficulty Level, Ability, Cheating
Macnamara, Brooke N.; Frank, David J. – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2018
For well over a century, scientists have investigated individual differences in performance. The majority of studies have focused on either differences in practice or differences in cognitive resources. However, the predictive ability of either practice or cognitive resources varies considerably across tasks. We are the first to examine task…
Descriptors: Learning, Performance, Cognitive Processes, Difficulty Level