Yunting Liu; Shreya Bhandari; Zachary A. Pardos – British Journal of Educational Technology, 2025
Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time-consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4,…
Descriptors: Artificial Intelligence, Test Items, Psychometrics, Educational Assessment
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014
An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…
Descriptors: Sampling, Test Items, Effect Size, Scaling
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
OECD Publishing (NJ1), 2009
The Organisation for Economic Cooperation and Development's (OECD's) Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries. PISA examines how well students are prepared to meet the challenges of the future,…
Descriptors: Policy Formation, Scaling, Academic Achievement, Interrater Reliability
Livingston, Samuel A.; Dorans, Neil J. – ETS Research Report Series, 2004
This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
Descriptors: Item Analysis, Computation, Difficulty Level, Test Items
McLarty, Joyce R. – 1979
A new approach to item analysis, concerned not only with differences among individuals, but also differences among the groups in which the individuals may have membership, is described. This multi-level aspect may be applied to studies of differences among individuals, classes, and schools, depending upon the kind of problem being studied and the…
Descriptors: Analysis of Variance, Biology, Correlation, Group Membership
Boyd, Thomas A.; Tramontana, Michael G. – 1984
To examine the validity of short forms of the Wechsler Intelligence Scale for Children-Revised (WISC-R), the WISC-R was first administered to 106 hospitalized psychiatric patients, aged 8-16. No subjects had a primary diagnosis of mental retardation or learning disability, and one-third were receiving psychotropic medication. WISC-R IQ scores…
Descriptors: Adolescents, Children, Correlation, Elementary Secondary Education