Maeda, Hotaka; Zhang, Bo – International Journal of Testing, 2017
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…
Descriptors: Cheating, Test Items, Mathematics, Statistics
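Wollack's ω is defined with the nominal response model over all answer options; the toy sketch below uses a dichotomous 2PL simplification just to show the shape of the index: observed copier–source matches standardized against their model-implied expectation. All item parameters and responses are invented for illustration.

```python
# Toy, dichotomous analogue of the omega answer-copying index.
# The published omega uses the nominal response model over answer
# options; here a 2PL stands in, purely for illustration.
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def omega(copier_resp, source_resp, theta_c, a, b):
    p = p_2pl(theta_c, a, b)
    # Chance probability that the copier matches the source per item:
    # P(correct) if the source was correct, P(incorrect) otherwise.
    p_match = np.where(source_resp == 1, p, 1.0 - p)
    observed = np.sum(copier_resp == source_resp)
    expected = p_match.sum()
    var = np.sum(p_match * (1.0 - p_match))
    return (observed - expected) / np.sqrt(var)

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 40)          # discriminations (hypothetical)
b = rng.normal(0.0, 1.0, 40)           # difficulties (hypothetical)
source = (rng.random(40) < p_2pl(0.5, a, b)).astype(int)
copier = source.copy()                 # a perfect copier
print(omega(copier, source, theta_c=-1.0, a=a, b=b))  # large positive omega
```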
Fife, James H.; James, Kofi; Peters, Stephanie – ETS Research Report Series, 2020
The concept of variability is central to statistics. In this research report, we review mathematics education research on variability and, based on that review and on feedback from an expert panel, propose a learning progression (LP) for variability. The structure of the proposed LP consists of 5 levels of sophistication in understanding…
Descriptors: Mathematics Education, Statistics Education, Feedback (Response), Research Reports
Oshima, T. C.; Wright, Keith; White, Nick – International Journal of Testing, 2015
Raju, van der Linden, and Fleer (1995) introduced a framework for differential functioning of items and tests (DFIT) for unidimensional dichotomous models. Since then, DFIT has been shown to be a quite versatile framework as it can handle polytomous as well as multidimensional models both at the item and test levels. However, DFIT is still limited…
Descriptors: Test Bias, Item Response Theory, Test Items, Simulation
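DFIT's item-level index, NCDIF, is the expected squared difference between the focal- and reference-group item response functions, taken over the focal group's ability distribution. A minimal sketch under a 2PL parameterization, with invented parameters:

```python
# Minimal sketch of DFIT's noncompensatory DIF index (NCDIF) for one
# dichotomous 2PL item: mean squared difference between focal- and
# reference-group response functions over focal-group abilities.
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ncdif(theta_focal, a_f, b_f, a_r, b_r):
    diff = p_2pl(theta_focal, a_f, b_f) - p_2pl(theta_focal, a_r, b_r)
    return np.mean(diff ** 2)

theta_focal = np.random.default_rng(1).normal(size=5000)
print(ncdif(theta_focal, a_f=1.2, b_f=0.3, a_r=1.2, b_r=0.0))
```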
Wu, Yi-Fang – ProQuest LLC, 2015
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…
Descriptors: Item Response Theory, Test Items, Accuracy, Computation
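For reference, the three-parameter logistic model named in this abstract has the item response function P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), with discrimination a, difficulty b, and pseudo-guessing c; the parameter values below are illustrative only.

```python
# The 3PL item response function.
import numpy as np

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

print(p_3pl(theta=0.0, a=1.5, b=-0.5, c=0.2))  # ~0.74 for this easy item
```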
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although international large-scale assessment (ILSA) of education, pioneered by the International Association for the Evaluation of Educational Achievement, is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis
Chen, Pei-Hua; Chang, Hua-Hua; Wu, Haiyan – Educational and Psychological Measurement, 2012
Two sampling-and-classification-based procedures were developed for automated test assembly: the Cell Only and the Cell and Cube methods. A simulation study based on a 540-item bank was conducted to compare the performance of the procedures with the performance of a mixed-integer programming (MIP) method for assembling multiple parallel test…
Descriptors: Test Items, Selection, Test Construction, Item Response Theory
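A full MIP formulation requires an optimization solver, but the objective that automated test assembly pursues can be sketched with a toy greedy heuristic that builds a form to match a target test information function at a few θ points. This is an illustration of the problem, not the authors' Cell methods; the bank and targets are simulated.

```python
# Toy greedy test assembly: repeatedly add the item whose information
# profile best fills the remaining gap to a target TIF.
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.7, 2.0, 540)          # simulated 540-item bank
b = rng.normal(0.0, 1.0, 540)
thetas = np.array([-1.0, 0.0, 1.0])
target = np.array([6.0, 8.0, 6.0])      # hypothetical TIF targets

def info(a, b, theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

item_info = np.stack([info(a, b, t) for t in thetas], axis=1)  # (540, 3)
selected, tif = [], np.zeros(3)
for _ in range(30):                      # assemble a 30-item form
    gaps = target - tif
    scores = item_info @ gaps            # reward items that fill the gaps
    scores[selected] = -np.inf           # no item reuse
    best = int(np.argmax(scores))
    selected.append(best)
    tif += item_info[best]
print(sorted(selected)[:10], tif.round(2))
```

Real ATA systems add many content and overlap constraints that this sketch omits, which is exactly what the MIP formulation handles declaratively.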
Qian, Jiahe; Jiang, Yanming; von Davier, Alina A. – ETS Research Report Series, 2013
Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Descriptors: Item Response Theory, Test Items, Sampling, True Scores
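One common way to place separately calibrated item parameters on a shared scale, relevant to the linking variability discussed here, is the mean/sigma method applied to anchor-item difficulties. A minimal sketch with invented anchor values:

```python
# Mean/sigma IRT linking: A = sd(b_Y)/sd(b_X), B = mean(b_Y) - A*mean(b_X),
# then theta (or b) values transform as A*x + B onto the Form Y scale.
import numpy as np

b_anchor_x = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])  # anchor b's, Form X run
b_anchor_y = np.array([-1.0, -0.3, 0.2, 1.0, 1.6])  # same items, Form Y run

A = b_anchor_y.std(ddof=1) / b_anchor_x.std(ddof=1)
B = b_anchor_y.mean() - A * b_anchor_x.mean()

def to_y_scale(x):
    return A * x + B

print(A, B, to_y_scale(np.array([-0.5, 0.0, 0.5])))
```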
Özyurt, Hacer; Özyurt, Özcan – Eurasian Journal of Educational Research, 2015
Problem Statement: Learning and teaching activities create a need to determine whether they achieve their goals. Thus, multiple-choice tests that pose the same set of questions to everyone are frequently used. However, this traditional form of assessment and evaluation is at odds with modern education, where individual learning characteristics are…
Descriptors: Probability, Adaptive Testing, Computer Assisted Testing, Item Response Theory
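The adaptive-testing machinery referenced here typically selects each next item by maximum Fisher information at the examinee's current ability estimate. A minimal 2PL sketch with a simulated bank (selection rules in operational CATs add exposure control and content balancing):

```python
# Core CAT item selection: administer the unused item with maximum
# Fisher information at the current ability estimate.
# 2PL information is a^2 * P * (1 - P).
import numpy as np

rng = np.random.default_rng(3)
a = rng.uniform(0.7, 2.0, 200)          # simulated bank parameters
b = rng.normal(0.0, 1.0, 200)
used = set()

def next_item(theta_hat):
    p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
    information = a**2 * p * (1 - p)
    information[list(used)] = -np.inf   # exclude administered items
    item = int(np.argmax(information))
    used.add(item)
    return item

print(next_item(0.0), next_item(0.0))   # two most informative items at theta=0
```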
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole-cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced by an annual sampling test taken by pupils aged 11 from a nationally representative sample of schools. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Sunnassee, Devdass – ProQuest LLC, 2011
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…
Descriptors: Test Length, Test Format, Sample Size, Simulation
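Among the methods commonly evaluated in the small-sample equating literature is circle-arc equating: fit a circle through the lowest, middle, and highest (x, y) equating points and read equated scores off the arc. The sketch below is a simplified illustration with invented points (the published method decomposes the function into linear and curvilinear components):

```python
# Simplified circle-arc equating: fit a circle through three equating
# points and evaluate the arc at new-form scores.
import numpy as np

def circle_through(p1, p2, p3):
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A = np.array([[2 * (x2 - x1), 2 * (y2 - y1)],
                  [2 * (x3 - x1), 2 * (y3 - y1)]])
    rhs = np.array([x2**2 - x1**2 + y2**2 - y1**2,
                    x3**2 - x1**2 + y3**2 - y1**2])
    cx, cy = np.linalg.solve(A, rhs)    # circle center
    r = np.hypot(x1 - cx, y1 - cy)      # radius
    return cx, cy, r

low, mid, high = (0, 0), (20, 23), (50, 50)  # hypothetical equating points
cx, cy, r = circle_through(low, mid, high)
sign = np.sign(mid[1] - cy)                  # branch passing through mid

def equate(x):
    return cy + sign * np.sqrt(r**2 - (x - cx)**2)

print(equate(np.array([10.0, 25.0, 40.0])).round(2))
```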
Jordan, Eoin – Language Testing in Asia, 2012
This article examines the issue of cognates in frequency-based vocabulary size testing. Data from a pilot study for a cognate-controlled English vocabulary size test were used to assess whether a group of Japanese university English learners (n = 60) were more successful at responding to cognate items than noncognate ones in three 1000-word…
Descriptors: English (Second Language), Second Language Learning, College Students, Foreign Countries
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009
A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…
Descriptors: Sampling, Sample Size, Accuracy, Test Items
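Of the four methods compared, chained mean equating is the simplest to write down: shift X onto the anchor V scale using the new-form sample, then shift onto Y using the old-form sample. A sketch with invented summary statistics:

```python
# Chained mean equating through a common anchor V:
# X -> V in population 1, then V -> Y in population 2, matching means.
def chained_mean_equate(x, mean_x1, mean_v1, mean_v2, mean_y2):
    v = x - mean_x1 + mean_v1          # X -> V in population 1
    return v - mean_v2 + mean_y2       # V -> Y in population 2

# Hypothetical means: new form X and anchor V (sample 1), anchor V and
# old form Y (sample 2).
print(chained_mean_equate(x=62, mean_x1=60.0, mean_v1=20.0,
                          mean_v2=21.5, mean_y2=64.0))   # -> 64.5
```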
Hopstock, Paul J.; Pelczar, Marisa P. – National Center for Education Statistics, 2011
This technical report and user's guide is designed to provide researchers with an overview of the design and implementation of the 2009 Program for International Student Assessment (PISA), as well as with information on how to access the PISA 2009 data. This information is meant to supplement that presented in Organization for Economic Cooperation…
Descriptors: Parent Materials, Academic Achievement, Measures (Individuals), Program Effectiveness
Van Onna, Marieke J. H. – Applied Psychological Measurement, 2004
Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
Descriptors: Sampling, Item Response Theory, Scaling, Comparative Analysis
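A minimal sketch of coefficient H for dichotomous items, with the naive bootstrap this article considers as an alternative to asymptotic theory. Data are simulated, and a real analysis would use dedicated NIRT software rather than this hand-rolled version:

```python
# Mokken's scalability coefficient H for dichotomous items, plus a
# naive bootstrap for its sampling distribution.
import numpy as np

def coefficient_h(X):
    """H = 1 - (observed Guttman errors) / (expected under independence)."""
    n, k = X.shape
    p = X.mean(axis=0)
    F = E = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: pass the harder item, fail the easier one.
            F += np.sum((X[:, hard] == 1) & (X[:, easy] == 0))
            E += n * p[hard] * (1 - p[easy])
    return 1.0 - F / E

rng = np.random.default_rng(4)
theta = rng.normal(size=(500, 1))
b = np.array([-1.0, -0.3, 0.2, 0.9])           # item difficulties
X = (rng.random((500, 4)) < 1 / (1 + np.exp(-(theta - b)))).astype(int)

boot = [coefficient_h(X[rng.integers(0, 500, 500)]) for _ in range(200)]
print(coefficient_h(X), np.percentile(boot, [2.5, 97.5]).round(3))
```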

Taylor, Annette Kujawski – College Student Journal, 2005
This research examined 2 elements of multiple-choice test construction: balancing the key and choosing the optimal number of options. In Experiment 1, the 3 conditions included a balanced key, overrepresentation of a and b responses, and overrepresentation of c and d responses. The results showed that error patterns were independent of the key, reflecting…
Descriptors: Comparative Analysis, Test Items, Multiple Choice Tests, Test Construction