Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 2 |
| Since 2007 (last 20 years) | 7 |
Descriptor
| Sampling | 9 |
| Test Items | 9 |
| Test Length | 9 |
| Item Response Theory | 5 |
| Difficulty Level | 4 |
| Sample Size | 4 |
| Computation | 3 |
| Error of Measurement | 3 |
| Simulation | 3 |
| Test Construction | 3 |
| Accuracy | 2 |
| More ▼ | |
Source
| ProQuest LLC | 3 |
| Applied Measurement in… | 1 |
| ETS Research Report Series | 1 |
| Educational and Psychological… | 1 |
| Journal of Educational and… | 1 |
| Journal of Experimental… | 1 |
Author
| Berk, Ronald A. | 1 |
| Cheng, Ying | 1 |
| Diao, Qi | 1 |
| Dorans, Neil J. | 1 |
| Forsyth, Robert A. | 1 |
| Guo, Hongwen | 1 |
| Hong, Maxwell | 1 |
| Kannan, Priya | 1 |
| Katz, Irvin R. | 1 |
| Lu, Ru | 1 |
| Md Desa, Zairul Nor Deana | 1 |
| More ▼ | |
Publication Type
| Reports - Research | 6 |
| Journal Articles | 5 |
| Dissertations/Theses -… | 3 |
Education Level
Audience
Location
Laws, Policies, & Programs
Assessments and Surveys
| National Longitudinal Study… | 1 |
What Works Clearinghouse Rating
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
Patton, Jeffrey M.; Cheng, Ying; Hong, Maxwell; Diao, Qi – Journal of Educational and Behavioral Statistics, 2019
In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation…
Descriptors: Test Items, Response Style (Tests), Identification, Computation
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Wu, Yi-Fang – ProQuest LLC, 2015
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…
Descriptors: Item Response Theory, Test Items, Accuracy, Computation
Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014
An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…
Descriptors: Sampling, Test Items, Effect Size, Scaling
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
Sunnassee, Devdass – ProQuest LLC, 2011
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…
Descriptors: Test Length, Test Format, Sample Size, Simulation
Peer reviewedBerk, Ronald A. – Journal of Experimental Education, 1980
A sampling methodology is proposed for determining lengths of tests designed to assess the comprehension of written discourse. It is based on Bormuth's transformational analysis, within a domain-referenced framework. Guidelines are provided for computing sample size and selecting sentences to which the transformational rules can be applied.…
Descriptors: Reading Comprehension, Reading Tests, Sampling, Test Construction
Scheetz, James P.; Forsyth, Robert A. – 1977
Empirical evidence is presented related to the effects of using a stratified sampling of items in multiple matrix sampling on the accuracy of estimates of the population mean. Data were obtained from a sample of 600 high school students for a 36-item mathematics test and a 40-item vocabulary test, both subtests of the Iowa Tests of Educational…
Descriptors: Achievement Tests, Difficulty Level, Item Analysis, Item Sampling

Direct link
