Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 6 |
Descriptor
Error of Measurement | 17 |
Statistical Analysis | 17 |
Test Construction | 17 |
Test Items | 8 |
Item Analysis | 5 |
Test Reliability | 5 |
Comparative Analysis | 4 |
Mathematical Models | 4 |
Reliability | 4 |
Sample Size | 4 |
Scores | 4 |
Author
Alonzo, Julie | 1 |
Beglar, David | 1 |
Benson, Jeri | 1 |
Brennan, Robert L. | 1 |
Cleary, T. A. | 1 |
Cimpian, Joseph R. | 1 |
Clark, R. Malcolm | 1 |
Cook, Linda | 1 |
Feigenbaum, Miriam | 1 |
Fink, Arlene | 1 |
Frary, Robert B. | 1 |
Publication Type
Reports - Research | 10 |
Journal Articles | 6 |
Speeches/Meeting Papers | 4 |
Reports - Evaluative | 2 |
Books | 1 |
Guides - Non-Classroom | 1 |
Numerical/Quantitative Data | 1 |
Education Level
Elementary Education | 2 |
Higher Education | 2 |
Postsecondary Education | 2 |
Secondary Education | 2 |
Early Childhood Education | 1 |
Elementary Secondary Education | 1 |
Grade 10 | 1 |
Grade 2 | 1 |
Grade 3 | 1 |
Grade 5 | 1 |
Grade 8 | 1 |
Audience
Researchers | 1 |
Students | 1 |
Location
Japan | 1 |
Assessments and Surveys
SAT (College Admission Test) | 1 |
Test of English for… | 1 |
Li, Feifei – ETS Research Report Series, 2017
An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…
Descriptors: Item Response Theory, Generalizability Theory, Tests, Error of Measurement
Cimpian, Joseph R. – Educational Researcher, 2017
Quantitative research on sexual minority youths (SMYs) has likely contributed to misperceptions about the risk and deviance of this population. In part because it often relies on self-reported data from population-based self-administered questionnaires, this research is susceptible to misclassification bias whereby youths who are not SMYs are…
Descriptors: Secondary School Students, Adolescents, Minority Group Students, Homosexuality
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
McLean, Stuart; Kramer, Brandon; Beglar, David – Language Teaching Research, 2015
An important gap in the field of second language vocabulary assessment concerns the lack of validated tests measuring aural vocabulary knowledge. The primary purpose of this study is to introduce and provide preliminary validity evidence for the Listening Vocabulary Levels Test (LVLT), which has been designed as a diagnostic tool to measure…
Descriptors: Test Construction, Test Validity, English (Second Language), Second Language Learning
Alonzo, Julie; Liu, Kimy; Tindal, Gerald – Behavioral Research and Teaching, 2008
This technical report describes the development of reading comprehension assessments designed for use as progress monitoring measures appropriate for 2nd Grade students. The creation, piloting, and technical adequacy of the measures are presented. The following are appended: (1) Item Specifications for MC [Multiple Choice] Comprehension - Passage…
Descriptors: Reading Comprehension, Reading Tests, Grade 2, Elementary School Students

White, Richard T.; Clark, R. Malcolm – Psychometrika, 1973
A test which allows for errors of measurement is derived for the hypothesis that all the members of a population who possess a certain skill are a sub-set of the members who possess another skill. (Author)
Descriptors: Error of Measurement, Mathematical Applications, Psychometrics, Statistical Analysis
Frary, Robert B.; Zimmerman, Donald W. – Educational and Psychological Measurement, 1970
Descriptors: Error of Measurement, Guessing (Tests), Multiple Choice Tests, Probability
Cleary, T. A.; Linn, Robert L. – 1967
The purpose of this research was to study the effect of error of measurement upon the power of statistical tests. Attention was focused on the F-test of the single-factor analysis of variance. Formulas were derived to show the relationship between the noncentrality parameters for analyses using true scores and those using observed scores. The…
Descriptors: Analysis of Variance, Error of Measurement, Measurement Techniques, Psychological Testing

Yen, Wendy M. – Educational Measurement: Issues and Practice, 1997
The accuracy of statistics based on performance assessments that represent percentages of students reaching standards is explored using data from a large-scale performance assessment, the Maryland School Performance Assessment Program. Results with students in grades 3, 5, and 8 support the accuracy of pooling results to produce the statistics.…
Descriptors: Achievement Tests, Elementary Education, Error of Measurement, Performance Based Assessment
Sullins, Walter L. – 1971
Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
Descriptors: Comparative Analysis, Correlation, Error of Measurement, Item Analysis
Liu, Jinghua; Feigenbaum, Miriam; Cook, Linda – College Entrance Examination Board, 2004
This study explored possible configurations of the new SAT® critical reading section without analogy items. The item pool contained items from SAT verbal (SAT-V) sections of 14 previously administered SAT tests, calibrated using the three-parameter logistic IRT model. Multiple versions of several prototypes that do not contain analogy items were…
Descriptors: College Entrance Examinations, Critical Reading, Logical Thinking, Difficulty Level
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
Benson, Jeri; Wilson, Michael – 1979
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Descriptors: Comparative Analysis, Difficulty Level, Efficiency, Error of Measurement

Brennan, Robert L. – 1979
Using the basic principles of generalizability theory, a psychometric model for domain-referenced interpretations is proposed, discussed, and illustrated. This approach, assuming an analysis of variance or linear model, is applicable to numerous data collection designs, including the traditional persons-crossed-with-items design, which is treated…
Descriptors: Analysis of Variance, Cost Effectiveness, Criterion Referenced Tests, Cutting Scores
Gustafsson, Jan-Eric – 1977
The Rasch model for test analysis is described and compared with two-parameter and three-parameter latent-trait models. Conditional maximum likelihood equations for estimating item parameters are derived, and estimates of person parameters are described together with their confidence intervals. Goodness of fit tests are discussed, including a…
Descriptors: Adaptive Testing, Computer Programs, Equated Scores, Error of Measurement