Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Edwards, Michael C.; Flora, David B.; Thissen, David – Applied Measurement in Education, 2012
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Test Items
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis

Holland, Paul W.; Wainer, Howard – Applied Measurement in Education, 1990
Two attempts to adjust state mean Scholastic Aptitude Test (SAT) scores for differential participation rates are examined. Both attempts are rejected, and five rules for performing adjustments are outlined to foster follow-up checks on untested assumptions. National Assessment of Educational Progress state data are determined to be more accurate.…
Descriptors: College Applicants, College Entrance Examinations, Estimation (Mathematics), Item Bias

Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991
A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)
Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education

Vispoel, Walter P.; Coffman, Don D. – Applied Measurement in Education, 1994
Computerized-adaptive (CAT) and self-adapted (SAT) music listening tests were compared for efficiency, reliability, validity, and motivational benefits with 53 junior high school students. Results demonstrate trade-offs, with greater potential motivational benefits for SAT and greater efficiency for CAT. SAT elicited more favorable responses from…
Descriptors: Adaptive Testing, Computer Assisted Testing, Efficiency, Item Response Theory
Wise, Steven L. – Applied Measurement in Education, 2006
In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…
Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory

Klein, Stephen P.; And Others – Applied Measurement in Education, 1995
Portfolios are the centerpiece of Vermont's statewide assessment program in mathematics. Portfolio scores in the first two years were not reliable enough to permit the reporting of student-level results, but increasing the number of readers or the number of portfolio pieces is not operationally feasible. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Mathematics Tests, Performance Based Assessment

Linn, Robert L.; Kiplinger, Vonda L. – Applied Measurement in Education, 1995
The adequacy of linking statewide standardized test results to the National Assessment of Educational Progress by using equipercentile equating procedures was investigated using statewide mathematics data from four states. Results suggest that the linkings are not sufficiently trustworthy to make comparisons based on the tails of the distribution.…
Descriptors: Comparative Analysis, Educational Assessment, Equated Scores, Mathematics Tests

Royer, James M.; Carlo, Maria S. – Applied Measurement in Education, 1991
Measures of linguistic competence for limited-English-proficient students are discussed. The results for 134 students in grades 3 through 6 from a study of the reliability and validity of the Sentence Verification Technique tests as measures of listening and reading comprehension performance in native languages and English are reported. (TJH)
Descriptors: Bilingual Education, Comparative Testing, Elementary Education, Elementary School Students