Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 6 |
Source
Applied Measurement in… | 13 |
Author
Ben-Simon, Anat | 1 |
Carlo, Maria S. | 1 |
Coffman, Don D. | 1 |
Cohen, Yoav | 1 |
Geisinger, Kurt F. | 1 |
Henly, George A. | 1 |
Lee, Yoonsun | 1 |
Levi, Effi | 1 |
Moore, William P. | 1 |
Phillips, Gary W. | 1 |
Phillips, S. E. | 1 |
Publication Type
Journal Articles | 13 |
Reports - Evaluative | 6 |
Reports - Research | 6 |
Information Analyses | 1 |
Reports - Descriptive | 1 |
Education Level
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Grade 10 | 1 |
Grade 4 | 1 |
Grade 5 | 1 |
Grade 7 | 1 |
Grade 8 | 1 |
High Schools | 1 |
Middle Schools | 1 |
Secondary Education | 1 |
Assessments and Surveys
Iowa Tests of Basic Skills | 1 |
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis

Schmitz, Constance C.; delMas, Robert C. – Applied Measurement in Education, 1990
Using S. J. Messick's theoretical work concerning construct validity as a guide, underlying hypotheses for investigation when validating placement test decisions are assessed. Guidelines on validating placement decisions are offered, and the hypotheses and guidelines are applied in a validation study of the Written English Expression Placement…
Descriptors: College Freshmen, Construct Validity, Guidelines, Higher Education

Vispoel, Walter P.; Coffman, Don D. – Applied Measurement in Education, 1994
Computerized-adaptive (CAT) and self-adapted (SAT) music listening tests were compared for efficiency, reliability, validity, and motivational benefits with 53 junior high school students. Results demonstrate trade-offs, with greater potential motivational benefits for SAT and greater efficiency for CAT. SAT elicited more favorable responses from…
Descriptors: Adaptive Testing, Computer Assisted Testing, Efficiency, Item Response Theory

Phillips, S. E. – Applied Measurement in Education, 1994
This article explores the measurement problems associated with granting accommodations for mental disabilities, uses existing case law to construct a legal framework for considering such accommodations, and discusses the advantages and disadvantages of alternative strategies for handling testing accommodation requests. (Author/SLD)
Descriptors: Accessibility (for Disabled), Alternative Assessment, Court Litigation, Elementary Secondary Education

Pomplun, Mark – Applied Measurement in Education, 1997
A method to investigate consequential evidence of validity for a state assessment developed to change teacher instructional practices is presented. Survey responses from over 1,000 Kansas teachers were used to construct a path model that allowed effects of the state assessment to be studied at building and teacher levels. (SLD)
Descriptors: Educational Assessment, Educational Change, Instructional Effectiveness, Path Analysis
Wise, Steven L. – Applied Measurement in Education, 2006
In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…
Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory

Geisinger, Kurt F. – Applied Measurement in Education, 1994
Federal law requires that individuals with handicapping conditions be administered assessments in ways that accommodate their disabilities without penalizing them. Validation studies are needed to evaluate the meaning of scores resulting from nonstandard test administrations. The limited number of these studies to date is reviewed. (SLD)
Descriptors: Disabilities, Educational Assessment, Elementary School Students, Elementary Secondary Education

Royer, James M.; Carlo, Maria S. – Applied Measurement in Education, 1991
Measures of linguistic competence for limited-English-proficient students are discussed. The results for 134 students in grades 3 through 6 from a study of the reliability and validity of the Sentence Verification Technique tests as measures of listening and reading comprehension performance in native languages and English are reported. (TJH)
Descriptors: Bilingual Education, Comparative Testing, Elementary Education, Elementary School Students

Moore, William P. – Applied Measurement in Education, 1994
Teacher testing-related attitudes and practices related to court-ordered achievement testing were studied through a mail survey completed by 79 elementary school teachers in a midwestern urban district. Teachers engaged in a large number of test preparation practices and reported finding minimal value in purpose or results of testing. (SLD)
Descriptors: Achievement Tests, Court Litigation, Educational Assessment, Educational Practices