Canivez, Gary L.; Youngstrom, Eric A. – Applied Measurement in Education, 2019
The Cattell-Horn-Carroll (CHC) taxonomy of cognitive abilities married John Horn and Raymond Cattell's Extended Gf-Gc theory with John Carroll's Three-Stratum Theory. While there are some similarities in arrangements or classifications of tasks (observed variables) within similar broad or narrow dimensions, other salient theoretical features and…
Descriptors: Taxonomy, Cognitive Ability, Intelligence, Cognitive Tests

Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019
Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive "and" argument-based to develop a…
Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation

Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines

Plake, Barbara S.; Huff, Kristen; Reshetar, Rosemary – Applied Measurement in Education, 2010
In many large-scale assessment programs, achievement level descriptors (ALDs) play a critical role in communicating what scores on the assessment mean and in interpreting what examinees know and are able to do. Based on their test performance, examinees are often classified into performance categories. The…
Descriptors: Evidence, Test Construction, Measurement, Standard Setting

Downing, Steven M.; Haladyna, Thomas M. – Applied Measurement in Education, 1997
An ideal process is outlined for test item development and the study of item responses to ensure that tests are sound. Qualitative and quantitative methods are used to assess the item-level validity evidence for high-stakes examinations. A checklist for assessment is provided. (SLD)
Descriptors: High Stakes Tests, Item Response Theory, Qualitative Research, Quality Control

Linn, Robert L.; Hambleton, Ronald K. – Applied Measurement in Education, 1991
Four main approaches to customized testing are described, and their resulting scores' valid uses and interpretations are discussed. Customized testing can yield valid normative and curriculum-specific information, although cautious application is needed to avoid misleading inferences about student achievement. (SLD)
Descriptors: Academic Achievement, Accountability, Criterion Referenced Tests, Curriculum

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques