Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 3 |
Since 2006 (last 20 years) | 9 |
Source
Applied Measurement in Education | 39 |
Author
Feldt, Leonard S. | 4 |
Bandalos, Deborah L. | 2 |
Enders, Craig K. | 2 |
Hambleton, Ronald K. | 2 |
Kane, Michael | 2 |
Linn, Robert L. | 2 |
Qualls, Audrey L. | 2 |
Wise, Steven L. | 2 |
Bimpeh, Yaw | 1 |
Boughton, Keith A. | 1 |
Calfee, Robert | 1 |
Publication Type
Journal Articles | 39 |
Reports - Evaluative | 39 |
Speeches/Meeting Papers | 4 |
Reports - Research | 1 |
Education Level
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Grade 5 | 1 |
Grade 8 | 1 |
High Schools | 1 |
Higher Education | 1 |
Middle Schools | 1 |
Secondary Education | 1 |
Location
United Kingdom | 1 |
United States | 1 |
Vermont | 1 |
Assessments and Surveys
National Assessment of Educational Progress | 2 |
SAT (College Admission Test) | 1 |
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
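
A brief illustration of the sampling distinction this abstract refers to, offered as a hedged sketch (the probabilities and counts below are invented, not the article's data): under trend scoring the Time A category counts are fixed, so each row of the Time A x Time B table is drawn from its own multinomial, i.e., a product multinomial rather than a single multinomial over all cells.

# Sketch: product-multinomial sampling for a trend-scoring table.
# Row totals (Time A score counts) are fixed; each row is an independent
# multinomial over the Time B score categories. Values are illustrative.
import numpy as np

rng = np.random.default_rng(7)
row_probs = np.array([[0.80, 0.15, 0.05],   # P(Time B score | Time A score)
                      [0.10, 0.80, 0.10],
                      [0.05, 0.15, 0.80]])
row_totals = [120, 200, 80]                 # fixed Time A category counts

table = np.vstack([rng.multinomial(n, p) for n, p in zip(row_totals, row_probs)])
print(table)                                # one simulated Time A x Time B table
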
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
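
One common way to quantify rater consistency for double-scored constructed-response items is a one-facet persons-by-raters generalizability analysis. The sketch below assumes a simple fully crossed design with simulated data; it is illustrative only and not the authors' procedure.

# Sketch: persons x raters G study from ANOVA-style variance components.
import numpy as np

def g_coefficient(scores):
    # scores: persons x raters matrix of constructed-response scores
    n_p, n_r = scores.shape
    grand = scores.mean()
    ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_pr = ((scores - grand) ** 2).sum() - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    var_p = (ms_p - ms_pr) / n_r            # person (true-score) variance
    var_pr = ms_pr                          # person-by-rater interaction + error
    return var_p / (var_p + var_pr / n_r)   # relative G for the mean of n_r raters

rng = np.random.default_rng(0)
true_scores = rng.normal(10, 2, size=(50, 1))
scores = true_scores + rng.normal(0, 1, size=(50, 2))   # two raters per response
print(round(g_coefficient(scores), 2))
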
Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation
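
The abstract mentions a simulation-based approach to classification reliability. As a loose, hedged sketch of the general idea (not the paper's method): given each examinee's posterior probability of mastery, one can repeatedly draw plausible true statuses and summarize how often the reported classification would agree with them.

# Sketch: simulation-based classification accuracy from posterior mastery
# probabilities. All quantities are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
posterior = rng.beta(2, 2, size=500)            # P(mastery) for each examinee
decision = (posterior >= 0.5).astype(int)       # reported classification

accuracies = []
for _ in range(200):
    true_status = rng.binomial(1, posterior)    # one plausible set of true statuses
    accuracies.append((decision == true_status).mean())
print(round(float(np.mean(accuracies)), 3))     # average agreement rate
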
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Do subscores provide information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or by the total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
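
The comparison posed in this abstract is often operationalized (for example, in Haberman's approach) by comparing the proportional reduction in mean squared error (PRMSE) when the true subscore is predicted from the observed subscore versus from the total score. A minimal sketch under classical test theory; the numbers are hypothetical.

# Sketch: does a subscore add value beyond the total score?
def prmse_from_subscore(subscore_reliability):
    # Predicting the true subscore from the observed subscore:
    # PRMSE equals the subscore's reliability.
    return subscore_reliability

def prmse_from_total(corr_true_subscore_total):
    # Predicting the true subscore from the observed total score:
    # PRMSE equals the squared correlation between them.
    return corr_true_subscore_total ** 2

sub_rel = 0.70        # hypothetical subscore reliability
r_true_total = 0.80   # hypothetical correlation of true subscore with total score
print("Subscore worth reporting:", prmse_from_subscore(sub_rel) > prmse_from_total(r_true_total))
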
Sinha, Ruchi; Oswald, Frederick; Imus, Anna; Schmitt, Neal – Applied Measurement in Education, 2011
The current study examines how using a multidimensional battery of predictors (high-school grade point average (GPA), SAT/ACT, and biodata), and weighting the predictors based on the different values institutions place on various student performance dimensions (college GPA, organizational citizenship behaviors (OCBs), and behaviorally anchored…
Descriptors: Grade Point Average, Interrater Reliability, Rating Scales, College Admission
Hurtz, Gregory M.; Jones, J. Patrick – Applied Measurement in Education, 2009
Standard setting methods such as the Angoff method rely on judgments of item characteristics; item response theory empirically estimates item characteristics and displays them in item characteristic curves (ICCs). This study evaluated several indexes of rater fit to ICCs as a method for judging rater accuracy in their estimates of expected item…
Descriptors: Standard Setting (Scoring), Item Response Theory, Reliability, Measurement
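
One simple way to express rater fit to item characteristic curves: read the expected probability correct for a minimally competent examinee off each item's ICC at the cut score and compare it with the rater's Angoff judgment. The fit index below (mean absolute deviation) and the 3PL parameters are illustrative; the article evaluates several indexes.

# Sketch: comparing Angoff judgments with 3PL item characteristic curves.
import math

def icc_3pl(theta, a, b, c):
    # Three-parameter logistic ICC with the 1.7 scaling constant.
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def rater_fit(judgments, item_params, theta_cut):
    # Mean absolute deviation between judgments and ICC values at the cut score.
    expected = [icc_3pl(theta_cut, *p) for p in item_params]
    return sum(abs(j - e) for j, e in zip(judgments, expected)) / len(judgments)

item_params = [(1.2, -0.5, 0.20), (0.8, 0.3, 0.25), (1.5, 1.0, 0.20)]   # (a, b, c)
judgments = [0.80, 0.60, 0.35]
print(round(rater_fit(judgments, item_params, theta_cut=0.0), 3))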

Feldt, Leonard S. – Applied Measurement in Education, 1990
Sampling theory for the intraclass reliability coefficient, a Spearman-Brown extrapolation of alpha to a single measurement for each examinee, is less recognized and less cited than that of coefficient alpha. Techniques for constructing confidence intervals and testing hypotheses for the intraclass coefficient are presented. (SLD)
Descriptors: Hypothesis Testing, Measurement Techniques, Reliability, Sampling
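
For the computational side of this entry: the intraclass (single-measurement) coefficient is the Spearman-Brown step-down of alpha, and Feldt-type interval estimates for alpha rest on an F distribution with (n-1) and (n-1)(k-1) degrees of freedom. A minimal sketch, assuming the usual persons-by-parts ANOVA layout; it is not a reproduction of the article's derivations.

# Sketch: F-based confidence interval for coefficient alpha, plus the
# Spearman-Brown step-down to a single measurement per examinee.
from scipy.stats import f

def alpha_confidence_interval(alpha_hat, n_examinees, k_parts, gamma=0.05):
    df1 = n_examinees - 1
    df2 = (n_examinees - 1) * (k_parts - 1)
    lower = 1 - (1 - alpha_hat) * f.ppf(1 - gamma / 2, df1, df2)
    upper = 1 - (1 - alpha_hat) * f.ppf(gamma / 2, df1, df2)
    return lower, upper

def step_down_to_single(alpha_hat, k_parts):
    # Spearman-Brown extrapolation of alpha to one measurement.
    return alpha_hat / (k_parts - (k_parts - 1) * alpha_hat)

lo, hi = alpha_confidence_interval(0.85, n_examinees=200, k_parts=10)
print(round(lo, 3), round(hi, 3), round(step_down_to_single(0.85, 10), 3))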

Nichols, Paul; Kuehl, Barbara Jean – Applied Measurement in Education, 1999
An approach is presented that can predict internal consistency of cognitively complex assessments on two dimensions, those of adding tasks with similar or different solution strategies and adding test takers with different solution strategies. Data from the 1992 National Assessment of Educational Progress mathematics assessment are used to…
Descriptors: Cognitive Tests, Mathematics Tests, Prediction, Test Reliability

Bandalos, Deborah L.; Enders, Craig K. – Applied Measurement in Education, 1996
Computer simulation indicated that reliability increased with the degree of similarity between underlying and observed distributions when the observed categorical distribution was deliberately constructed to match the shape of the underlying distribution of the trait being measured. Reliability also increased with correlation among variables and…
Descriptors: Computer Simulation, Correlation, Likert Scales, Reliability
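
A toy version of the kind of simulation described, kept deliberately small: generate continuous item responses, collapse them into a five-point Likert-type scale, and compare coefficient alpha before and after categorization. Distributions and cut points are illustrative only.

# Sketch: effect of categorizing continuous responses on coefficient alpha.
import numpy as np

def cronbach_alpha(data):
    k = data.shape[1]
    item_var_sum = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(1)
theta = rng.normal(size=(1000, 1))                       # underlying trait
items = theta + rng.normal(scale=1.0, size=(1000, 8))    # continuous responses
cuts = np.quantile(items, [0.2, 0.4, 0.6, 0.8])          # five-point categorization
likert = np.digitize(items, cuts)
print(round(cronbach_alpha(items), 3), round(cronbach_alpha(likert), 3))
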
Kane, Michael; Case, Susan M. – Applied Measurement in Education, 2004
The scores on 2 distinct tests (e.g., essay and objective) are often combined to create a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to an external criterion. However, in cases where no criterion is available, the observed composite has generally been evaluated in…
Descriptors: Validity, Weighted Scores, Reliability, Student Evaluation
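
For orientation, the classical formula for the reliability of a weighted composite of two scores (assuming uncorrelated errors) can be computed directly from the component reliabilities, standard deviations, their correlation, and the weights. The numbers below are hypothetical.

# Sketch: reliability of a weighted composite of an essay and an objective score.
def composite_reliability(w1, w2, sd1, sd2, rel1, rel2, r12):
    var_c = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * sd1 * sd2 * r12
    error_var = (w1 * sd1) ** 2 * (1 - rel1) + (w2 * sd2) ** 2 * (1 - rel2)
    return 1 - error_var / var_c

print(round(composite_reliability(0.4, 0.6, 10, 15, 0.75, 0.90, 0.55), 3))
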
Hogan, Thomas P.; Murphy, Gavin – Applied Measurement in Education, 2007
We determined the recommendations for preparing and scoring constructed-response (CR) test items in 25 sources (textbooks and chapters) on educational and psychological measurement. The project was similar to Haladyna's (2004) analysis for multiple-choice items. We identified 12 recommendations for preparing CR items given by multiple sources,…
Descriptors: Test Items, Scoring, Test Construction, Educational Indicators

Lee, Guemin; Frisbie, David A. – Applied Measurement in Education, 1999
Studied the appropriateness and implications of using a generalizability theory approach to estimating the reliability of scores from tests composed of testlets. Analyses of data from two national standardization samples suggest that manipulating the number of passages is a more productive way to obtain efficient measurement than manipulating the…
Descriptors: Generalizability Theory, Models, National Surveys, Reliability
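
The passage-versus-item trade-off can be made concrete with a decision (D) study for a persons x (items nested in passages) design: project the generalizability coefficient for different numbers of passages and items per passage. The variance components below are hypothetical, not the article's estimates.

# Sketch: D-study projections for a p x (i:h) testlet design.
def g_coef(var_p, var_ph, var_pi_h, n_passages, n_items_per_passage):
    error = var_ph / n_passages + var_pi_h / (n_passages * n_items_per_passage)
    return var_p / (var_p + error)

for n_h, n_i in [(4, 8), (8, 4), (8, 8)]:       # passages, items per passage
    print(n_h, n_i, round(g_coef(0.30, 0.10, 0.40, n_h, n_i), 3))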

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1998
Two relatively simple methods for estimating the conditional standard error of measurement (SEM) for nonlinearly derived score scales are proposed. Applications indicate that these two procedures produce fairly consistent estimates that tend to peak near the high end of the scale and reach a minimum in the middle of the raw score scale. (SLD)
Descriptors: Error of Measurement, Estimation (Mathematics), Raw Scores, Reliability
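
As background for this entry, one familiar raw-score starting point for a conditional SEM is Lord's binomial error formula, and a delta-method step is a common way to carry such an estimate onto a nonlinear scale score. The sketch below uses an invented conversion function and is not one of the article's two procedures.

# Sketch: conditional SEM on the raw-score scale (Lord's binomial error model),
# propagated to a nonlinear scale score by a delta-method approximation.
import math

def raw_csem(x, k):
    # Lord's conditional SEM for raw score x on a k-item test.
    return math.sqrt(x * (k - x) / (k - 1))

def scale_csem(x, k, convert, h=1e-3):
    # |derivative of the conversion| times the raw-score CSEM.
    slope = (convert(x + h) - convert(x - h)) / (2 * h)
    return abs(slope) * raw_csem(x, k)

convert = lambda raw: 100 + 0.05 * raw ** 2   # hypothetical conversion that stretches the high end
for raw in (5, 20, 35):
    print(raw, round(scale_csem(raw, k=40, convert=convert), 2))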

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques