ERIC - Search Results

Showing 1 to 15 of 16 results

Comparing Examinee-Based and Response-Based Motivation Filtering Methods in Remote Low-Stakes Testing

Peer reviewed

Alahmadi, Sarah; DeMars, Christine E. – Applied Measurement in Education, 2024

Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…

Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores

Comparing Human and Automated Essay Scoring for Prospective Graduate Students with Learning Disabilities and/or ADHD

Peer reviewed

Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016

Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…

Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

The Effectiveness of Machine Score-Ability Ratings in Predicting Automated Scoring Performance

Peer reviewed

Lottridge, Susan; Wood, Scott; Shaw, Dan – Applied Measurement in Education, 2018

This study sought to provide a framework for evaluating machine score-ability of items using a new score-ability rating scale, and to determine the extent to which ratings were predictive of observed automated scoring performance. The study listed and described a set of factors that are thought to influence machine score-ability; these factors…

Descriptors: Program Effectiveness, Computer Assisted Testing, Test Scoring Machines, Scoring

Designing, Evaluating, and Deploying Automated Scoring Systems with Validity in Mind: Methodological Design Decisions

Peer reviewed

Rupp, André A. – Applied Measurement in Education, 2018

This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…

Descriptors: Design, Automation, Scoring, Test Scoring Machines

Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program

Peer reviewed

Puhan, Gautam – Applied Measurement in Education, 2009

The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…

Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

Practical Issues in Large-Scale Computerized Adaptive Testing.

Peer reviewed

Mills, Craig N.; Stocking, Martha L. – Applied Measurement in Education, 1996

Issues that must be addressed in the large-scale application of computerized adaptive testing are explored, including considerations of test design, scoring, test administration, item and item bank development, and other aspects of test construction. Possible solutions and areas in which additional work is needed are identified. (SLD)

Descriptors: Adaptive Testing, Computer Assisted Testing, Elementary Secondary Education, Higher Education

Applications of Item Response Theory to Partial Credit Scoring.

Peer reviewed

Wise, Steven L., Ed.; And Others – Applied Measurement in Education, 1988

Six papers on the use of partial credit item response theory score models in applied measurement settings are presented. These applications include the scoring of medical certification examinations using computer-based patient simulations, narrative writing tests, and educational diagnosis. (TJH)

Descriptors: Clinical Diagnosis, Computer Assisted Testing, Computer Simulation, Educational Diagnosis

Testing for Differences in Test Score Distributions Using Loglinear Models.

Peer reviewed

Hanson, Bradley A. – Applied Measurement in Education, 1996

Determining whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored using three statistical tests using loglinear models. Examples are presented of applying tests of distribution differences to decide if equating is needed for alternative forms of a test. (SLD)

Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format

Identifying Possible Sources of Differential Functioning Using Differential Bundle Functioning with Polytomously Scored Data

Peer reviewed

McCarty, F. A.; Oshima, T. C.; Raju, Nambury S. – Applied Measurement in Education, 2007

Oshima, Raju, Flowers, and Slinde (1998) described procedures for identifying sources of differential functioning for dichotomous data using differential bundle functioning (DBF) derived from the differential functioning of items and test (DFIT) framework (Raju, van der Linden, & Fleer, 1995). The purpose of this study was to extend the…

Descriptors: Rating Scales, Test Bias, Scoring, Test Items

A Comparison of the Generalizability of Scores Produced by Expert Raters and Automated Scoring Systems.

Peer reviewed

Clauser, Brian E.; Swanson, David B.; Clyman, Stephen G. – Applied Measurement in Education, 1999

Performed generalizability analyses of expert ratings and computer-produced scores for a computer-delivered performance assessment of physicians' patient management skills. The two automated scoring systems produced scores for the 200 medical students that were approximately as generalizable as those produced by the four expert raters. (SLD)

Descriptors: Comparative Analysis, Computer Assisted Testing, Generalizability Theory, Higher Education

Effects of Scale Anchors on Student Ratings of Instructors.

Peer reviewed

Dunham, Trudy C.; Davison, Mark L. – Applied Measurement in Education, 1990

The effects of packing or skewing the response options of a scale on the common measurement problems of leniency and range restriction in instructor ratings were assessed. Results from a sample of 130 undergraduate education students indicate that packing reduced leniency but had no effect on range restriction. (TJH)

Descriptors: Education Majors, Higher Education, Professors, Rating Scales

Examining the Costs of Performance Assessment.

Peer reviewed

Hardy, Roy A. – Applied Measurement in Education, 1995

Cost factors associated with the development, administration, and scoring of performance assessment tasks are examined in the context of a statewide or other large-scale assessment program. Resources of money, time, and expertise are discussed. (SLD)

Descriptors: Cost Estimates, Costs, Educational Assessment, Estimation (Mathematics)

The Reliability of Mathematics Portfolio Scores: Lessons from the Vermont Experience.

Peer reviewed

Klein, Stephen P.; And Others – Applied Measurement in Education, 1995

Portfolios are the centerpiece of Vermont's statewide assessment program in mathematics. Portfolio scores in the first two years were not reliable enough to permit the reporting of student-level results, but increasing the number of readers or the number of portfolio pieces is not operationally feasible. (SLD)

Descriptors: Educational Assessment, Elementary Secondary Education, Mathematics Tests, Performance Based Assessment

Development of a Scoring Algorithm To Replace Expert Rating for Scoring a Complex Performance-Based Assessment.

Peer reviewed

Clauser, Brian E.; Ross, Linette P.; Clyman, Stephen G.; Rose, Kathie M.; Margolis, Melissa J.; Nungester, Ronald J.; Piemme, Thomas E.; Chang, Lucy; El-Bayoumi, Gigi; Malakoff, Gary L.; Pincetl, Pierre S. – Applied Measurement in Education, 1997

Describes an automated scoring algorithm for a computer-based simulation examination of physicians' patient-management skills. Results with 280 medical students show that scores produced using this algorithm are highly correlated to actual clinician ratings. Scores were also effective in discriminating between case performance judged passing or…

Descriptors: Algorithms, Computer Assisted Testing, Computer Simulation, Evaluators
