Showing 1 to 15 of 28 results
Peer reviewed
Direct link
Demars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for DIF are described in this exposition: the log of the odds ratio (difference in log-odds), the difference in probability-correct, and the proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
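The first two indices named in this abstract can be computed directly from group proportions correct. A minimal sketch with hypothetical proportions for matched reference and focal examinees (the function name and the values are illustrative, not from the article):

```python
import math

def dif_effect_sizes(p_ref, p_focal):
    """Two of the DIF effect sizes described: the difference in
    log-odds (log of the odds ratio) and the difference in
    probability-correct, for a single item."""
    log_odds_ref = math.log(p_ref / (1 - p_ref))
    log_odds_focal = math.log(p_focal / (1 - p_focal))
    delta_log_odds = log_odds_ref - log_odds_focal  # log odds ratio
    delta_p = p_ref - p_focal                       # probability difference
    return delta_log_odds, delta_p

# Hypothetical proportions correct for matched reference/focal groups
d_lor, d_p = dif_effect_sizes(0.70, 0.60)
```

The two indices can rank the same item differently, which is one way the choice of index changes how the degree of DIF is conceptualized.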
Peer reviewed
Direct link
Bridgeman, Brent; Trapani, Catherine; Attali, Yigal – Applied Measurement in Education, 2012
Essay scores generated by machine and by human raters are generally comparable; that is, they can produce scores with similar means and standard deviations, and machine scores generally correlate as highly with human scores as scores from one human correlate with scores from another human. Although human and machine essay scores are highly related…
Descriptors: Scoring, Essay Tests, College Entrance Examinations, High Stakes Tests
Peer reviewed
Direct link
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Do subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
Peer reviewed
Direct link
Liu, Ou Lydia; Wilson, Mark – Applied Measurement in Education, 2009
Many efforts have been made to determine and explain differential gender performance on large-scale mathematics assessments. A well-agreed-on conclusion is that gender differences are contextualized and vary across math domains. This study investigated the pattern of gender differences by item domain (e.g., Space and Shape, Quantity) and item type…
Descriptors: Gender Differences, Mathematics Tests, Measurement, Test Format
Peer reviewed
Direct link
Engelhard, George, Jr.; Fincher, Melissa; Domaleski, Christopher S. – Applied Measurement in Education, 2011
This study examines the effects of two test administration accommodations on the mathematics performance of students within the context of a large-scale statewide assessment. The two test administration accommodations were resource guides and calculators. A stratified random sample of schools was selected to represent the demographic…
Descriptors: Testing Accommodations, Disabilities, High Stakes Tests, Program Effectiveness
Peer reviewed
Direct link
Kingston, Neal M. – Applied Measurement in Education, 2009
There have been many studies of the comparability of computer-administered and paper-administered tests. Not surprisingly (given the variety of measurement and statistical sampling issues that can affect any one study) the results of such studies have not always been consistent. Moreover, the quality of computer-based test administration systems…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Printed Materials, Effect Size
Peer reviewed
Direct link
Keng, Leslie; McClarty, Katie Larsen; Davis, Laurie Laughlin – Applied Measurement in Education, 2008
This article describes a comparative study conducted at the item level for paper and online administrations of a statewide high stakes assessment. The goal was to identify characteristics of items that may have contributed to mode effects. Item-level analyses compared two modes of the Texas Assessment of Knowledge and Skills (TAKS) for up to four…
Descriptors: Computer Assisted Testing, Geometric Concepts, Grade 8, Comparative Analysis
Peer reviewed
Direct link
Skaggs, Gary; Hein, Serge F.; Awuor, Risper – Applied Measurement in Education, 2007
In this study, a variation of the bookmark standard setting procedure for passage-based tests is proposed in which separate ordered item booklets are created for the items associated with each passage. This variation is compared to the traditional bookmark procedure for a fifth-grade reading test. The results showed that the single-passage…
Descriptors: Reading Tests, Standard Setting, Cutting Scores, Grade 5
Peer reviewed
Direct link
DeMars, Christine E. – Applied Measurement in Education, 2004
Three methods of detecting item drift were compared: the procedure in BILOG-MG for estimating linear trends in item difficulty, the CUSUM procedure that Veerkamp and Glas (2000) used to detect trends in difficulty or discrimination, and a modification of Kim, Cohen, and Park's (1995) χ² test for multiple-group differential item functioning (DIF),…
Descriptors: Comparative Analysis, Test Items, Testing, Item Analysis
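The CUSUM idea behind the second method can be illustrated generically: accumulate standardized residuals of an item's difficulty across administrations and flag drift when the running sum crosses a threshold. This is a generic one-sided CUSUM sketch with hypothetical residuals and tuning constants, not Veerkamp and Glas's exact statistic:

```python
def cusum_drift(residuals, k=0.5, h=4.0):
    """One-sided upper CUSUM on standardized item-difficulty
    residuals across administrations; signals drift when the
    cumulative sum exceeds threshold h. k is the allowance
    (slack) subtracted at each step."""
    s = 0.0
    for t, r in enumerate(residuals):
        s = max(0.0, s + r - k)
        if s > h:
            return t  # index of the administration where drift is flagged
    return None  # no drift detected

# Hypothetical standardized residuals: stable early, drifting upward later
flag = cusum_drift([0.1, -0.2, 0.3, 1.5, 2.0, 2.5, 2.2])
```

The allowance k keeps small random fluctuations from accumulating, so only a sustained shift in difficulty drives the sum past h.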
Peer reviewed
Direct link
Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2006
Four item response theory linking methods (2 moment methods and 2 characteristic curve methods) were compared to concurrent (CO) calibration with the focus on the degree of robustness to format effects (FEs) when applying the methods to multidimensional data that reflected the FEs associated with mixed-format tests. Based on the quantification of…
Descriptors: Item Response Theory, Robustness (Statistics), Test Format, Comparative Analysis
Peer reviewed
Direct link
Custer, Michael; Omar, Md Hafidz; Pomplun, Mark – Applied Measurement in Education, 2006
This study compared vertical scaling results for the Rasch model from BILOG-MG and WINSTEPS. The item and ability parameters for the simulated vocabulary tests were scaled across 11 grades, kindergarten through 10th. The simulated data were based on real data and were generated under normal and skewed distribution assumptions. WINSTEPS and BILOG-MG were each…
Descriptors: Models, Scaling, Computer Software, Vocabulary
Peer reviewed
Han, Tianqi; And Others – Applied Measurement in Education, 1997
Stability among equating procedures was studied by comparing item response theory (IRT) true-score equating with IRT observed-score equating, IRT true-score equating with equipercentile equating, and IRT observed-score equating with equipercentile equating. On average, IRT true-score equating more frequently produced more stable conversions. (SLD)
Descriptors: Comparative Analysis, Equated Scores, Item Response Theory, Raw Scores
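IRT true-score equating, the method this abstract found most stable, maps a form-X true score to θ by inverting form X's test characteristic curve (TCC), then evaluates form Y's TCC at that θ. A minimal sketch under the 3PL model, with hypothetical item parameters assumed to be on a common scale (a generic illustration, not the study's procedure):

```python
import math

def p3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score."""
    return sum(p3pl(theta, *it) for it in items)

def true_score_equate(score_x, items_x, items_y, lo=-6.0, hi=6.0, tol=1e-8):
    """Map a form-X true score to the form-Y scale by inverting the
    form-X TCC with bisection, then evaluating the form-Y TCC.
    Assumes the TCC is increasing and score_x lies in its range."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if tcc(mid, items_x) < score_x:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tcc((lo + hi) / 2, items_y)

# Hypothetical (a, b, c) parameters for two short forms on a common scale
form_x = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2)]
form_y = [(1.1, -0.3, 0.2), (0.9, 0.2, 0.2), (1.0, 0.6, 0.2)]
equated = true_score_equate(2.0, form_x, form_y)
```

Note that a 3PL true score is only defined above the sum of the guessing parameters, which is why observed-score and equipercentile equating behave differently at the low end of the scale.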
Peer reviewed
Direct link
Wollack, James A. – Applied Measurement in Education, 2006
Many of the currently available statistical indexes to detect answer copying lack sufficient power at small α levels or when the amount of copying is relatively small. Furthermore, there is no one index that is uniformly best. Depending on the type or amount of copying, certain indexes are better than others. The purpose of this article was…
Descriptors: Statistical Analysis, Item Analysis, Test Length, Sample Size
Peer reviewed
Direct link
Mroch, Andrew A.; Bolt, Daniel M. – Applied Measurement in Education, 2006
Recently, nonparametric methods have been proposed that provide a dimensionally based description of test structure for tests with dichotomous items. Because such methods are based on different notions of dimensionality than are assumed when using a psychometric model, it remains unclear whether these procedures might lead to a different…
Descriptors: Simulation, Comparative Analysis, Psychometrics, Methods Research
Peer reviewed
Kingsbury, G. Gage; Zara, Anthony R. – Applied Measurement in Education, 1991
This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that the cost, in additional test items, of using constrained CAT for content balancing is much smaller than that of using testlets. (SLD)
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Computer Simulation
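The content-balancing idea behind constrained CAT can be sketched simply: at each step, administer the next item from the content area whose share of the test so far falls farthest below its blueprint target. The content areas and targets below are hypothetical, and a real implementation would also apply maximum-information item selection within the chosen area:

```python
def pick_content_area(targets, administered):
    """Constrained-CAT content balancing in the Kingsbury-Zara
    spirit: choose the content area whose administered proportion
    falls farthest below its target proportion."""
    total = sum(administered.values())

    def deficit(area):
        current = administered[area] / total if total else 0.0
        return targets[area] - current

    return max(targets, key=deficit)

# Hypothetical blueprint: target proportions by content area
targets = {"algebra": 0.4, "geometry": 0.3, "number_sense": 0.3}
administered = {"algebra": 3, "geometry": 1, "number_sense": 2}
next_area = pick_content_area(targets, administered)
```

Because the constraint only steers which pool the next item comes from, it adds far fewer items than pre-assembled testlets, consistent with the abstract's finding.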