ERIC - Search Results

Showing 1 to 15 of 19 results

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Estimation of Expected Fisher Information for IRT Models

Peer reviewed

Monroe, Scott – Journal of Educational and Behavioral Statistics, 2019

In item response theory (IRT) modeling, the Fisher information matrix is used for numerous inferential procedures such as estimating parameter standard errors, constructing test statistics, and facilitating test scoring. In principle, these procedures may be carried out using either the expected information or the observed information. However, in…

Descriptors: Item Response Theory, Error of Measurement, Scoring, Inferences

A Fair Comparison of the Performance of Computerized Adaptive Testing and Multistage Adaptive Testing

Wang, Keyin – ProQuest LLC, 2017

The comparison of item-level computerized adaptive testing (CAT) and multistage adaptive testing (MST) has been researched extensively (e.g., Kim & Plake, 1993; Luecht et al., 1996; Patsula, 1999; Jodoin, 2003; Hambleton & Xing, 2006; Keng, 2008; Zheng, 2012). Various CAT and MST designs have been investigated and compared under the same…

Descriptors: Comparative Analysis, Computer Assisted Testing, Adaptive Testing, Test Items

Effectiveness of Item Response Theory (IRT) Proficiency Estimation Methods under Adaptive Multistage Testing. Research Report. ETS RR-15-11

Peer reviewed
PDF on ERIC

Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry – ETS Research Report Series, 2015

The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…

Descriptors: Item Response Theory, Computation, Statistical Bias, Error of Measurement

Internet Administration of the Paper-and-Pencil Gifted Rating Scale: Assessing Psychometric Equivalence

Peer reviewed

Yarnell, Jordy B.; Pfeiffer, Steven I. – Journal of Psychoeducational Assessment, 2015

The present study examined the psychometric equivalence of administering a computer-based version of the Gifted Rating Scale (GRS) compared with the traditional paper-and-pencil GRS-School Form (GRS-S). The GRS-S is a teacher-completed rating scale used in gifted assessment. The GRS-Electronic Form provides an alternative method of administering…

Descriptors: Gifted, Psychometrics, Rating Scales, Computer Assisted Testing

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of "WriteToLearn"

Peer reviewed
PDF on ERIC

Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016

This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Choosing among Tucker or Chained Linear Equating in Two Testing Situations: Rater Comparability Scoring and Randomly Equivalent Groups with an Anchor

Peer reviewed

Puhan, Gautam – Journal of Educational Measurement, 2012

Tucker and chained linear equatings were evaluated in two testing scenarios. In Scenario 1, referred to as rater comparability scoring and equating, the anchor-to-total correlation is often very high for the new form but moderate for the reference form. This may adversely affect the results of Tucker equating, especially if the new and reference…

Descriptors: Testing, Scoring, Equated Scores, Statistical Analysis

Optimal Scoring Methods of Hand-Strength Tests in Patients with Stroke

Peer reviewed

Huang, Sheau-Ling; Hsieh, Ching-Lin; Lin, Jau-Hong; Chen, Hui-Mei – International Journal of Rehabilitation Research, 2011

The purpose of this study was to determine the optimal scoring methods for measuring strength of the more-affected hand in patients with stroke by examining the effect of reducing measurement errors. Three hand-strength tests of grip, palmar pinch, and lateral pinch were administered at two sessions in 56 patients with stroke. Five scoring methods…

Descriptors: Patients, Scoring, Error of Measurement, Neurological Impairments

Oral Performance Scoring Using Generalizability Theory and Many-Facet Rasch Measurement: A Comparison Study

Alkahtani, Saif F. – ProQuest LLC, 2012

The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…

Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam

Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011): User's Manual for the ECLS-K:2011 Kindergarten-Fourth Grade Data File and Electronic Codebook, Public Version. NCES 2018-032

Peer reviewed
PDF on ERIC

Tourangeau, Karen; Nord, Christine; Lê, Thanh; Wallner-Allen, Kathleen; Vaden-Kiernan, Nancy; Blaker, Lisa; Najarian, Michelle – National Center for Education Statistics, 2018

This manual provides guidance and documentation for users of the longitudinal kindergarten-fourth grade (K-4) data file of the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011). It mainly provides information specific to the fourth-grade round of data collection. The first chapter provides an overview of the…

Descriptors: Children, Longitudinal Studies, Surveys, Kindergarten

Generalizability Theory: Measuring the Dependability of Selected Methods for Scoring Classroom Assessments

Lengh, Carolyn J. – ProQuest LLC, 2010

This study compares the dependability of four classroom assessment scoring methods. Generalizability theory (G) and alternative decision (D) are used to measure the results of students' classroom assessment scores and compare the results of the four scoring methods on variability of rater by person variance and the level of G and D coefficients…

Descriptors: Generalizability Theory, Scoring, Social Studies, Tests

DIF Trees: Using Classification Trees to Detect Differential Item Functioning

Peer reviewed

Vaughn, Brandon K.; Wang, Qiu – Educational and Psychological Measurement, 2010

A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

Descriptors: Test Bias, Classification, Nonparametric Statistics, Regression (Statistics)

Wise and Proper Use of National Assessment of Educational Progress (NAEP) Data

Peer reviewed

Innes, Richard G. – Journal of School Choice, 2012

This article provides examples of how serious misconceptions can result when only "all student" scores from the National Assessment of Educational Progress (NAEP) are used for simplistic state-to-state comparisons. Suggestions for better treatment are presented. The article also compares Kentucky's eighth grade EXPLORE testing to NAEP…

Descriptors: National Competency Tests, Scoring, Misconceptions, Academic Achievement

Methods of Linking with Small Samples in a Common-Item Design: An Empirical Comparison. Research Report. ETS RR-09-38

Peer reviewed
PDF on ERIC

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009

A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…

Descriptors: Sampling, Sample Size, Accuracy, Test Items

Identifying Academically Gifted English-Language Learners Using Nonverbal Tests: A Comparison of the Raven, NNAT, and CogAT

Peer reviewed

Lohman, David F.; Korb, Katrina A.; Lakin, Joni M. – Gifted Child Quarterly, 2008

In this study, the authors compare the validity of three nonverbal tests for the purpose of identifying academically gifted English-language learners (ELLs). Participants were 1,198 elementary children (approximately 40% ELLs). All were administered the Raven Standard Progressive Matrices (Raven), the Naglieri Nonverbal Ability Test (NNAT), and…

Descriptors: Academically Gifted, Nonverbal Tests, Scoring, National Norms
