Showing 1 to 15 of 32 results
Peer reviewed
Direct link
Joakim Wallmark; James O. Ramsay; Juan Li; Marie Wiberg – Journal of Educational and Behavioral Statistics, 2024
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of…
Descriptors: Item Response Theory, Test Items, Models, Scoring
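As general background on polytomous IRT models like those compared above (a sketch, not the article's OS method): Samejima's graded response model assigns category probabilities as differences of adjacent 2PL cumulative curves. All parameter values below are hypothetical.

```python
import numpy as np

def graded_response_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.

    theta : latent trait value for one test taker
    a     : item discrimination
    b     : increasing category thresholds, length K-1 for K categories
    """
    b = np.asarray(b, dtype=float)
    # P(score >= k) for k = 1..K-1, via 2PL cumulative curves
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(score >= 0) = 1 and P(score >= K) = 0
    cum = np.concatenate(([1.0], p_star, [0.0]))
    # Category probabilities are differences of adjacent cumulative curves
    return cum[:-1] - cum[1:]

# Four score categories (0-3) for one hypothetical item
probs = graded_response_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5])
```

Because the thresholds in `b` are increasing, the cumulative curves are ordered and every category probability comes out nonnegative and the set sums to one.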
Peer reviewed
Direct link
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Peer reviewed
Direct link
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Peer reviewed
Direct link
Monroe, Scott – Journal of Educational and Behavioral Statistics, 2019
In item response theory (IRT) modeling, the Fisher information matrix is used for numerous inferential procedures such as estimating parameter standard errors, constructing test statistics, and facilitating test scoring. In principle, these procedures may be carried out using either the expected information or the observed information. However, in…
Descriptors: Item Response Theory, Error of Measurement, Scoring, Inferences
Peer reviewed
PDF on ERIC
Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca – Practical Assessment, Research & Evaluation, 2018
Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Descriptors: Interrater Reliability, Measurement, Observation, Educational Research
Peer reviewed
Direct link
Bajwa, Neet Priya; Perry, Michelle – Mathematical Thinking and Learning: An International Journal, 2021
Elementary school students struggle in interpreting the equal sign as a symbol denoting equivalence. Although many have advocated using a pan-balance scale to help students develop this understanding, less is known about what features associated with this model support learning. To attempt to control and examine these features, the investigators…
Descriptors: Mathematics Skills, Mathematics Instruction, Elementary School Students, Concept Formation
Wang, Keyin – ProQuest LLC, 2017
The comparison of item-level computerized adaptive testing (CAT) and multistage adaptive testing (MST) has been researched extensively (e.g., Kim & Plake, 1993; Luecht et al., 1996; Patsula, 1999; Jodoin, 2003; Hambleton & Xing, 2006; Keng, 2008; Zheng, 2012). Various CAT and MST designs have been investigated and compared under the same…
Descriptors: Comparative Analysis, Computer Assisted Testing, Adaptive Testing, Test Items
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems for writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Peer reviewed
Direct link
Pratt, Amy S.; Anaya, Jissel B.; Ramos, Michelle N.; Pham, Giang; Muñoz, Miriam; Bedore, Lisa M.; Peña, Elizabeth D. – Language, Speech, and Hearing Services in Schools, 2022
Purpose: Our proof-of-concept study tested the feasibility of virtual testing using child assessments that were originally validated for in-person testing only. Method: Ten adult-child dyads were assigned to complete both in-person and virtual tests of language, cognition, and narratives. Child participants fell between the ages of 4 and 8 years;…
Descriptors: Evaluation Methods, Language Tests, Intelligence Tests, Narration
Peer reviewed
Direct link
Twyford, Jessica; Craig, Scotty D. – Journal of Educational Computing Research, 2017
Observational tutoring has been found to be an effective method for teaching a variety of subjects by reusing dialogue from previous successful tutoring sessions. While it has been shown that content can be learned through observational tutoring, it has yet to be examined whether a secondary behavior such as goal setting can be influenced. The present…
Descriptors: Pretests Posttests, Physics, Science Instruction, Teaching Methods
Peer reviewed
PDF on ERIC
Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018
The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile-Assisted Language Learning (MALL) and its advantages in teaching. This study aimed to compare the effectiveness of face-to-face instruction with Telegram mobile instruction. Based on a TOEFL test, 60 English foreign language…
Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction
Peer reviewed
Direct link
Green, Debbie; Rosenfeld, Barry; Belfi, Brian – Assessment, 2013
The current study evaluated the accuracy of the Structured Interview of Reported Symptoms, Second Edition (SIRS-2) in a criterion-group study using a sample of forensic psychiatric patients and a community simulation sample, comparing it to the original SIRS and to results published in the SIRS-2 manual. The SIRS-2 yielded an impressive…
Descriptors: Structured Interviews, Comparative Analysis, Patients, Simulation
Peer reviewed
PDF on ERIC
Ali, Usama S.; Chang, Hua-Hua – ETS Research Report Series, 2014
Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
Descriptors: Adaptive Testing, Simulation, Pretests Posttests, Test Items
Peer reviewed
Direct link
Fidalgo, Angel M.; Bartram, Dave – Applied Psychological Measurement, 2010
The main objective of this study was to establish the relative efficacy of the generalized Mantel-Haenszel test (GMH) and the Mantel test for detecting large numbers of differential item functioning (DIF) patterns. To this end this study considered a topic not dealt with in the literature to date: the possible differential effect of type of scores…
Descriptors: Test Bias, Statistics, Scoring, Comparative Analysis
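As background on the Mantel-Haenszel procedure this entry builds on (a minimal sketch of the common odds ratio that underlies MH DIF detection, not the article's GMH comparison): examinees are stratified by total score, and a 2x2 table of correct/incorrect responses for the reference and focal groups is pooled across strata. The data below are hypothetical.

```python
import numpy as np

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio across score strata.

    tables : list of 2x2 arrays [[ref_correct, ref_wrong],
                                 [focal_correct, focal_wrong]],
             one per total-score stratum.
    """
    num = den = 0.0
    for t in tables:
        t = np.asarray(t, dtype=float)
        n = t.sum()
        num += t[0, 0] * t[1, 1] / n   # ref correct * focal wrong
        den += t[0, 1] * t[1, 0] / n   # ref wrong * focal correct
    return num / den

# Two hypothetical score strata; a ratio near 1 suggests little DIF
or_mh = mantel_haenszel_odds_ratio([
    [[30, 10], [28, 12]],
    [[20, 20], [18, 22]],
])
```

An odds ratio well above or below 1 indicates the item favors one group even after conditioning on total score, which is the operational definition of DIF in this family of tests.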
Peer reviewed
Direct link
Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012
In the presence of test speededness, the parameters of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…
Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory