ERIC - Search Results

Showing 1 to 15 of 28 results

Added Value of Subscores for Tests with Polytomous Items

Peer reviewed

Kylie Gorney; Sandip Sinharay – Educational and Psychological Measurement, 2025

Test-takers, policymakers, teachers, and institutions are increasingly demanding that testing programs provide more detailed feedback regarding test performance. As a result, there has been a growing interest in the reporting of subscores that potentially provide such detailed feedback. Haberman developed a method based on classical test theory…

Descriptors: Scores, Test Theory, Test Items, Testing

Assessment of Item and Test Parameters: Cosine Similarity Approach

Peer reviewed

Chakrabartty, Satyendra Nath – International Journal of Psychology and Educational Studies, 2021

The paper proposes new measures of the difficulty and discriminating values of binary items, and of tests consisting of such items, and finds their relationships, including estimation of test error variance and thereby test reliability, as per definition, using cosine similarities. The measures use the entire data. The difficulty value of a test and of an item is defined…

Descriptors: Test Items, Difficulty Level, Scores, Test Reliability

Classical Item Analysis from a Signal Detection Perspective

Peer reviewed

DeCarlo, Lawrence T. – Journal of Educational Measurement, 2023

A conceptualization of multiple-choice exams in terms of signal detection theory (SDT) leads to simple measures of item difficulty and item discrimination that are closely related to, but also distinct from, those used in classical item analysis (CIA). The theory defines a "true split," depending on whether or not examinees know an item,…

Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Test Wiseness

A Design for Comparing CTT and IRT in Test Assembly, Scoring and Argumentation: Differences among Reliability, Information and Validation

Peer reviewed

Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019

This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…

Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring

Effects of Various Simulation Conditions on Latent-Trait Estimates: A Simulation Study

Peer reviewed

Kogar, Hakan – International Journal of Assessment Tools in Education, 2018

The aim of this simulation study was to determine the relationship between true latent scores and estimated latent scores by including various control variables and different statistical models. The study also aimed to compare the statistical models and determine the effects of different distribution types, response formats and sample sizes on latent…

Descriptors: Simulation, Context Effect, Computation, Statistical Analysis

Facilitating the Interpretation of English Language Proficiency Scores: Combining Scale Anchoring and Test Score Mapping Methodologies

Peer reviewed

Powers, Donald; Schedl, Mary; Papageorgiou, Spiros – Language Testing, 2017

The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced TOEFL ITP® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…

Descriptors: English (Second Language), Second Language Learning, Language Proficiency, Scores

Gender Fairness within the Force Concept Inventory

Peer reviewed

Traxler, Adrienne; Henderson, Rachel; Stewart, John; Stewart, Gay; Papak, Alexis; Lindell, Rebecca – Physical Review Physics Education Research, 2018

Research on the test structure of the Force Concept Inventory (FCI) has largely ignored gender, and research on FCI gender effects (often reported as "gender gaps") has seldom interrogated the structure of the test. These rarely crossed streams of research leave open the possibility that the FCI may not be structurally valid across…

Descriptors: Physics, Science Instruction, Sex Fairness, Gender Differences

Rating Quality Studies Using Rasch Measurement Theory. Research Report 2013-3

Engelhard, George, Jr.; Wind, Stefanie A. – College Board, 2013

The major purpose of this study is to examine the quality of ratings assigned to CR (constructed-response) questions in large-scale assessments from the perspective of Rasch Measurement Theory. Rasch Measurement Theory provides a framework for the examination of rating scale category structure that can yield useful information for interpreting the…

Descriptors: Measurement Techniques, Rating Scales, Test Theory, Scores

Development of the Enzyme-Substrate Interactions Concept Inventory

Peer reviewed

Bretz, Stacey Lowery; Linenberger, Kimberly J. – Biochemistry and Molecular Biology Education, 2012

Enzyme function is central to student understanding of multiple topics within the biochemistry curriculum. In particular, students must understand how enzymes and substrates interact with one another. This manuscript describes the development of a 15-item Enzyme-Substrate Interactions Concept Inventory (ESICI) that measures student understanding…

Descriptors: Biochemistry, Science Education, Science Instruction, Scientific Concepts

A Control Systems Concept Inventory Test Design and Assessment

Peer reviewed

Bristow, M.; Erkorkmaz, K.; Huissoon, J. P.; Jeon, Soo; Owen, W. S.; Waslander, S. L.; Stubley, G. D. – IEEE Transactions on Education, 2012

Any meaningful initiative to improve the teaching and learning in introductory control systems courses needs a clear test of student conceptual understanding to determine the effectiveness of proposed methods and activities. The authors propose a control systems concept inventory. Development of the inventory was collaborative and iterative. The…

Descriptors: Diagnostic Tests, Concept Formation, Undergraduate Students, Engineering Education

Studying Reliability of Open Ended Mathematics Items According to the Classical Test Theory and Generalizability Theory

Peer reviewed

Guler, Nese; Gelbal, Selahattin – Educational Sciences: Theory and Practice, 2010

In this study, classical test theory and generalizability theory were used to determine the reliability of scores obtained from a measurement tool of mathematics success. Twenty-four open-ended mathematics questions from TIMSS-1999 were administered to 203 students in the spring semester of 2007. The internal consistency of the scores was found to be 0.92. For…

Descriptors: Generalizability Theory, Test Theory, Test Reliability, Interrater Reliability

Accessibility Theory for Enhancing the Validity of Test Results for Students with Special Needs

Peer reviewed

Beddow, Peter A. – International Journal of Disability, Development and Education, 2012

In the arena of educational testing, accessibility refers to the degree to which students are given the opportunity to participate in and engage a test. Accessibility theory is a model for examining the interactions between the test-taker and the test itself and defining how they may decrease some students' access to the test event, ultimately…

Descriptors: Test Results, Test Items, Educational Testing, Scores

The Theil-Sen Slope for High-Stakes Decisions from Progress Monitoring

Peer reviewed

Vannest, Kimberly J.; Parker, Richard I.; Davis, John L.; Soares, Denise A.; Smith, Stacey L. – Behavioral Disorders, 2012

More and more, schools are considering the use of progress monitoring data for high-stakes decisions such as special education eligibility, program changes to more restrictive environments, and major changes in educational goals. Those high-stakes types of data-based decisions will need methodological defensibility. Current practice for…

Descriptors: Decision Making, Educational Change, Regression (Statistics), Field Tests

Efficiency of Predicting Risk in Word Reading Using Fewer, Easier Letters

Peer reviewed

Petscher, Yaacov; Kim, Young-Suk – Assessment for Effective Intervention, 2011

Letter-name identification has been widely used as part of early screening to identify children who might be at risk for future word reading difficulty. The goal of the present study was to examine whether a reduced set of letters could have similar diagnostic accuracy rather than a full set (i.e., 26 letters) when used as a screen. First, we…

Descriptors: Clinical Diagnosis, Measures (Individuals), Risk, Reading

On Validity Theory and Test Validation

Peer reviewed

Sireci, Stephen G. – Educational Researcher, 2007

Lissitz and Samuelsen (2007) propose a new framework for conceptualizing test validity that separates analysis of test properties from analysis of the construct measured. In response, the author of this article reviews fundamental characteristics of test validity, drawing largely from seminal writings as well as from the accepted standards. He…

Descriptors: Test Content, Test Validity, Guidelines, Test Items
