Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
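As a minimal sketch of the effort-moderated (EM) scoring idea the abstract refers to: responses faster than a rapid-guessing time threshold are treated as not administered, and the score is computed over the remaining effortful responses only. The threshold and data below are illustrative, not from the study.

```python
# Hedged sketch of effort-moderated (EM) scoring: items answered faster
# than a rapid-guessing threshold are excluded (treated as not
# administered) before computing proportion correct.
# The 3-second threshold and the data are invented for illustration.

def em_score(correct, response_times, threshold=3.0):
    """Proportion correct over effortful responses only."""
    effortful = [c for c, t in zip(correct, response_times) if t >= threshold]
    if not effortful:
        return None  # no effortful responses left to score
    return sum(effortful) / len(effortful)

correct = [1, 0, 1, 1, 0, 1]                # right/wrong per item
times = [8.2, 1.1, 6.5, 0.9, 7.0, 5.4]      # seconds per item
print(em_score(correct, times))             # items 2 and 4 are rapid guesses
```

In actual EM scoring the effortful responses feed an IRT ability estimate rather than a raw proportion; the sketch only shows the filtering step.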
Fergadiotis, Gerasimos; Casilio, Marianne; Dickey, Michael Walsh; Steel, Stacey; Nicholson, Hannele; Fleegle, Mikala; Swiderski, Alexander; Hula, William D. – Journal of Speech, Language, and Hearing Research, 2023
Purpose: Item response theory (IRT) is a modern psychometric framework with several advantageous properties as compared with classical test theory. IRT has been successfully used to model performance on anomia tests in individuals with aphasia; however, all efforts to date have focused on noun production accuracy. The purpose of this study is to…
Descriptors: Item Response Theory, Psychometrics, Verbs, Naming
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Sachin Nedungadi; Corina E. Brown; Sue Hyeon Paek – Journal of Chemical Education, 2022
The Fundamental Concepts for Organic Reaction Mechanisms Inventory (FC-ORMI) is a concept inventory with most items in a two-tier design in which an answer tier is followed by a reasoning tier. Statistical results provided strong evidence for the validity and reliability of the data obtained using the FC-ORMI. In this study, differential item…
Descriptors: Test Bias, Test Validity, Test Reliability, Gender Differences
Schulte, Niklas; Holling, Heinz; Bürkner, Paul-Christian – Educational and Psychological Measurement, 2021
Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high.…
Descriptors: Questionnaires, Measurement Techniques, Test Format, Scoring
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
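The entry above evaluates rater consistency for human-scored constructed responses. As a simple related illustration (Cohen's kappa for two raters, not the generalizability-theory method the paper examines; the ratings are invented):

```python
# Cohen's kappa: chance-corrected agreement between two raters on
# categorical scores. Ratings below are invented for illustration.
from collections import Counter

def cohens_kappa(r1, r2):
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in c1) / n ** 2   # expected chance agreement
    return (po - pe) / (1 - pe)

rater1 = [2, 1, 0, 2, 1, 1, 0, 2]
rater2 = [2, 1, 1, 2, 1, 0, 0, 2]
print(round(cohens_kappa(rater1, rater2), 3))
```

Kappa handles two raters and nominal categories; generalizability theory, as used in the paper, additionally partitions variance across raters, items, and occasions.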
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension (coefficient [alpha]) gives the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
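The KR-21 estimate discussed above needs only the number of items, the mean, and the variance of summed scores. A minimal computation (the score data are made up):

```python
# KR-21 reliability for a test of k dichotomous items, computed from
# summed total scores: r = k/(k-1) * (1 - M(k-M)/(k*s^2)),
# where M is the mean total score and s^2 the variance of totals.
# The scores below are invented for illustration.

def kr21(total_scores, k):
    n = len(total_scores)
    m = sum(total_scores) / n                          # mean total score
    s2 = sum((x - m) ** 2 for x in total_scores) / n   # variance of totals
    return (k / (k - 1)) * (1 - m * (k - m) / (k * s2))

scores = [12, 15, 9, 18, 14, 11, 16, 13]   # totals on a 20-item test
print(round(kr21(scores, 20), 3))
```

Unlike coefficient alpha, KR-21 does not need item-level data, which is why it suits the randomly-parallel-forms setting the abstract describes.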
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
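The sentence-level C-Test scoring the abstract outlines can be sketched in a few lines: correctly restored gaps are aggregated within each sentence, yielding one polytomous score per sentence. The gap data here are invented.

```python
# Hedged illustration of sentence-level C-Test scoring: sum the
# correctly restored gaps within each sentence to get a polytomous
# sentence score. Gap results below are invented.
from collections import defaultdict

def sentence_scores(gap_results):
    """gap_results: list of (sentence_id, 1 if gap restored correctly else 0)."""
    totals = defaultdict(int)
    for sent, correct in gap_results:
        totals[sent] += correct
    return dict(totals)

gaps = [(1, 1), (1, 0), (1, 1),   # sentence 1: 3 gaps, 2 correct
        (2, 1), (2, 1),           # sentence 2: 2 gaps, both correct
        (3, 0), (3, 0), (3, 1)]   # sentence 3: 3 gaps, 1 correct
print(sentence_scores(gaps))
```

In the study these sentence scores would then enter a polytomous IRT model; the sketch covers only the aggregation step.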
Abdalla, Widad – ProQuest LLC, 2019
Trend scoring is often used in large-scale assessments to monitor for rater drift when the same constructed response items are administered in multiple test administrations. In trend scoring, a set of responses from Time "A" are rescored by raters at Time "B." The purpose of this study is to examine the ability of…
Descriptors: Scoring, Interrater Reliability, Test Items, Error Patterns
Güntay Tasçi – Science Insights Education Frontiers, 2024
The present study has aimed to develop and validate a protein concept inventory (PCI) consisting of 25 multiple-choice (MC) questions to assess students' understanding of protein, which is a fundamental concept across different biology disciplines. The development process of the PCI involved a literature review to identify protein-related content,…
Descriptors: Science Instruction, Science Tests, Multiple Choice Tests, Biology
Myszkowski, Nils – Journal of Intelligence, 2020
Raven's Standard Progressive Matrices (Raven 1941) is a widely used 60-item measure of general mental ability. It was recently suggested that, for situations where taking this test is too time consuming, a shorter version, comprised of only the last series of the Standard Progressive Matrices (Myszkowski and Storme 2018), could be used, while…
Descriptors: Intelligence Tests, Psychometrics, Nonparametric Statistics, Item Response Theory
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically, simple structured tests such as these, rely on multiple-choice and/or constructed-response sections of items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Guo, Hongwen; Ling, Guangming; Frankel, Lois – ETS Research Report Series, 2020
With advances in technology, researchers and test developers are developing new item types to measure complex skills like problem solving and critical thinking. Analyzing such items is often challenging because of their complicated response patterns, and thus it is important to develop psychometric methods for practitioners and researchers to…
Descriptors: Test Construction, Test Items, Item Analysis, Psychometrics
Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019
Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…
Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation