Showing 1 to 15 of 63 results
Peer reviewed
Direct link
Ian Jones; Ben Davies – International Journal of Research & Method in Education, 2024
Educational researchers often need to construct precise and reliable measurement scales of complex and varied representations such as participants' written work, videoed lesson segments and policy documents. Developing such scales can be resource-intensive and time-consuming, and the outcomes are not always reliable. Here we present…
Descriptors: Educational Research, Comparative Analysis, Educational Researchers, Measurement
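The listing does not reproduce the authors' method, but comparative judgement scaling of this kind typically rests on the Bradley-Terry model. A minimal sketch, assuming hypothetical paired judgements and using Hunter's (2004) minorization-maximization updates:

```python
from collections import defaultdict
import math

# Hypothetical judgement data: each tuple is (winner, loser) from one
# paired comparison of two pieces of work.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("A", "B")]

items = sorted({x for pair in comparisons for x in pair})
wins = defaultdict(int)          # total wins per item
pair_counts = defaultdict(int)   # comparisons per unordered pair
for w, l in comparisons:
    wins[w] += 1
    pair_counts[frozenset((w, l))] += 1

# MM update: pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)
pi = {i: 1.0 for i in items}
for _ in range(200):
    for i in items:
        denom = sum(
            pair_counts[frozenset((i, j))] / (pi[i] + pi[j])
            for j in items
            if j != i and frozenset((i, j)) in pair_counts
        )
        if denom > 0:
            pi[i] = wins[i] / denom
    total = sum(pi.values())
    pi = {i: p / total for i, p in pi.items()}  # fix the arbitrary scale

# Comparative judgement studies usually report the log-strengths (logits).
for i in items:
    print(i, round(math.log(pi[i]), 2))
```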
Peer reviewed
PDF on ERIC Download full text
Teck Kiang Tan – Practical Assessment, Research & Evaluation, 2024
Procedures for carrying out factorial invariance testing to validate a construct are well developed and ensure that the construct can be used reliably across groups for comparison and analysis, yet they remain largely restricted to the frequentist approach. This motivates an update incorporating the growing Bayesian approach for carrying out the Bayesian…
Descriptors: Bayesian Statistics, Factor Analysis, Programming Languages, Reliability
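The article's procedures are not shown here; one hedged illustration of the Bayesian route is approximate measurement invariance, where group-specific loading deviations get small-variance priors instead of strict equality constraints. A minimal PyMC sketch on simulated two-group data (all parameter names and values are assumptions):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n, groups = 200, 2
eta = rng.normal(size=(groups, n))                       # latent factor scores
lam_true = np.array([[0.8, 0.7, 0.9], [0.8, 0.75, 0.9]])  # near-invariant loadings
y = eta[..., None] * lam_true[:, None, :] + rng.normal(0, 0.5, (groups, n, 3))

with pm.Model() as model:
    lam_common = pm.Normal("lam_common", 0.8, 1.0, shape=3)
    # Small prior SD on group deviations encodes *approximate* invariance.
    lam_dev = pm.Normal("lam_dev", 0.0, 0.05, shape=(groups, 3))
    lam = lam_common + lam_dev
    sigma = pm.HalfNormal("sigma", 1.0)
    eta_lat = pm.Normal("eta", 0.0, 1.0, shape=(groups, n))
    mu = eta_lat[..., None] * lam[:, None, :]
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(500, tune=500, chains=2)

# Loadings whose deviation posterior sits clearly away from zero
# flag items that are not invariant across groups.
print(idata.posterior["lam_dev"].mean(dim=("chain", "draw")))
```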
Peer reviewed
Direct link
Christopher E. Gomez; Marcelo O. Sztainberg; Rachel E. Trana – International Journal of Bullying Prevention, 2022
Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated,…
Descriptors: Video Technology, Computer Software, Computer Mediated Communication, Bullying
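The study's models are not reproduced in this listing; as a hedged illustration of the general approach, a supervised text classifier can be trained on annotated examples. A minimal scikit-learn sketch with an invented toy dataset (real work needs the large, accurately annotated corpora whose scarcity the abstract notes):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotations: 1 = bullying, 0 = benign.
texts = [
    "you are worthless and everyone hates you",
    "nobody wants you here, just leave",
    "great game last night, see you at practice",
    "happy birthday! hope you have a lovely day",
]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a logistic-regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict_proba(["everyone hates you, leave"])[:, 1])
```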
Peer reviewed
PDF on ERIC Download full text
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
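The report's analysis is not shown here; comparative judgement studies commonly summarize the resulting scale with a Scale Separation Reliability (SSR) statistic. A minimal sketch with hypothetical script estimates and standard errors:

```python
import numpy as np

theta = np.array([-1.4, -0.6, 0.1, 0.5, 1.4])  # estimated script logits
se = np.array([0.40, 0.35, 0.33, 0.36, 0.42])  # their standard errors

obs_var = theta.var(ddof=1)           # observed variance (true + error)
err_var = np.mean(se ** 2)            # average error variance
ssr = (obs_var - err_var) / obs_var   # analogue of Rasch separation reliability
print(round(ssr, 3))
```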
Peer reviewed
Direct link
van Valkenhoef, Gert; Dias, Sofia; Ades, A. E.; Welton, Nicky J. – Research Synthesis Methods, 2016
Network meta-analysis enables the simultaneous synthesis of a network of clinical trials comparing any number of treatments. Potential inconsistencies between estimates of relative treatment effects are an important concern, and several methods to detect inconsistency have been proposed. This paper is concerned with the node-splitting approach,…
Descriptors: Networks, Meta Analysis, Automation, Models
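The paper's automated method is not reproduced in this listing; the core node-splitting idea is to contrast the direct and indirect evidence for one comparison and test their difference. A minimal sketch with hypothetical effect estimates and variances:

```python
import math

# Direct A-vs-B evidence (pooled log odds ratio and its variance).
d_direct, v_direct = 0.50, 0.04
# Indirect path through comparator C: d_AB = d_AC - d_BC under consistency.
d_ac, v_ac = 0.80, 0.05
d_bc, v_bc = 0.35, 0.06
d_indirect = d_ac - d_bc
v_indirect = v_ac + v_bc

# Inconsistency estimate: discrepancy between the two sources of evidence.
omega = d_direct - d_indirect
z = omega / math.sqrt(v_direct + v_indirect)
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"omega={omega:.2f}, z={z:.2f}, p={p:.3f}")
```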
Peer reviewed
Direct link
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, a problem remains in peer assessment: reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
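The proposed models are not shown here; as a hedged illustration, a simple IRT formulation with a rater parameter (in the spirit of many-facet Rasch models) puts rater severity into the log-odds of a positive rating. A sketch with hypothetical parameters and ratings:

```python
import math

def p_positive(theta, beta, severity):
    """Probability a rater scores the response positively:
    log-odds = ability - task difficulty - rater severity."""
    return 1.0 / (1.0 + math.exp(-(theta - beta - severity)))

# Hypothetical (theta_examinee, beta_task, rater_severity, observed_rating).
ratings = [(1.0, 0.2, 0.5, 1), (1.0, 0.2, -0.3, 1), (-0.5, 0.2, 0.5, 0)]

# Log-likelihood of the observed ratings under the model.
loglik = sum(
    math.log(p if x == 1 else 1 - p)
    for th, b, r, x in ratings
    for p in [p_positive(th, b, r)]
)
print(round(loglik, 3))
```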
Peer reviewed
Direct link
Bao, Lei; Koenig, Kathleen; Xiao, Yang; Fritchman, Joseph; Zhou, Shaona; Chen, Cheng – Physical Review Physics Education Research, 2022
Abilities in scientific thinking and reasoning have been emphasized as core areas of initiatives, such as the Next Generation Science Standards or the College Board Standards for College Success in Science, which focus on the skills the future will demand of today's students. Although there is rich literature on studies of how these abilities…
Descriptors: Physics, Science Instruction, Teaching Methods, Thinking Skills
Peer reviewed
Direct link
Blackstone, Bethany; Oldmixon, Elizabeth – Journal of Political Science Education, 2019
This article explores the efficacy of specifications grading in undergraduate political science classes. Specifications grading organizes instruction around a set of learning objectives and evaluates student success based on the achievement of carefully articulated specifications for each assessment. Assessments are considered satisfactory or…
Descriptors: Grading, Undergraduate Students, Political Science, Best Practices
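The article's grading schemes are not reproduced here; as a hedged illustration of the mechanism, specifications grading maps satisfactory/unsatisfactory judgements to course grades through predefined bundles rather than averaged points. A minimal sketch with hypothetical bundles:

```python
def course_grade(satisfactory: set[str]) -> str:
    """Map the set of satisfactorily completed assessments to a grade."""
    bundles = {  # hypothetical bundle requirements, highest grade first
        "A": {"essay1", "essay2", "exam1", "exam2", "project"},
        "B": {"essay1", "essay2", "exam1", "exam2"},
        "C": {"essay1", "exam1", "exam2"},
    }
    for grade, required in bundles.items():
        if required <= satisfactory:  # every spec in the bundle was met
            return grade
    return "D"

print(course_grade({"essay1", "essay2", "exam1", "exam2"}))  # -> "B"
```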
Peer reviewed
Direct link
Hays, Danica G.; Wood, Chris – Measurement and Evaluation in Counseling and Development, 2017
We present considerations for validity when a population outside of a normed sample is assessed and those data are interpreted. Using a career group counseling example exploring life satisfaction changes as evidenced by the Quality of Life Inventory (Frisch, 1994), we showcase qualitative and quantitative approaches to explore how normative data…
Descriptors: Data Interpretation, Scores, Quality of Life, Life Satisfaction
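The inventory's actual norms are not reproduced here; the norm-referenced step at issue can be illustrated with a standard T-score conversion. A minimal sketch with hypothetical norms, exactly the step that becomes questionable when the client falls outside the normed population:

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw score to a T score (mean 50, SD 10) via the norms."""
    z = (raw - norm_mean) / norm_sd
    return 50 + 10 * z

# Hypothetical life-satisfaction raw score against hypothetical norms.
print(t_score(raw=32.0, norm_mean=28.5, norm_sd=6.0))  # ~55.8
```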
Peer reviewed
Direct link
Humphry, Stephen M.; McGrane, Joshua A. – Australian Educational Researcher, 2015
This paper presents a method for equating writing assessments using pairwise comparisons which does not depend upon conventional common-person or common-item equating designs. Pairwise comparisons have been successfully applied in the assessment of open-ended tasks in English and other areas such as visual art and philosophy. In this paper,…
Descriptors: Writing Evaluation, Evaluation Methods, Comparative Analysis, Writing Tests
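The paper's equating design is not reproduced here, but its key requirement can be sketched: scripts from both assessments must form a single connected comparison network, via cross-form pairings, so that one joint scale can be estimated without common persons or items. A minimal connectivity check with hypothetical script labels:

```python
from collections import defaultdict, deque

# Hypothetical pairings; the last two comparisons link the two forms.
comparisons = [("A1", "A2"), ("A2", "A3"), ("B1", "B2"), ("B2", "B3"),
               ("A1", "B2"), ("B3", "A3")]

graph = defaultdict(set)
for x, y in comparisons:
    graph[x].add(y)
    graph[y].add(x)

# Breadth-first search: every script must be reachable from any other,
# otherwise the two assessments cannot share one scale.
seen, queue = {"A1"}, deque(["A1"])
while queue:
    for nbr in graph[queue.popleft()]:
        if nbr not in seen:
            seen.add(nbr)
            queue.append(nbr)

print("connected network:", seen == set(graph))
```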
Peer reviewed
Direct link
Culpepper, Steven Andrew – Applied Psychological Measurement, 2013
A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…
Descriptors: Item Response Theory, Reliability, Scores, Error of Measurement
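The article's equations are not reproduced in this listing; the underlying phenomenon can be illustrated by simulation: coarsening a continuous response into fewer scale categories lowers classical parallel-forms reliability. A minimal sketch (cut points and error variance are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true = rng.normal(size=n)  # latent true scores

def coarsen(x, k):
    """Cut a continuous score into k ordered categories."""
    edges = np.linspace(-2.5, 2.5, k + 1)[1:-1]  # k-1 interior cut points
    return np.digitize(x, edges)

for k in (2, 3, 5, 7, 11):
    # Two parallel measurements: same true score, independent errors.
    y1 = coarsen(true + rng.normal(0, 0.6, n), k)
    y2 = coarsen(true + rng.normal(0, 0.6, n), k)
    rel = np.corrcoef(y1, y2)[0, 1]  # parallel-forms reliability
    print(f"{k:2d} categories: reliability ~ {rel:.3f}")
```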
Mitchell, Alison M.; Truckenmiller, Adrea; Petscher, Yaacov – Communique, 2015
As part of the Race to the Top initiative, the United States Department of Education made nearly 1 billion dollars available in State Educational Technology grants with the goal of ramping up school technology. One result of this effort is that states, districts, and schools across the country are using computerized assessments to measure their…
Descriptors: Computer Assisted Testing, Educational Technology, Testing, Efficiency
Peer reviewed
Direct link
Pollitt, Alastair – International Journal of Technology and Design Education, 2012
Historically speaking, students were judged long before they were marked. The tradition of marking, or scoring, pieces of work students offer for assessment is little more than two centuries old, and was introduced mainly to cope with specific problems arising from the growth in the numbers graduating from universities as the industrial revolution…
Descriptors: Holistic Evaluation, Educational Assessment, Evaluation Methods, Educational History
Peer reviewed
PDF on ERIC Download full text
Darling-Hammond, Linda – Learning Policy Institute, 2017
After passage of the Every Student Succeeds Act (ESSA) in 2015, states assumed greater responsibility for designing their own accountability and assessment systems. ESSA requires states to measure "higher order thinking skills and understanding" and encourages the use of open-ended performance assessments, which are essential for…
Descriptors: Performance Based Assessment, Accountability, Portfolios (Background Materials), Task Analysis
Peer reviewed
Direct link
Lee, Won-Chan – Journal of Educational Measurement, 2010
In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…
Descriptors: Classification, Item Response Theory, Comparative Analysis, Models
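The article's full procedure is not shown here; for dichotomous items the single-administration logic can be sketched: use the Lord-Wingersky recursion to obtain the conditional summed-score distribution, then integrate the probability of an identical pass/fail decision on two parallel administrations over the ability distribution. Item parameters, the cut score, and the quadrature below are hypothetical:

```python
import numpy as np

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # 2PL discriminations
b = np.array([-0.5, 0.0, 0.3, 0.8, -1.0])  # 2PL difficulties
cut = 3                                     # pass if summed score >= 3

thetas = np.linspace(-4, 4, 41)             # quadrature points
weights = np.exp(-thetas**2 / 2)
weights /= weights.sum()                    # standard normal weights

def score_dist(theta):
    """Lord-Wingersky recursion: P(summed score = s | theta)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    dist = np.array([1.0])
    for pj in p:
        dist = np.convolve(dist, [1 - pj, pj])
    return dist

consistency = 0.0
for th, w in zip(thetas, weights):
    d = score_dist(th)
    pass_prob = d[cut:].sum()
    # Same decision on two parallel administrations, given theta.
    consistency += w * (pass_prob**2 + (1 - pass_prob) ** 2)

print(f"classification consistency ~ {consistency:.3f}")
```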