ERIC - Search Results

Showing 1 to 15 of 38 results

Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items

Peer reviewed

Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020

The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…

Descriptors: Test Bias, Interrater Reliability, Responses, Correlation

A Measurement Is a Choice and Stevens' Scales of Measurement Do Not Help Make It: A Response to Chalmers

Peer reviewed

Zumbo, Bruno D.; Kroc, Edward – Educational and Psychological Measurement, 2019

Chalmers recently published a critique of the use of ordinal alpha proposed in Zumbo et al. as a measure of test reliability in certain research settings. In this response, we take up the task of refuting Chalmers' critique. We identify three broad misconceptions that characterize Chalmers' criticisms: (1) confusing assumptions with…

Descriptors: Test Reliability, Statistical Analysis, Misconceptions, Mathematical Models

Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing

A Comparison of Reliability and Precision of Subscore Reporting Methods for a State English Language Proficiency Assessment

Peer reviewed

Longabach, Tanya; Peyton, Vicki – Language Testing, 2018

K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…

Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency

Students' Epistemologies about Experimental Physics: Validating the Colorado Learning Attitudes about Science Survey for Experimental Physics

Peer reviewed

Wilcox, Bethany R.; Lewandowski, H. J. – Physical Review Physics Education Research, 2016

Student learning in instructional physics labs represents a growing area of research that includes investigations of students' beliefs and expectations about the nature of experimental physics. To directly probe students' epistemologies about experimental physics and support broader lab transformation efforts at the University of Colorado Boulder…

Descriptors: Physics, Epistemology, Surveys, Science Instruction

The Comparison of Accuracy Scores on the Paper and Pencil Testing vs. Computer-Based Testing

Peer reviewed

Retnawati, Heri – Turkish Online Journal of Educational Technology - TOJET, 2015

This study aimed to compare the accuracy of the test scores as results of Test of English Proficiency (TOEP) based on paper and pencil test (PPT) versus computer-based test (CBT). Using the participants' responses to the PPT documented from 2008-2010 and data of CBT TOEP documented in 2013-2014 on the sets of 1A, 2A, and 3A for the Listening and…

Descriptors: Scores, Accuracy, Computer Assisted Testing, English (Second Language)

Psychometric Evidence of SRSS-IE Scores in Middle and High Schools

Peer reviewed

Lane, Kathleen Lynne; Oakes, Wendy Peia; Cantwell, Emily D.; Menzies, Holly Mariah; Schatschneider, Christopher; Lambert, Warren; Common, Eric Alan – Journal of Emotional and Behavioral Disorders, 2017

We report results of an exploratory validation study of the "Student Risk Screening Scale-Internalizing and Externalizing" (SRSS-IE) applied with the first sample of middle and high school students from nine middle and three high schools from three states. The "Student Risk Screening Scale" (SRSS) was modified to broaden the…

Descriptors: Scores, Psychometrics, Evidence, Middle Schools

An Inventory for Measuring Student Teachers' Knowledge of Chemical Representations: Design, Validation, and Psychometric Analysis

Peer reviewed

Taskin, V.; Bernholt, S.; Parchmann, I. – Chemistry Education Research and Practice, 2015

Chemical representations play an important role in helping learners to understand chemical contents. Thus, dealing with chemical representations is a necessity for learning chemistry, but at the same time, it presents a great challenge to learners. Due to this great challenge, it is not surprising that numerous national and international studies…

Descriptors: Student Teachers, Knowledge Level, Science Instruction, Chemistry

Making Do with What We Have: Use Your Bootstraps

Peer reviewed

Calmettes, Guillaume; Drummond, Gordon B.; Vowler, Sarah L. – Advances in Physiology Education, 2012

A jack knife is a pocket knife that is put to many tasks, because it's ready to hand. Often there could be a better tool for the job, such as a screwdriver, a scraper, or a can-opener, but these are not usually pocket items. In statistical terms, the expression implies making do with what's available. Another simile, of an extreme situation, is…

Descriptors: Statistical Analysis, Computation, Population Distribution, Evaluation Methods

An Innovative Excel Application to Improve Exam Reliability in Marketing Courses

Peer reviewed

Keller, Christopher M.; Kros, John F. – Marketing Education Review, 2011

Measures of survey reliability are commonly addressed in marketing courses. One statistic of reliability is "Cronbach's alpha." This paper presents an application of survey reliability as a reflexive application of multiple-choice exam validation. The application provides an interactive decision support system that incorporates survey item…

Descriptors: Test Validity, Marketing, Test Reliability, Multiple Choice Tests

The Development of a Digital Logic Concept Inventory

Herman, Geoffrey Lindsay – ProQuest LLC, 2011

Instructors in electrical and computer engineering and in computer science have developed innovative methods to teach digital logic circuits. These methods attempt to increase student learning, satisfaction, and retention. Although there are readily accessible and accepted means for measuring satisfaction and retention, there are no widely…

Descriptors: Grounded Theory, Delphi Technique, Concept Formation, Misconceptions

Subscores Based on Classical Test Theory: To Report or Not to Report

Peer reviewed

Sinharay, Sandip; Haberman, Shelby; Puhan, Gautam – Educational Measurement: Issues and Practice, 2007

There is an increasing interest in reporting subscores, both at examinee level and at aggregate levels. However, it is important to ensure reasonable subscore performance in terms of high reliability and validity to minimize incorrect instructional and remediation decisions. This article employs a statistical measure based on classical test theory…

Descriptors: Test Reliability, Test Theory, Test Validity, Statistical Analysis

Quality Assurance of Multiple-Choice Tests

Peer reviewed

Bush, Martin E. – Quality Assurance in Education: An International Perspective, 2006

Purpose: To provide educationalists with an understanding of the key quality issues relating to multiple-choice tests, and a set of guidelines for the quality assurance of such tests. Design/methodology/approach: The discussion of quality issues is structured to reflect the order in which those issues naturally arise. It covers the design of…

Descriptors: Multiple Choice Tests, Test Reliability, Educational Quality, Quality Control

Basic Concepts in Classical Test Theory: Relating Variance Partitioning in Substantive Analyses to the Same Process in Measurement Analyses.

Dawson, Thomas E. – 1997

The basic processes in univariate statistics involve partitioning the sum of squares into two components: explained and within. This paper explains that the same partitioning occurs in measurement analyses, i.e., splitting the sum of squares into reliable and unreliable components. In addition, it is shown how the three types of error inherent in…

Descriptors: Estimation (Mathematics), Measurement Techniques, Scores, Statistical Analysis

Decision Dependability of Subtests, Tests, and the Overall TOEFL Test Battery.

Brown, James Dean; Ross, Jacqueline A. – 1993

This study investigates the Test of English as a Foreign Language (TOEFL), in particular the relative contributions to score dependability (analogous to classical theory reliability) of various numbers of items and subtests as well as the decision dependability at different cut points. Research questions that apply to the overall TOEFL battery and…

Descriptors: English (Second Language), Language Tests, Statistical Analysis, Test Reliability
