Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 14 |
Descriptor
Statistical Analysis | 38 |
Test Reliability | 38 |
Test Theory | 38 |
Test Validity | 14 |
Career Development | 10 |
Comparative Analysis | 8 |
Correlation | 8 |
Mathematical Models | 8 |
Criterion Referenced Tests | 7 |
Scores | 7 |
Test Construction | 6 |
Author
Bormuth, John R. | 2 |
Algina, James | 1 |
Belfry, M. Joan | 1 |
Bernholt, S. | 1 |
Brown, James Dean | 1 |
Budescu, David | 1 |
Bush, Martin E. | 1 |
Cahan, Sorel | 1 |
Calmettes, Guillaume | 1 |
Cantwell, Emily D. | 1 |
Cohen, Allan S., Comp. | 1 |
Education Level
Higher Education | 7 |
Postsecondary Education | 5 |
Elementary Education | 1 |
Middle Schools | 1 |
Audience
Practitioners | 1 |
Students | 1 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
California Achievement Tests | 1 |
Defining Issues Test | 1 |
Strengths and Difficulties… | 1 |
Test of English as a Foreign… | 1 |
Woodcock Johnson Tests of… | 1 |
Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Zumbo, Bruno D.; Kroc, Edward – Educational and Psychological Measurement, 2019
Chalmers recently published a critique of the use of ordinal α proposed in Zumbo et al. as a measure of test reliability in certain research settings. In this response, we take up the task of refuting Chalmers' critique. We identify three broad misconceptions that characterize Chalmers' criticisms: (1) confusing assumptions with…
Descriptors: Test Reliability, Statistical Analysis, Misconceptions, Mathematical Models
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
Wilcox, Bethany R.; Lewandowski, H. J. – Physical Review Physics Education Research, 2016
Student learning in instructional physics labs represents a growing area of research that includes investigations of students' beliefs and expectations about the nature of experimental physics. To directly probe students' epistemologies about experimental physics and support broader lab transformation efforts at the University of Colorado Boulder…
Descriptors: Physics, Epistemology, Surveys, Science Instruction
Retnawati, Heri – Turkish Online Journal of Educational Technology - TOJET, 2015
This study aimed to compare the accuracy of the test scores as results of Test of English Proficiency (TOEP) based on paper and pencil test (PPT) versus computer-based test (CBT). Using the participants' responses to the PPT documented from 2008-2010 and data of CBT TOEP documented in 2013-2014 on the sets of 1A, 2A, and 3A for the Listening and…
Descriptors: Scores, Accuracy, Computer Assisted Testing, English (Second Language)
Lane, Kathleen Lynne; Oakes, Wendy Peia; Cantwell, Emily D.; Menzies, Holly Mariah; Schatschneider, Christopher; Lambert, Warren; Common, Eric Alan – Journal of Emotional and Behavioral Disorders, 2017
We report results of an exploratory validation study of the "Student Risk Screening Scale-Internalizing and Externalizing" (SRSS-IE) applied with the first sample of middle and high school students from nine middle and three high schools from three states. The "Student Risk Screening Scale" (SRSS) was modified to broaden the…
Descriptors: Scores, Psychometrics, Evidence, Middle Schools
Taskin, V.; Bernholt, S.; Parchmann, I. – Chemistry Education Research and Practice, 2015
Chemical representations play an important role in helping learners to understand chemical contents. Thus, dealing with chemical representations is a necessity for learning chemistry, but at the same time, it presents a great challenge to learners. Due to this great challenge, it is not surprising that numerous national and international studies…
Descriptors: Student Teachers, Knowledge Level, Science Instruction, Chemistry
Calmettes, Guillaume; Drummond, Gordon B.; Vowler, Sarah L. – Advances in Physiology Education, 2012
A jack knife is a pocket knife that is put to many tasks, because it's ready to hand. Often there could be a better tool for the job, such as a screwdriver, a scraper, or a can-opener, but these are not usually pocket items. In statistical terms, the expression implies making do with what's available. Another simile, of an extreme situation, is…
Descriptors: Statistical Analysis, Computation, Population Distribution, Evaluation Methods
Keller, Christopher M.; Kros, John F. – Marketing Education Review, 2011
Measures of survey reliability are commonly addressed in marketing courses. One statistic of reliability is "Cronbach's alpha." This paper presents an application of survey reliability as a reflexive application of multiple-choice exam validation. The application provides an interactive decision support system that incorporates survey item…
Descriptors: Test Validity, Marketing, Test Reliability, Multiple Choice Tests
Herman, Geoffrey Lindsay – ProQuest LLC, 2011
Instructors in electrical and computer engineering and in computer science have developed innovative methods to teach digital logic circuits. These methods attempt to increase student learning, satisfaction, and retention. Although there are readily accessible and accepted means for measuring satisfaction and retention, there are no widely…
Descriptors: Grounded Theory, Delphi Technique, Concept Formation, Misconceptions
Sinharay, Sandip; Haberman, Shelby; Puhan, Gautam – Educational Measurement: Issues and Practice, 2007
There is an increasing interest in reporting subscores, both at examinee level and at aggregate levels. However, it is important to ensure reasonable subscore performance in terms of high reliability and validity to minimize incorrect instructional and remediation decisions. This article employs a statistical measure based on classical test theory…
Descriptors: Test Reliability, Test Theory, Test Validity, Statistical Analysis
Bush, Martin E. – Quality Assurance in Education: An International Perspective, 2006
Purpose: To provide educationalists with an understanding of the key quality issues relating to multiple-choice tests, and a set of guidelines for the quality assurance of such tests. Design/methodology/approach: The discussion of quality issues is structured to reflect the order in which those issues naturally arise. It covers the design of…
Descriptors: Multiple Choice Tests, Test Reliability, Educational Quality, Quality Control
Dawson, Thomas E. – 1997
The basic processes in univariate statistics involve partitioning the sum of squares into two components: explained and within. This paper explains that the same partitioning occurs in measurement analyses, i.e., splitting the sum of squares into reliable and unreliable components. In addition, it is shown how the three types of error inherent in…
Descriptors: Estimation (Mathematics), Measurement Techniques, Scores, Statistical Analysis
Brown, James Dean; Ross, Jacqueline A. – 1993
This study investigates the Test of English as a Foreign Language (TOEFL), in particular the relative contributions to score dependability (analogous to classical theory reliability) of various numbers of items and subtests as well as the decision dependability at different cut points. Research questions that apply to the overall TOEFL battery and…
Descriptors: English (Second Language), Language Tests, Statistical Analysis, Test Reliability