Edwards, Ashley A.; Joyner, Keanan J.; Schatschneider, Christopher – Educational and Psychological Measurement, 2021
The accuracy of certain internal consistency estimators has been questioned in recent years. The present study tests the accuracy of six reliability estimators (including Cronbach's alpha, omega, omega hierarchical, Revelle's omega, and the greatest lower bound) in 140 simulated conditions of unidimensional continuous data with uncorrelated errors with varying…
Descriptors: Reliability, Computation, Accuracy, Sample Size
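The best known of the estimators named in this entry, Cronbach's alpha, can be computed directly from an examinee-by-item score matrix. A minimal Python sketch under one simulated condition of the kind the study describes (unidimensional continuous data with uncorrelated errors; the sample size, loading, and test length here are illustrative, not the study's):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an examinee-by-item score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulate unidimensional continuous data with uncorrelated errors.
rng = np.random.default_rng(0)
n, k, loading = 1000, 10, 0.7
theta = rng.normal(size=(n, 1))                      # single latent trait
errors = rng.normal(scale=np.sqrt(1 - loading**2), size=(n, k))
items = loading * theta + errors                     # unit-variance items

print(round(cronbach_alpha(items), 3))
```

With these parallel items the population reliability of the total score is about .91, so the estimate should land near that value.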
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ₂, λ₄, λ₂, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
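Several of the coefficients this entry compares can be computed straight from the item covariance matrix; Guttman's λ₂ is a convenient example. A minimal sketch with one illustrative simulated replication (the data-generating values are assumptions, not the study's conditions):

```python
import numpy as np

def guttman_lambda2(items: np.ndarray) -> float:
    """Guttman's lambda-2, a lower bound to reliability, computed from
    the off-diagonal entries of the item covariance matrix."""
    k = items.shape[1]
    c = np.cov(items, rowvar=False)
    off = c - np.diag(np.diag(c))              # off-diagonal covariances
    return (off.sum() + np.sqrt(k / (k - 1) * (off ** 2).sum())) / c.sum()

# One illustrative replication: unidimensional data, 8 items, 500 examinees.
rng = np.random.default_rng(42)
theta = rng.normal(size=(500, 1))
items = 0.6 * theta + rng.normal(scale=0.8, size=(500, 8))
print(round(guttman_lambda2(items), 3))
```

A Monte Carlo study of the kind described would wrap this computation in a loop over replications and conditions and compare each coefficient's average against the known population reliability.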
Bao, Yu; Bradshaw, Laine – Measurement: Interdisciplinary Research and Perspectives, 2018
Diagnostic classification models (DCMs) can provide multidimensional diagnostic feedback about students' mastery levels of knowledge components or attributes. One advantage of using DCMs is the ability to accurately and reliably classify students into mastery levels with a relatively small number of items per attribute. Combining DCMs with…
Descriptors: Test Items, Selection, Adaptive Testing, Computer Assisted Testing
Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017
There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…
Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests
Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014
Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates a long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…
Descriptors: Likert Scales, Responses, Item Response Theory, Classification
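The graded-response modeling this entry mentions assigns each Likert category a probability via cumulative 2PL curves, one per category threshold. A small sketch (the discrimination and threshold values are illustrative):

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    P(X >= k | theta) follows a 2PL curve for each threshold b_k;
    differences between adjacent curves give the category probabilities.
    """
    b = np.asarray(thresholds, dtype=float)
    p_star = 1 / (1 + np.exp(-a * (theta - b)))      # P(X >= k), k = 1..K-1
    p_star = np.concatenate(([1.0], p_star, [0.0]))
    return p_star[:-1] - p_star[1:]

# Four response categories (three thresholds) for an examinee at theta = 0.5.
probs = grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.0, 0.0, 1.0])
print(probs.round(3))
```

Varying the length of `thresholds` is exactly how one would vary the number of response categories in a study like this one.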
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
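The Angoff procedure described in this entry reduces to simple arithmetic once the ratings are collected: each panelist's cut score is the sum of their item-level probability judgments, and the panel recommendation is the mean across panelists. A sketch with hypothetical ratings (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical Angoff ratings: panelists x items, each entry the judged
# probability that a minimally competent examinee answers the item correctly.
ratings = np.array([
    [0.6, 0.7, 0.4, 0.8],
    [0.5, 0.8, 0.5, 0.6],
    [0.7, 0.6, 0.3, 0.9],
])

panelist_cuts = ratings.sum(axis=1)   # each panelist's recommended cut score
cut_score = panelist_cuts.mean()      # panel recommendation
print(round(cut_score, 2))
```

The G-theory question the study asks is, in effect, how stable `cut_score` remains when the item columns are subsampled.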
Wang, Chun – Journal of Educational and Behavioral Statistics, 2014
Many latent traits in social sciences display a hierarchical structure, such as intelligence, cognitive ability, or personality. Usually a second-order factor is linearly related to a group of first-order factors (also called domain abilities in cognitive ability measures), and the first-order factors directly govern the actual item responses.…
Descriptors: Measurement, Accuracy, Item Response Theory, Adaptive Testing
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
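The trade-off this entry studies (short tests versus measurement error) is captured by two classical-test-theory formulas: the Spearman-Brown prophecy for reliability under a length change, and the standard error of measurement. A minimal sketch with illustrative numbers:

```python
import math

def spearman_brown(rho: float, factor: float) -> float:
    """Projected reliability when test length is multiplied by `factor`."""
    return factor * rho / (1 + (factor - 1) * rho)

def sem(sd: float, rho: float) -> float:
    """Standard error of measurement for observed scores."""
    return sd * math.sqrt(1 - rho)

# Shortening a 30-item test with reliability .90 to 10 items (factor 1/3):
short_rho = spearman_brown(0.90, 1 / 3)
print(round(short_rho, 3), round(sem(10.0, short_rho), 2))
```

Cutting the test to a third of its length drops reliability from .90 to .75, which in turn widens the standard error of measurement; this is the mechanism behind the decision-quality deterioration the study quantifies.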
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
Culpepper, Steven Andrew – Applied Psychological Measurement, 2012
Measurement error significantly biases interaction effects and distorts researchers' inferences regarding interactive hypotheses. This article focuses on the single-indicator case and shows how to accurately estimate group slope differences by disattenuating interaction effects with errors-in-variables (EIV) regression. New analytic findings were…
Descriptors: Evidence, Test Length, Interaction, Regression (Statistics)
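The attenuation this entry addresses can be demonstrated in a few lines: an unreliable predictor biases the OLS slope toward zero, and in the single-indicator case dividing by the predictor's reliability recovers the true slope. A simulation sketch (the true slope, reliability, and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rel = 5000, 0.7                 # reliability of the observed predictor
true_x = rng.normal(size=n)
obs_x = true_x + rng.normal(scale=np.sqrt((1 - rel) / rel), size=n)
y = 0.5 * true_x + rng.normal(scale=0.5, size=n)   # true slope is 0.5

# The naive OLS slope is attenuated toward zero by measurement error;
# dividing by the predictor's reliability is the single-indicator
# errors-in-variables correction.
naive_slope = np.cov(obs_x, y)[0, 1] / obs_x.var(ddof=1)
corrected = naive_slope / rel
print(round(naive_slope, 2), round(corrected, 2))
```

The naive slope comes out near 0.5 × 0.7 = 0.35, while the corrected slope recovers roughly 0.5; the article's contribution concerns doing this for group slope *differences* in interaction models.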
Wyse, Adam E.; Hao, Shiqi – Applied Psychological Measurement, 2012
This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…
Descriptors: Item Response Theory, Classification, Accuracy, Reliability
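A Rudner-style classification accuracy index, which the new indices in this entry are shown to relate to, averages over examinees the probability that the true score falls in the same cut-score interval as the estimate, assuming a normal distribution around each estimate. A minimal sketch (the estimates, standard errors, and cut score below are invented for illustration):

```python
import math

def rudner_accuracy(estimates, ses, cuts):
    """Rudner-style expected classification accuracy: mean probability
    that the true score lands in the same interval as the estimate,
    assuming normality around each estimate."""
    bounds = [-math.inf] + sorted(cuts) + [math.inf]
    phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF
    total = 0.0
    for est, se in zip(estimates, ses):
        k = sum(c <= est for c in cuts)      # interval the estimate falls in
        lo, hi = bounds[k], bounds[k + 1]
        total += phi((hi - est) / se) - phi((lo - est) / se)
    return total / len(estimates)

acc = rudner_accuracy(estimates=[-1.2, 0.1, 0.9], ses=[0.3, 0.3, 0.3],
                      cuts=[0.0])
print(round(acc, 3))
```

Examinees far from the cut contribute probabilities near 1; the examinee at 0.1, close to the cut of 0.0, drags the index down, which is exactly the behavior such indices are designed to expose.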
Kolen, Michael J.; Tong, Ye – Educational Measurement: Issues and Practice, 2010
Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional…
Descriptors: Test Length, Psychometrics, Item Response Theory, Scores
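One of the Bayesian estimators in this entry's family, the EAP (expected a posteriori) proficiency estimate, can be sketched with simple quadrature under a 2PL model and a standard-normal prior (the item parameters and response pattern below are illustrative):

```python
import numpy as np

def eap_estimate(responses, a, b):
    """Bayesian EAP proficiency estimate under a 2PL IRT model,
    with a standard-normal prior evaluated on a quadrature grid."""
    theta = np.linspace(-4, 4, 161)
    a, b, x = (np.asarray(v, dtype=float) for v in (a, b, responses))
    p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))      # grid x items
    like = np.prod(np.where(x == 1, p, 1 - p), axis=1)   # response likelihood
    post = like * np.exp(-0.5 * theta ** 2)              # times normal prior
    return (theta * post).sum() / post.sum()

# An all-correct pattern on five items of middling difficulty: the prior
# keeps the estimate finite, unlike maximum likelihood, which diverges.
est = eap_estimate([1, 1, 1, 1, 1], a=[1.0] * 5, b=[0.0] * 5)
print(round(est, 2))
```

This shrinkage toward the prior mean is one of the psychometric properties (bias, conditional standard error) that the paper compares across Bayesian and non-Bayes estimators.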
Blasingame, Gerry D.; Abel, Gene G.; Jordan, Alan; Wiegel, Markus – Journal of Mental Health Research in Intellectual Disabilities, 2011
This article describes the development and utility of the Abel-Blasingame Assessment System for "individuals with intellectual disabilities" (ABID) for assessment of sexual interest and problematic sexual behaviors. The study examined the preliminary psychometric properties and evaluated the clinical utility of the ABID based on a sample…
Descriptors: Mental Retardation, Developmental Delays, Measures (Individuals), Questionnaires
Deng, Nina – ProQuest LLC, 2011
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…
Descriptors: Item Response Theory, Test Theory, Computation, Classification
Sijtsma, Klaas – International Journal of Testing, 2009
This article reviews three topics from test theory that continue to raise discussion and controversy and capture test theorists' and constructors' interest. The first topic concerns the discussion of the methodology of investigating and establishing construct validity; the second topic concerns reliability and its misuse, alternative definitions…
Descriptors: Construct Validity, Reliability, Classification, Test Theory