Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 11 |
Descriptor
Accuracy | 11 |
Test Theory | 11 |
Test Reliability | 7 |
Item Response Theory | 6 |
Comparative Analysis | 3 |
Correlation | 3 |
Goodness of Fit | 3 |
Language Tests | 3 |
Reliability | 3 |
Scores | 3 |
Scoring | 3 |
Author
Haberman, Shelby J. | 2 |
An, Ji | 1 |
Briggs, Laura C. | 1 |
Coggins, Joanne V. | 1 |
Deng, Nina | 1 |
Hancock, Gregory R. | 1 |
Hau, Kit-Tai | 1 |
Hayes, Malcolm | 1 |
He, Qingping | 1 |
Kim, Jwa K. | 1 |
Kim, Sooyeon | 1 |
Publication Type
Journal Articles | 8 |
Reports - Research | 7 |
Reports - Evaluative | 2 |
Dissertations/Theses -… | 1 |
Reports - Descriptive | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 2 |
Elementary Education | 1 |
Postsecondary Education | 1 |
Location
Indonesia | 1 |
United Kingdom (England) | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Gates MacGinitie Reading Tests | 1 |
Test of English as a Foreign… | 1 |
Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023
We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed that, for unidimensional scales, (a) all indices except omega h performed similarly well for most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small…
Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length
Hancock, Gregory R.; An, Ji – Measurement: Interdisciplinary Research and Perspectives, 2020
As an alternative to Cronbach's α for estimating scale reliability, McDonald's ω has attracted increased attention within the methodological community for its less stringent measurement assumptions. Notwithstanding, ω is still seldom used by practitioners, likely due to its unavailability in popular software packages (e.g., SPSS)…
Descriptors: Evaluation, Alternative Assessment, Reliability, Test Reliability
Wright, Stephen L.; Jenkins-Guarnieri, Michael A. – Journal of Psychoeducational Assessment, 2024
The current study sought to advance the Social Self-Efficacy and Social Outcome Expectations scale using multiple approaches to scale development. Data from 583 undergraduate students were used in two scale development approaches: Classical Test Theory (CTT) and Item Response Theory (IRT). Confirmatory factor analysis suggested a 2-factor…
Descriptors: Measures (Individuals), Expectation, Self Efficacy, Item Response Theory
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing
Coggins, Joanne V.; Kim, Jwa K.; Briggs, Laura C. – Research in the Schools, 2017
The Gates-MacGinitie Reading Comprehension Test, fourth edition (GMRT-4) and the ACT Reading Tests (ACT-R) were administered to 423 high school students in order to explore the similarities and dissimilarities of data produced through classical test theory (CTT) and item response theory (IRT) analysis. Despite the many advantages of IRT…
Descriptors: Item Response Theory, Test Theory, Reading Comprehension, Reading Tests
Retnawati, Heri – Turkish Online Journal of Educational Technology - TOJET, 2015
This study aimed to compare the accuracy of test scores from the Test of English Proficiency (TOEP) administered as a paper-and-pencil test (PPT) versus a computer-based test (CBT). Using the participants' responses to the PPT documented from 2008-2010 and data of CBT TOEP documented in 2013-2014 on the sets of 1A, 2A, and 3A for the Listening and…
Descriptors: Scores, Accuracy, Computer Assisted Testing, English (Second Language)
Zimmerman, Donald W. – Journal of Educational and Behavioral Statistics, 2011
Many well-known equations in classical test theory are mathematical identities in populations of individuals but not in random samples from those populations. First, test scores are subject to the same sampling error that is familiar in statistical estimation and hypothesis testing. Second, the assumptions made in derivation of formulas in test…
Descriptors: Test Theory, Equations (Mathematics), Scores, Sampling
He, Qingping; Hayes, Malcolm; Wiliam, Dylan – Research Papers in Education, 2013
The accuracy of the results of the national tests in English, mathematics and science taken by 11-year-olds in England has been a matter of much debate since their introduction in 1994, with estimates of the proportion of students incorrectly classified varying from 10 to 30%. Using live data from the 2009 and 2010 administration of the national…
Descriptors: Foreign Countries, National Curriculum, Accuracy, Classification
Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011
Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…
Descriptors: True Scores, Test Theory, Prediction, Group Membership
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests
Deng, Nina – ProQuest LLC, 2011
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…
Descriptors: Item Response Theory, Test Theory, Computation, Classification