Showing all 13 results
Peer reviewed
Koziol, Natalie A.; Goodrich, J. Marc; Yoon, HyeonJin – Educational and Psychological Measurement, 2022
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A…
Descriptors: Regression (Statistics), Item Analysis, Validity, Testing Accommodations
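For context on the "traditional approaches" the abstract above refers to, the sketch below shows a standard logistic-regression DIF screen on simulated data. It is not the regression-discontinuity framework the article proposes; all variable names and values are illustrative assumptions.

import numpy as np
from scipy.stats import chi2
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)            # 0 = standard form, 1 = accommodated form
theta = rng.normal(0.4 * group, 1.0, size=n)  # groups differ in ability (non-random assignment)
total = theta + rng.normal(0.0, 0.5, size=n)  # noisy matching variable (proxy for total score)
p_correct = 1.0 / (1.0 + np.exp(-(1.2 * theta - 0.3)))  # studied item has no true DIF
item = rng.binomial(1, p_correct)

# Compare a model with the matching variable only against one that adds
# group and group-by-score terms; a likelihood-ratio test flags DIF.
x_reduced = sm.add_constant(total)
x_full = sm.add_constant(np.column_stack([total, group, total * group]))
fit_reduced = sm.Logit(item, x_reduced).fit(disp=0)
fit_full = sm.Logit(item, x_full).fit(disp=0)
lr = 2 * (fit_full.llf - fit_reduced.llf)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.4f}")
# Because the matching variable is measured with error and the groups were
# not randomly assigned, this screen can flag the item even though no DIF
# was simulated -- the selection-bias problem the article addresses.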
Peer reviewed
Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014
Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates a long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…
Descriptors: Likert Scales, Responses, Item Response Theory, Classification
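As a concrete illustration (assumed, not taken from the study) of generating categorical data under the IRT Graded-Response Model, the following sketch simulates responses to a single four-category item:

import numpy as np

rng = np.random.default_rng(1)

def simulate_grm_item(theta, a, thresholds):
    """Simulate one graded-response item; thresholds must be increasing."""
    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - thresholds[None, :])))
    # Category probabilities P(X = k) as differences of adjacent cumulatives
    cum = np.hstack([np.ones((theta.size, 1)), p_star, np.zeros((theta.size, 1))])
    probs = cum[:, :-1] - cum[:, 1:]
    return np.array([rng.choice(probs.shape[1], p=p) for p in probs])

theta = rng.normal(size=500)                                        # latent trait
responses = simulate_grm_item(theta, a=1.5,
                              thresholds=np.array([-1.0, 0.0, 1.0]))  # 4 categories
print(np.bincount(responses, minlength=4))                          # respondents per category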
Peer reviewed
Aydin, Selami – Education 3-13, 2012
Studies conducted so far have mainly focused on the relationship between test perceptions and test anxiety among adult foreign language learners, while research on this issue among young learners is lacking. Thus, this study aims to examine the relationship between test anxiety among young learners who study…
Descriptors: Test Length, Content Validity, Validity, Measures (Individuals)
Peer reviewed
Lewandowski, Lawrence; Cohen, Justin; Lovett, Benjamin J. – Journal of Psychoeducational Assessment, 2013
Students with disabilities often receive test accommodations in schools and on high-stakes tests. Students with learning disabilities (LD) represent the largest disability group in schools, and extended time is the most common test accommodation requested by such students. This pairing persists despite controversy over the validity of extended time…
Descriptors: Testing Accommodations, Learning Disabilities, Reading Comprehension, Undergraduate Students
Kim, Jihye – ProQuest LLC, 2010
In DIF studies, a Type I error refers to the mistake of identifying non-DIF items as DIF items, and a Type I error rate refers to the proportion of Type I errors in a simulation study. The possibility of making a Type I error in DIF studies is always present, and a high probability of making such an error can weaken the validity of the assessment.…
Descriptors: Test Bias, Test Length, Simulation, Testing
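To make the definition above concrete, here is a minimal sketch (with made-up counts, not values from the dissertation) of computing an empirical Type I error rate in a DIF simulation:

import numpy as np

rng = np.random.default_rng(2)
n_replications, n_non_dif_items = 100, 40
alpha = 0.05

# Hypothetical outcome of a DIF procedure applied to items simulated
# WITHOUT DIF: under a well-calibrated procedure, each non-DIF item is
# flagged with probability alpha.
flagged = rng.random((n_replications, n_non_dif_items)) < alpha

# Type I error rate = proportion of non-DIF items incorrectly flagged.
print(f"Empirical Type I error rate: {flagged.mean():.3f}")  # close to 0.05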
Evans, Josiah Jeremiah – ProQuest LLC, 2010
In measurement research, data simulations are a commonly used analytical technique. While simulation designs have many benefits, it is unclear if these artificially generated datasets are able to accurately capture real examinee item response behaviors. This potential lack of comparability may have important implications for administration of…
Descriptors: Computer Assisted Testing, Adaptive Testing, Educational Testing, Admission (School)
Peer reviewed
Woods, Carol M. – Applied Psychological Measurement, 2008
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
Descriptors: Test Length, Computation, Item Response Theory, Maximum Likelihood Statistics
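For reference, the three-parameter logistic (3PL) item response function evaluated in the study above is commonly written as

P_j(\theta) = c_j + (1 - c_j)\,\frac{1}{1 + \exp[-a_j(\theta - b_j)]}

where a_j is the discrimination, b_j the difficulty, and c_j the lower-asymptote (pseudo-guessing) parameter of item j. Per the abstract, RC-IRT differs from the standard approach by estimating the distribution of the latent trait theta simultaneously with these item parameters rather than fixing it to the normal.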
Neustel, Sandra – 2001
As a continuing part of its validity studies, the Association of American Medical Colleges commissioned a study of the speediness of the Medical College Admission Test (MCAT). If speed is a hidden part of the test, it is a threat to its construct validity. As a general rule, the criterion used to indicate lack of speediness is that 80% of the…
Descriptors: College Applicants, College Entrance Examinations, Higher Education, Medical Education
Peer reviewed
Sher, Kenneth J.; And Others – Psychological Assessment, 1995
Interrelated analyses were conducted with more than 4,000 college students to examine the reliability and validity of the Tridimensional Personality Questionnaire (TPQ) and to develop and validate a short version of the scale. Results provide moderate support for the reliability and validity of both the TPQ and the short form. (SLD)
Descriptors: College Students, Factor Analysis, Higher Education, Personality Assessment
Peer reviewed
Huynh, Huynh; Casteel, Jim – Journal of Experimental Education, 1987
In the context of pass/fail decisions, using the Bock multi-nominal latent trait model for moderate-length tests does not produce decisions that differ substantially from those based on the raw scores. The Bock decisions appear to relate less strongly to outside criteria than those based on the raw scores. (Author/JAZ)
Descriptors: Cutting Scores, Error Patterns, Grade 6, Intermediate Grades
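For context, and assuming the "Bock multi-nominal latent trait model" above refers to Bock's (1972) nominal categories model, that model gives the probability of response category k on an m-category item as

P(X = k \mid \theta) = \frac{\exp(a_k \theta + c_k)}{\sum_{h=1}^{m} \exp(a_h \theta + c_h)}

with a category slope a_k and intercept c_k, identified by constraints such as sum-to-zero. The abstract's finding is that pass/fail decisions from this model track raw-score decisions closely for moderate-length tests.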
Wingersky, Marilyn S.; Lord, Frederic M. – 1983
The sampling errors of maximum likelihood estimates of item-response theory parameters are studied in the case where both people and item parameters are estimated simultaneously. A check on the validity of the standard error formulas is carried out. The effect of varying sample size, test length, and the shape of the ability distribution is…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Banks, Latent Trait Theory
Steinheiser, Frederick H., Jr.; And Others – 1978
Alternative mathematical models for scoring and decision making with criterion referenced tests are described, especially as they concern appropriate test length and methods of establishing statistically valid cutting scores. Several of these approaches are reviewed and compared on formal-analytic and empirical grounds: (1) Block's approach to…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Decision Making
Peer reviewed
Jones, Brett D.; Egley, Robert J. – ERS Spectrum, 2005
The purpose of this paper is to discuss Florida teachers' recommendations for improving the Florida Comprehensive Assessment Test (FCAT) and to compare their recommendations with those of Florida administrators. Although teachers' suggestions varied as to the types and extent of remedies needed to improve the FCAT, some common themes emerged. The…
Descriptors: Test Results, Core Curriculum, Student Evaluation, Accountability