Koziol, Natalie A.; Goodrich, J. Marc; Yoon, HyeonJin – Educational and Psychological Measurement, 2022
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A…
Descriptors: Regression (Statistics), Item Analysis, Validity, Testing Accommodations
Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014
Likert-type rating scales are still the most widely used method when measuring psychoeducational constructs. The present study investigates a long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…
Descriptors: Likert Scales, Responses, Item Response Theory, Classification
Aydin, Selami – Education 3-13, 2012
Studies conducted so far have mainly focused on the relationships between perceptions of tests and test anxiety among adult foreign language learners, while research on these relationships among young learners is lacking. Thus, this study aims to examine the relationship between test anxiety among young learners who study…
Descriptors: Test Length, Content Validity, Validity, Measures (Individuals)
Lewandowski, Lawrence; Cohen, Justin; Lovett, Benjamin J. – Journal of Psychoeducational Assessment, 2013
Students with disabilities often receive test accommodations in schools and on high-stakes tests. Students with learning disabilities (LD) represent the largest disability group in schools, and extended time the most common test accommodation requested by such students. This pairing persists despite controversy over the validity of extended time…
Descriptors: Testing Accommodations, Learning Disabilities, Reading Comprehension, Undergraduate Students
Kim, Jihye – ProQuest LLC, 2010
In DIF studies, a Type I error refers to the mistake of identifying non-DIF items as DIF items, and a Type I error rate refers to the proportion of Type I errors in a simulation study. The possibility of making a Type I error in DIF studies is always present, and a high possibility of making such an error can weaken the validity of the assessment.…
Descriptors: Test Bias, Test Length, Simulation, Testing
Evans, Josiah Jeremiah – ProQuest LLC, 2010
In measurement research, data simulations are a commonly used analytical technique. While simulation designs have many benefits, it is unclear if these artificially generated datasets are able to accurately capture real examinee item response behaviors. This potential lack of comparability may have important implications for administration of…
Descriptors: Computer Assisted Testing, Adaptive Testing, Educational Testing, Admission (School)
Woods, Carol M. – Applied Psychological Measurement, 2008
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
Descriptors: Test Length, Computation, Item Response Theory, Maximum Likelihood Statistics

Neustel, Sandra – 2001
As a continuing part of its validity studies, the Association of American Medical Colleges commissioned a study of the speediness of the Medical College Admission Test (MCAT). If speed is a hidden part of the test, it is a threat to its construct validity. As a general rule, the criterion used to indicate lack of speediness is that 80% of the…
Descriptors: College Applicants, College Entrance Examinations, Higher Education, Medical Education

Sher, Kenneth J.; And Others – Psychological Assessment, 1995
Interrelated analyses were conducted with more than 4,000 college students to examine the reliability and validity of the Tridimensional Personality Questionnaire (TPQ) and to develop and validate a short version of the scale. Results provide moderate support for the reliability and validity of both the TPQ and the short form. (SLD)
Descriptors: College Students, Factor Analysis, Higher Education, Personality Assessment

Huynh, Huynh; Casteel, Jim – Journal of Experimental Education, 1987
In the context of pass/fail decisions, using the Bock multinomial latent trait model for moderate-length tests does not produce decisions that differ substantially from those based on the raw scores. The Bock decisions appear to relate less strongly to outside criteria than those based on the raw scores. (Author/JAZ)
Descriptors: Cutting Scores, Error Patterns, Grade 6, Intermediate Grades
Wingersky, Marilyn S.; Lord, Frederic M. – 1983
The sampling errors of maximum likelihood estimates of item-response theory parameters are studied in the case where both people and item parameters are estimated simultaneously. A check on the validity of the standard error formulas is carried out. The effect of varying sample size, test length, and the shape of the ability distribution is…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Banks, Latent Trait Theory
Steinheiser, Frederick H., Jr.; And Others – 1978
Alternative mathematical models for scoring and decision making with criterion referenced tests are described, especially as they concern appropriate test length and methods of establishing statistically valid cutting scores. Several of these approaches are reviewed and compared on formal-analytic and empirical grounds: (1) Block's approach to…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Decision Making
Jones, Brett D.; Egley, Robert J. – ERS Spectrum, 2005
The purpose of this paper is to discuss Florida teachers' recommendations for improving the Florida Comprehensive Assessment Test (FCAT) and to compare their recommendations with those of Florida administrators. Although teachers' suggestions varied as to the types and extent of remedies needed to improve the FCAT, some common themes emerged. The…
Descriptors: Test Results, Core Curriculum, Student Evaluation, Accountability