Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
Computerized adaptive tests (CAT) apply an adaptive process in which items are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods they use. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
Asil, Mustafa; Brown, Gavin T. L. – International Journal of Testing, 2016
The use of the Programme for International Student Assessment (PISA) across nations, cultures, and languages has been criticized. The key criticisms point to the linguistic and cultural biases potentially underlying the design of reading comprehension tests, raising doubts about the legitimacy of comparisons across economies. Our research focused…
Descriptors: Comparative Analysis, Reading Achievement, Achievement Tests, Secondary School Students
Stoolmiller, Michael; Biancarosa, Gina; Fien, Hank – Assessment for Effective Intervention, 2013
Lack of psychometric equivalence of oral reading fluency (ORF) passages used within a grade for screening and progress monitoring has recently become an issue with calls for the use of equating methods to ensure equivalence. To investigate the nature of the nonequivalence and to guide the choice of equating method to correct for nonequivalence,…
Descriptors: School Personnel, Reading Fluency, Emergent Literacy, Psychometrics
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009
A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…
Descriptors: Sampling, Sample Size, Accuracy, Test Items
Wang, Huan – ProQuest LLC, 2010
Multiple uses of the same assessment may present challenges for both the design and use of an assessment. Little advice, however, has been given to assessment developers as to how to understand the phenomena of multiple assessment use and meet the challenges these present. Particularly problematic is the case in which an assessment is used for…
Descriptors: Test Use, Testing Programs, Program Effectiveness, Test Construction
Briggs, Derek C. – Partnership for Assessment of Readiness for College and Careers, 2011
There is often confusion about distinctions between growth models and value-added models. The first half of this paper attempts to dispel some of these confusions by clarifying terminology and illustrating by example how the results from a large-scale assessment can and will be used to make inferences about student growth and the value-added…
Descriptors: Value Added Models, Language Usage, Measurement, Inferences
Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010
The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…
Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment
Sullins, Walter L. – 1971
Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
Descriptors: Comparative Analysis, Correlation, Error of Measurement, Item Analysis
Lance, Charles E.; Moomaw, Michael E. – 1983
Direct assessments of the accuracy with which raters can use a rating instrument are presented. This study demonstrated how surplus behavioral incidents scaled during the development of Behaviorally Anchored Rating Scales (BARS) can be used effectively in the evaluation of the newly developed scales. Construction of scenarios of hypothetical…
Descriptors: Behavior Rating Scales, Comparative Analysis, Error of Measurement, Evaluation Criteria
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
Benson, Jeri; Wilson, Michael – 1979
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Descriptors: Comparative Analysis, Difficulty Level, Efficiency, Error of Measurement
Roid, Gale; Haladyna, Tom – 1978
The technology of transforming sentences from prose instruction into test questions was examined by comparing two methods of selecting sentences (keyword vs. rare singleton), two types of question words (nouns vs. adjectives), and two foil construction methods (writer's choice vs. algorithmic). Four item writers created items using each…
Descriptors: Algorithms, Cloze Procedure, Comparative Analysis, Criterion Referenced Tests
Haladyna, Tom – 1976
The existence of criterion-referenced (CR) measurement is questioned in this paper. Despite beliefs that differences exist between two alternative forms of measurement, CR and Norm Referenced (NR), an analysis of philosophical and psychological descriptions of measurement, as well as a growing number of empirical studies, reveal that the common…
Descriptors: Academic Standards, Achievement Tests, Career Development, Comparative Analysis
Haladyna, Tom; Roid, Gale – 1976
Three approaches to the construction of achievement tests are compared: construct, operational, and empirical. The construct approach is based upon classical test theory and measures an abstract representation of the instructional objectives. The operational approach specifies instructional intent through instructional objectives, facet design,…
Descriptors: Academic Achievement, Achievement Tests, Career Development, Comparative Analysis