Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 1
Since 2006 (last 20 years): 6
Descriptor
Item Analysis: 8
Test Validity: 8
Test Reliability: 4
Foreign Countries: 3
Item Response Theory: 3
Mathematics Tests: 3
Test Items: 3
Achievement Tests: 2
Correlation: 2
Equated Scores: 2
Error of Measurement: 2
Source
Applied Measurement in…: 8
Author
Bart, William M.: 1
Crocker, Linda M.: 1
Eklöf, Hanna: 1
Grønmo, Liv Sissel: 1
Lee, Yoonsun: 1
Lin, Jie: 1
Musch, Jochen: 1
Papenberg, Martin: 1
Pavešic, Barbara Japelj: 1
Phillips, Gary W.: 1
Rinaldi, Christia M.: 1
Publication Type
Journal Articles: 8
Reports - Research: 5
Reports - Evaluative: 3
Education Level
Elementary Secondary Education: 1
Grade 10: 1
Grade 12: 1
Grade 3: 1
Grade 4: 1
Grade 6: 1
Grade 7: 1
Grade 9: 1
High Schools: 1
Assessments and Surveys
National Assessment of…: 1
Program for International…: 1
Trends in International…: 1
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
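
The Papenberg and Musch entry above hinges on distinguishing distractor quality from distractor quantity. As a hedged illustration only, one common index of distractor quality is the point-biserial correlation between choosing a given distractor and the total score; the function, the example data, and the choice of index below are assumptions for illustration, not details taken from the article.

# Illustrative distractor-quality check: proportion choosing each distractor and its
# point-biserial correlation with total score (a well-functioning distractor attracts
# some examinees and correlates negatively with total score).
import numpy as np

def distractor_point_biserials(responses, key, scores):
    """responses: chosen options for one item (one per examinee),
    key: the correct option, scores: total test scores per examinee."""
    stats = {}
    for option in np.unique(responses):
        if option == key:
            continue
        chose = (responses == option).astype(float)
        stats[option] = {
            "proportion": chose.mean(),
            "point_biserial": np.corrcoef(chose, scores)[0, 1],
        }
    return stats

# Example with made-up data: options A-D, keyed response "B".
rng = np.random.default_rng(0)
responses = rng.choice(list("ABCD"), size=200, p=[0.2, 0.5, 0.2, 0.1])
scores = rng.normal(50, 10, size=200)
print(distractor_point_biserials(responses, key="B", scores=scores))
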
Eklöf, Hanna; Pavešic, Barbara Japelj; Grønmo, Liv Sissel – Applied Measurement in Education, 2014
The purpose of the study was to measure students' reported test-taking effort and the relationship between reported effort and performance on the Trends in International Mathematics and Science Study (TIMSS) Advanced mathematics test. This was done in three countries participating in TIMSS Advanced 2008 (Sweden, Norway, and Slovenia), and the…
Descriptors: Mathematics Tests, Cross Cultural Studies, Foreign Countries, Correlation
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
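
For context on the Phillips entry above: the design effect it refers to is the standard survey-sampling quantity, written in textbook form (not quoted from the article) as

\[
\mathrm{DEFF} = 1 + (m - 1)\,\rho,
\qquad
\mathrm{SE}_{\text{cluster}} = \sqrt{\mathrm{DEFF}} \cdot \mathrm{SE}_{\text{SRS}},
\]

where \(m\) is the average cluster (e.g., classroom or school) size and \(\rho\) is the intraclass correlation. Treating a clustered sample as a simple random sample therefore understates standard errors by the factor \(\sqrt{\mathrm{DEFF}}\), which is the underestimation of sampling error the abstract describes.
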
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
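
As background for the Taylor and Lee entry above, the dichotomous Rasch model and its polytomous extension (the partial credit model) are commonly written as follows; these are standard textbook forms, not equations taken from the article:

\[
P(X_i = 1 \mid \theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)},
\qquad
P(X_i = x \mid \theta) = \frac{\exp\!\bigl(\sum_{j=0}^{x} (\theta - \delta_{ij})\bigr)}{\sum_{k=0}^{m_i} \exp\!\bigl(\sum_{j=0}^{k} (\theta - \delta_{ij})\bigr)}, \quad x = 0, \dots, m_i,
\]

with the convention \(\sum_{j=0}^{0} (\theta - \delta_{ij}) \equiv 0\); here \(b_i\) is the item difficulty and the \(\delta_{ij}\) are step (threshold) parameters for the polytomous item.
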
Rogers, W. Todd; Lin, Jie; Rinaldi, Christia M. – Applied Measurement in Education, 2011
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g.,…
Descriptors: Test Items, Item Analysis, Achievement Tests, French

Bart, William M.; Williams-Morris, Ruth – Applied Measurement in Education, 1990
Refined item digraph analysis (RIDA) is a way of studying diagnostic and prescriptive testing. It permits assessment of a test item's diagnostic value by examining the extent to which the item has properties of ideal items. RIDA is illustrated with the Orange Juice Test, which assesses the proportionality concept. (TJH)
Descriptors: Diagnostic Tests, Evaluation Methods, Item Analysis, Mathematical Models

Crocker, Linda M.; And Others – Applied Measurement in Education, 1989
Techniques for quantifying the degree of fit between test items and curricula are classified according to the purposes of assessing: overall fit, fit of individual items to content domain, and the impact of test specifications on performance. Procedures for calculating each index and their properties are included. (SLD)
Descriptors: Achievement Tests, Content Validity, Curriculum, Elementary Secondary Education
Wise, Steven L. – Applied Measurement in Education, 2006
In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…
Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory
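
The Wise entry above concerns using item response times to gauge examinee effort. A minimal sketch of the general threshold idea, where responses faster than an item-specific rapid-guessing cutoff are flagged as non-effortful, is given below; the function name, the fixed cutoffs, and the example data are illustrative assumptions, not the article's exact procedure.

# Illustrative response-time effort index for one examinee: the proportion of
# responses whose time meets or exceeds an item-level rapid-guessing cutoff.
def response_time_effort(response_times, cutoffs):
    """response_times: dict item_id -> seconds; cutoffs: dict item_id -> cutoff seconds."""
    flags = [response_times[item] >= cutoffs[item] for item in response_times]
    return sum(flags) / len(flags) if flags else float("nan")

# Example with made-up times and cutoffs: two of three responses exceed the cutoff.
times = {"item1": 12.4, "item2": 2.1, "item3": 30.0}
cutoffs = {"item1": 5.0, "item2": 5.0, "item3": 8.0}
print(response_time_effort(times, cutoffs))  # approximately 0.67
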