ERIC - Search Results

Showing all 5 results

Using Testlet Response Theory to Examine Local Dependence in C-Tests

Peer reviewed

Eckes, Thomas; Baghaei, Purya – Applied Measurement in Education, 2015

C-tests are gap-filling tests widely used to assess general language proficiency for purposes of placement, screening, or provision of feedback to language learners. C-tests consist of several short texts in which parts of words are missing. We addressed the issue of local dependence in C-tests using an explicit modeling approach based on testlet…

Descriptors: Language Proficiency, Language Tests, Item Response Theory, Test Reliability

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Performance of SIBTEST When the Percentage of DIF Items Is Large

Peer reviewed

Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004

Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…

Descriptors: True Scores, Simulation, Test Bias, Student Evaluation

Score Resolution: An Investigation of the Reliability and Validity of Resolved Scores

Peer reviewed

Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003

When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…

Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability

How to Evaluate the Legal Defensibility of High-Stakes Tests.

Peer reviewed

Mehrens, William A.; Popham, W. James – Applied Measurement in Education, 1992

This paper discusses how to determine whether a test was developed in a legally defensible manner, reviewing general issues, specific cases bearing on different types of test use, some evaluative dimensions, and evidence of test quality. Tests constructed and used according to existing standards will generally stand legal scrutiny. (SLD)

Descriptors: College Entrance Examinations, Compliance (Legal), Constitutional Law, Court Litigation