ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	10

Descriptor

Error of Measurement	15
Generalizability Theory	15
Test Items	15
Cutting Scores	4
Item Response Theory	4
Reliability	4
Difficulty Level	3
Scoring	3
Standard Setting (Scoring)	3
Statistical Analysis	3
Academic Achievement	2
Achievement Tests	2
Certification	2
Comparative Analysis	2
English	2
English (Second Language)	2
Foreign Countries	2
Grade 4	2
Grade 5	2
Interrater Reliability	2
Licensing Examinations…	2
Mathematics Tests	2
Probability	2
Psychometrics	2
Scaling	2
More ▼

Source

Applied Measurement in…	2
ETS Research Report Series	2
Educational Measurement:…	2
Applied Psychological…	1
Educational Researcher	1
Educational and Psychological…	1
International Journal of…	1
Journal of Educational…	1
Online Submission	1

Publication Type

Journal Articles	11
Reports - Evaluative	8
Reports - Research	6
Speeches/Meeting Papers	3
Information Analyses	1
Reports - Descriptive	1

Education Level

Grade 5	3
Grade 3	2
Grade 4	2
Elementary Secondary Education	1
Grade 7	1

Audience

Researchers

Location

Haiti

Laws, Policies, & Programs

Assessments and Surveys

ACT Assessment	1
National Assessment of…	1
Trends in International…	1

What Works Clearinghouse Rating

Showing all 15 results Save | Export

Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model

Download full text

Custer, Michael; Kim, Jongpil – Online Submission, 2023

This study utilizes an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when utilizing the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Batelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…

Descriptors: Sample Size, Item Response Theory, Test Items, Computation

An Information-Correction Method for Testlet-Based Test Analysis: From the Perspectives of Item Response Theory and Generalizability Theory. Research Report. ETS RR-17-27

Peer reviewed
PDF on ERIC

Download full text

Li, Feifei – ETS Research Report Series, 2017

An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…

Descriptors: Item Response Theory, Generalizability Theory, Tests, Error of Measurement

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Direct link

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Applying Rasch Model and Generalizability Theory to Study Modified-Angoff Cut Scores

Peer reviewed

Direct link

Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012

The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…

Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Direct link

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Estimating Standard Errors of Cut Scores for Item Rating and Mapmark Procedures: A Generalizability Theory Approach

Peer reviewed

Direct link

Yin, Ping; Sconing, James – Educational and Psychological Measurement, 2008

Standard-setting methods are widely used to determine cut scores on a test that examinees must meet for a certain performance standard. Because standard setting is a measurement procedure, it is important to evaluate variability of cut scores resulting from the standard-setting process. Generalizability theory is used in this study to estimate…

Descriptors: Generalizability Theory, Standard Setting, Cutting Scores, Test Items

Same-Form Retest Effects on Credentialing Examinations

Peer reviewed

Direct link

Raymond, Mark R.; Neustel, Sandra; Anderson, Dan – Educational Measurement: Issues and Practice, 2009

Examinees who take high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on their initial attempt. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign a different test form to repeat examinees. The use of multiple…

Descriptors: Test Results, Test Items, Testing, Aptitude Tests

Who Is Given Tests in What Language by Whom, When, and Where? The Need for Probabilistic Views of Language in the Testing of English Language Learners

Peer reviewed

Direct link

Solano-Flores, Guillermo – Educational Researcher, 2008

The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment…

Descriptors: Generalizability Theory, Testing, Second Language Learning, Error of Measurement

Reliability and the Nonequivalent Groups with Anchor Test Design. Research Report. ETS RR-07-16

Peer reviewed
PDF on ERIC

Download full text

Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007

This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…

Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis

The Information in Multiple Ratings

Peer reviewed

Direct link

Bock, R. Darrell; Brennan, Robert L.; Muraki, Eiji – Applied Psychological Measurement, 2002

In assessment programs where scores are reported for individual examinees, it is desirable to have responses to performance exercises graded by more than one rater. If more than one item on each test form is so graded, it is also desirable that different raters grade the responses of any one examinee. This gives rise to sampling designs in which…

Descriptors: Generalizability Theory, Test Items, Item Response Theory, Error of Measurement

A Comparison of Methods of Estimating Conditional Standard Errors of Measurement for Testlet-based Test Scores.

Peer reviewed

Lee, Guemin – Journal of Educational Measurement, 2000

Studied the appropriateness and implications of incorporating a testlet definition into the estimation of procedures of the conditional standard error of measurement (SEM) for tests composed of testlets. Simulation results for several methods show that an item-based method using a generalizability theory model provided good estimates of the…

Descriptors: Comparative Analysis, Error of Measurement, Estimation (Mathematics), Generalizability Theory

A Generalizability Study of the Angoff Method Applied to Setting Cutoff Scores of Professional Certification Tests.

Cope, Ronald T. – 1987

This study used generalizability theory and other statistical concepts to assess the application of the Angoff method to setting cutoff scores on two professional certification tests. A panel of ten judges gave pre- and post-feedback Angoff probability ratings of items of two forms of a professional certification test, and another panel of nine…

Descriptors: Certification, Correlation, Cutting Scores, Error of Measurement

The Use of Generalizability (G) Theory in the Testing of Linguistic Minorities

Peer reviewed

Direct link

Solano-Flores, Guillermo; Li, Min – Educational Measurement: Issues and Practice, 2006

We contend that generalizability (G) theory allows the design of psychometric approaches to testing English-language learners (ELLs) that are consistent with current thinking in linguistics. We used G theory to estimate the amount of measurement error due to code (language or dialect). Fourth- and fifth-grade ELLs, native speakers of…

Descriptors: Foreign Countries, Grade 4, Grade 5, English (Second Language)

A Multivariate Generalizability Analysis of the 1989 and 1990 AAP Mathematics Test Forms with Respect to the Table of Specifications.

Download full text

Colton, Dean A. – 1993

Tables of specifications are used to guide test developers in sampling items and maintaining consistency from form to form. This paper is a generalizability study of the American College Testing Program (ACT) Achievement Program Mathematics Test (AAP), with the content areas of the table of specifications representing multiple dependent variables.…

Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Generalizability Theory

The Generalizability of Scoring TIMSS Open-Ended Items.

Download full text

Smith, Teresa A. – 1997

The Third International Mathematics and Science Study (TIMSS) measured mathematics and science achievement of middle school students in more than 40 countries. About one quarter of the tests' nearly 300 items were free response items requiring students to generate their own answers. Scoring these responses used a two-digit diagnostic code rubric…

Descriptors: Comparative Education, English, Error of Measurement, Foreign Countries

Solano-Flores, Guillermo	3
Anderson, Dan	1
Arce, Alvaro J.	1
Bock, R. Darrell	1
Brennan, Robert L.	1
Colton, Dean A.	1
Cope, Ronald T.	1
Custer, Michael	1
Kachchaf, Rachel	1
Kannan, Priya	1
Katz, Irvin R.	1
Kim, Jongpil	1
Kim, Sooyeon	1
Lee, Guemin	1
Li, Feifei	1
Li, Min	1
Moses, Tim	1
Muraki, Eiji	1
Neustel, Sandra	1
Raymond, Mark R.	1
Sconing, James	1
Sgammato, Adrienne	1
Smith, Teresa A.	1
Tannenbaum, Richard J.	1
Wang, Ze	1
More ▼