Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 4 |
Descriptor
Item Response Theory | 18 |
State Programs | 18 |
Testing Programs | 18 |
Test Construction | 5 |
Achievement Tests | 4 |
Cutting Scores | 4 |
Error of Measurement | 4 |
High Schools | 4 |
Test Items | 4 |
Comparative Analysis | 3 |
Elementary Secondary Education | 3 |
Source
Educational and Psychological… | 3 |
Applied Measurement in… | 1 |
Journal of Educational… | 1 |
Journal of Research in… | 1 |
Author
Baghi, Heibatollah | 2 |
Fan, Xitao | 2 |
Lee, Guemin | 2 |
Lewis, Daniel M. | 2 |
Ackerman, Terry | 1 |
Bolt, Daniel | 1 |
Buckley, Barbara C. | 1 |
Carvajal, Jorge | 1 |
Childs, Ruth A. | 1 |
Chin-Chance, Selvin | 1 |
Crislip, Marian A. | 1 |
Publication Type
Reports - Research | 11 |
Speeches/Meeting Papers | 9 |
Journal Articles | 6 |
Reports - Evaluative | 4 |
Reports - Descriptive | 2 |
Information Analyses | 1 |
Numerical/Quantitative Data | 1 |
Assessments and Surveys
North Carolina End of Course… | 1 |
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Quellmalz, Edys S.; Timms, Michael J.; Silberglitt, Matt D.; Buckley, Barbara C. – Journal of Research in Science Teaching, 2012
This article reports on the collaboration of six states to study how simulation-based science assessments can become transformative components of multi-level, balanced state science assessment systems. The project studied the psychometric quality, feasibility, and utility of simulation-based science assessments designed to serve formative purposes…
Descriptors: State Programs, Educational Assessment, Simulated Environment, Grade 6
Skorupski, William P.; Carvajal, Jorge – Educational and Psychological Measurement, 2010
This study is an evaluation of the psychometric issues associated with estimating objective level scores, often referred to as "subscores." The article begins by introducing the concepts of reliability and validity for subscores from statewide achievement tests. These issues are discussed with reference to popular scaling techniques, classical…
Descriptors: Testing Programs, Test Validity, Achievement Tests, Scores
Lee, Guemin; Lewis, Daniel M. – Educational and Psychological Measurement, 2008
The bookmark standard-setting procedure is an item response theory-based method that is widely implemented in state testing programs. This study estimates standard errors for cut scores resulting from bookmark standard settings under a generalizability theory model and investigates the effects of different universes of generalization and error…
Descriptors: Generalizability Theory, Testing Programs, Error of Measurement, Cutting Scores
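The bookmark entries above rest on a simple mapping: under the Rasch model with the common RP67 response-probability criterion, a bookmarked item's difficulty translates directly into a theta cut score. A minimal sketch of that mapping; the difficulty value is illustrative, not taken from the study:

```python
import math

def rasch_prob(theta, b):
    """Rasch model probability of a correct response to an item
    with difficulty b by an examinee with ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def bookmark_cut(b_item, rp=2.0 / 3.0):
    """Theta at which the bookmarked item is answered correctly with
    probability rp (the RP67 convention): theta = b + ln(rp / (1 - rp))."""
    return b_item + math.log(rp / (1.0 - rp))

# Hypothetical bookmarked item with difficulty b = 0.5:
theta_cut = bookmark_cut(0.5)
```

The generalizability-theory analysis in the study then treats cut scores like this one as observations in a variance-components design to estimate their standard errors.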

Fan, Xitao – Educational and Psychological Measurement, 1998
This study empirically examined the behavior of item and person statistics derived from item response theory and classical test theory, using data from a large-scale statewide assessment. Findings show that the person and item statistics from the two measurement frameworks are quite comparable. (SLD)
Descriptors: Item Response Theory, State Programs, Statistical Analysis, Test Items
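As a rough illustration of the classical-test-theory side of the comparison Fan describes, the two standard classical item statistics can be computed directly from a response matrix. The response data below are made up:

```python
def item_difficulty(responses):
    """Classical item difficulty: the proportion of examinees
    answering the item correctly (the p-value)."""
    return sum(responses) / len(responses)

def point_biserial(item, total):
    """Classical item discrimination: the correlation between the 0/1
    item score and each examinee's total test score."""
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(total) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, total)) / n
    sd_i = (sum((i - mean_i) ** 2 for i in item) / n) ** 0.5
    sd_t = (sum((t - mean_t) ** 2 for t in total) / n) ** 0.5
    return cov / (sd_i * sd_t)
```

Studies like Fan's compare these values against their IRT counterparts (b and a parameters) estimated from the same data.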
Holweger, Nancy; Weston, Timothy – 1998
This study compares logistic discriminant function analysis for differential item functioning (DIF) with an item response theory-based DIF detection technique, rather than the Mantel-Haenszel procedure. In this study, the area between the two item characteristic curves, also called the item characteristic curve method, is…
Descriptors: Item Bias, Item Response Theory, Performance Based Assessment, State Programs
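The area-between-curves DIF index named above can be approximated numerically: fit the item separately in the two groups, then integrate the unsigned gap between the two item characteristic curves. This is a generic sketch under a two-parameter logistic model with illustrative parameters, not the study's exact procedure:

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def area_between_iccs(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=2000):
    """Unsigned area between the reference- and focal-group ICCs,
    approximated with the midpoint rule over [lo, hi]; larger
    values indicate more DIF."""
    step = (hi - lo) / n
    return sum(
        abs(icc_2pl(lo + (k + 0.5) * step, a_ref, b_ref)
            - icc_2pl(lo + (k + 0.5) * step, a_foc, b_foc)) * step
        for k in range(n)
    )
```

When only the difficulties differ (equal slopes), the area over the full theta range equals the difficulty shift itself, which makes the index easy to sanity-check.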
Crislip, Marian A.; Chin-Chance, Selvin – 2001
This paper discusses the use of two theories of item analysis and test construction, their strengths and weaknesses, and applications to the design of the Hawaii State Test of Essential Competencies (HSTEC). Traditional analyses of the data collected from the HSTEC field test were viewed from the perspectives of item difficulty levels and item…
Descriptors: Difficulty Level, Item Response Theory, Psychometrics, Reliability
Fan, Xitao; Ping, Yin – 1999
This study empirically investigated the potential negative effect of item response theory (IRT) model-data misfit on the degree of invariance of: (1) IRT item parameter estimates (item difficulty and discrimination); and (2) IRT person ability parameter estimates. A large-scale statewide assessment program test database was used, for which the…
Descriptors: Estimation (Mathematics), Goodness of Fit, High Schools, Item Response Theory
Gyagenda, Ismail S.; Engelhard, George, Jr. – 1998
The purpose of this study was to describe the Rasch model for measurement and apply the model to examine the relationship between raters, domains of written compositions, and student writing ability. Twenty raters were randomly selected from a group of 87 operational raters contracted to rate essays as part of the 1993 field test of the Georgia…
Descriptors: Difficulty Level, Essay Tests, Evaluators, High School Students

Williams, Valerie S. L.; Pommerich, Mary; Thissen, David – Journal of Educational Measurement, 1998
Created a developmental scale for the North Carolina End-of-Grade Mathematics Tests using a subset of identical test forms administered to adjacent grade levels with Thurstone scaling and Item Response Theory methods. Discusses differences in patterns produced. (Author/SLD)
Descriptors: Achievement Tests, Child Development, Comparative Analysis, Elementary Secondary Education
Baghi, Heibatollah – 1990
The Maryland Functional Testing Program (MFTP) uses the Rasch model as the statistical framework for the analysis of test items and scores. This paper is designed to assist the reader in developing an understanding of the fit statistics in the Rasch model. Background materials on application of the Rasch model in statistical analysis of the MFTP…
Descriptors: Computer Assisted Testing, Computer Software, Equated Scores, Error of Measurement
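The Rasch fit statistics Baghi's paper explains are conventionally the infit and outfit mean squares, built from squared standardized residuals between observed 0/1 responses and model probabilities. A minimal sketch, with made-up responses and abilities:

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Infit (information-weighted) and outfit (unweighted) mean-square
    fit statistics for one Rasch item; values near 1.0 indicate good fit."""
    ps = [rasch_p(t, b) for t in thetas]
    sq = [(x - p) ** 2 for x, p in zip(responses, ps)]  # squared residuals
    var = [p * (1.0 - p) for p in ps]                   # binomial variances
    outfit = sum(s / v for s, v in zip(sq, var)) / len(sq)
    infit = sum(sq) / sum(var)
    return infit, outfit
```

Outfit is sensitive to unexpected responses far from an examinee's ability level; infit down-weights those extremes, which is why operational programs usually report both.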
Emenogu, Barnabas; Childs, Ruth A. – 2003
This study investigated the possible impacts of language and curriculum differences on the performance of test items by subpopulations of students. Focusing on Measurement and Geometry items completed by students in French- and English-language schools in Ontario made it possible to explore the differences and to compare the item response theory…
Descriptors: Curriculum, English, Foreign Countries, French
Lee, Guemin; Lewis, Daniel M. – 2001
The Bookmark Standard Setting Procedure (Lewis, Mitzel, and Green, 1996) is an item-response-theory-based standard setting method that has been widely implemented by state testing programs. The primary purposes of this study were to (1) estimate standard errors for cutscores that result from Bookmark standard settings under a generalizability…
Descriptors: Cutting Scores, Elementary School Students, Elementary Secondary Education, Error of Measurement
Sinclair, Norma; Pecheone, Raymond L. – 1991
The impact of test item multidimensionality was examined as it affected fitting items on a teacher licensure test to an item response theory (IRT) model. Test item data from the 1990 study of G. W. Guiton and G. Delandshere are used. The Connecticut Elementary Certification Test (CONNECT) is a licensure examination designed to assess the subject…
Descriptors: Beginning Teachers, Elementary Education, Elementary School Teachers, Higher Education

Baghi, Heibatollah; Ferrara, Steven F. – 1989
Use of item response theory (IRT), the delta plot method, and Mantel-Haenszel techniques to assess differential item functioning (DIF) across racial and gender groups associated with the Maryland Test of Citizenship Skills (MTCS) is described. The objective of this research was to determine the effect of sample size on results from these three…
Descriptors: Black Students, Citizenship Education, Comparative Analysis, Comparative Testing
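Of the three methods compared in the entry above, the Mantel-Haenszel statistic is the most directly computable: examinees are stratified by total score, and a common odds ratio is pooled across strata. A minimal sketch using hypothetical stratum counts, with the conventional ETS delta transform; the -2.35 scaling is the standard ETS convention:

```python
import math

def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio across score strata.
    Each stratum is a tuple (ref_right, ref_wrong, focal_right, focal_wrong);
    a value of 1.0 means no DIF."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_delta(alpha):
    """ETS D-DIF scale: -2.35 * ln(alpha); values near 0 indicate
    negligible DIF."""
    return -2.35 * math.log(alpha)
```

Sample-size effects of the kind the study investigates show up here directly: sparse strata make the pooled numerator and denominator unstable.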