ERIC - Search Results

Showing 1 to 15 of 18 results

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Modeling of Item Response Functions under the D-Scoring Method

Peer reviewed

Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2020

This study presents new models for item response functions (IRFs) in the framework of the D-scoring method (DSM), which is gaining attention in the field of educational and psychological measurement and large-scale assessments. In a previous work on DSM, the IRFs of binary items were estimated using a logistic regression model (LRM). However, the LRM…

Descriptors: Item Response Theory, Scoring, True Scores, Scaling

Polytomous Rasch Models in Counseling Assessment

Peer reviewed

Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017

This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.

Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation

How Does Polytomous Item Bias Affect Total-Group Survey Score Comparisons?

Peer reviewed

Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017

The growing use of scales in survey questionnaires makes it necessary to address how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent-samples t-test on the observed total scale scores. A…

Descriptors: Test Items, Test Bias, Item Response Theory, Surveys

A Criterion to Evaluate the Individual Raw-to-Scale Equating Conversions. Research Report. ETS RR-13-05

Peer reviewed

Guo, Hongwen; Puhan, Gautam; Walker, Michael – ETS Research Report Series, 2013

In this study we investigated when an equating conversion line is problematic in terms of gaps and clumps. We suggest using the conditional standard error of measurement (CSEM) to measure the scale scores that are inappropriate in the overall raw-to-scale transformation.

Descriptors: Equated Scores, Test Items, Evaluation Criteria, Error of Measurement

Guessing and the Rasch Model

Peer reviewed

Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016

Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…

Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests

Effect of Violating Unidimensional Item Response Theory Vertical Scaling Assumptions on Developmental Score Scales

Topczewski, Anna Marie – ProQuest LLC, 2013

Developmental score scales represent the performance of students along a continuum, where, as students learn more, they move higher along that continuum. Unidimensional item response theory (UIRT) vertical scaling has become a commonly used method to create developmental score scales. Research has shown that UIRT vertical scaling methods can be…

Descriptors: Item Response Theory, Scaling, Scores, Student Development

Multidimensional Computerized Adaptive Testing for Indonesia Junior High School Biology

Peer reviewed

Kuo, Bor-Chen; Daud, Muslem; Yang, Chih-Wei – EURASIA Journal of Mathematics, Science & Technology Education, 2015

This paper describes a curriculum-based multidimensional computerized adaptive test that was developed for Indonesia junior high school Biology. In adherence to the Indonesian curriculum of different Biology dimensions, 300 items were constructed and then administered to 2,238 students. A multidimensional random coefficients multinomial logit model was…

Descriptors: Secondary School Science, Science Education, Science Tests, Computer Assisted Testing

Applying Rasch Model and Generalizability Theory to Study Modified-Angoff Cut Scores

Peer reviewed

Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012

The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…

Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling

Reliability and the Nonequivalent Groups with Anchor Test Design. Research Report. ETS RR-07-16

Peer reviewed

Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007

This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…

Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis

Some Consequences of the Uncertainty in IRT Linking Procedures.

Sheehan, Kathleen M.; Mislevy, Robert J. – 1988

In many practical applications of item response theory, the parameters of overlapping subsets of test items are estimated from different samples of examinees. A linking procedure is then employed to place the resulting item parameter estimates onto a common scale. It is standard practice to ignore the uncertainty associated with the linking step…

Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Measurement Techniques

Evaluation of the Magnitude of Differential Item Functioning in Polytomous Items. Program Statistics Research Technical Report No. 94-2.

Zwick, Rebecca; Thayer, Dorothy T. – 1994

Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is a possible hypothesis-testing method for this purpose. The development of descriptive statistics for…

Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias

Data Sparseness and On-Line Pretest Item Calibration-Scaling Methods in CAT.

Peer reviewed

Ban, Jae-Chun; Hanson, Bradley A.; Yi, Qing; Harris, Deborah J. – Journal of Educational Measurement, 2002

Compared three online pretest calibration scaling methods through simulation: (1) marginal maximum likelihood with one expectation maximization (EM) cycle (OEM) method; (2) marginal maximum likelihood with multiple EM cycles (MEM); and (3) M. Stocking's method B. MEM produced the smallest average total error in parameter estimation; OEM yielded…

Descriptors: Computer Assisted Testing, Error of Measurement, Maximum Likelihood Statistics, Online Systems

Scale Shrinkage in Vertical Equating.

Peer reviewed

Camilli, Gregory; And Others – Applied Psychological Measurement, 1993

Three potential causes of scale shrinkage (measurement error, restriction of range, and multidimensionality) in item response theory vertical equating are discussed, and a more comprehensive model-based approach to establishing vertical scales is described. Test data from the National Assessment of Educational Progress are used to illustrate the…

Descriptors: Equated Scores, Error of Measurement, Item Response Theory, Maximum Likelihood Statistics

A Process for Testing a Mathematical Model for the Solution of a Practical Problem: Application to Test Equating. LES Paper on Learning and Teaching. Paper #79.

Douglass, James B. – 1979

A general process for testing the feasibility of applying alternative mathematical or statistical models to the solution of a practical problem is presented and flowcharted. The system is used to develop a plan to compare models for test equating. The five alternative models to be considered for equating are: (1) anchor test equating using…

Descriptors: Equated Scores, Error of Measurement, Latent Trait Theory, Mathematical Models
