ERIC - Search Results

Showing 1 to 15 of 26 results

Psychometric Consequences of Subpopulation Item Parameter Drift

Peer reviewed

Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017

This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…

Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing

A Multidimensional Assessment of Teachers' Knowledge of Algebra for Teaching: Developing an Instrument and Supporting Valid Inferences

Peer reviewed

Reckase, Mark D.; McCrory, Raven; Floden, Robert E.; Ferrini-Mundy, Joan; Senk, Sharon L. – Educational Assessment, 2015

Numerous researchers have suggested that there are multiple mathematical knowledge and skill areas needed by teachers in order for them to be effective teachers of mathematics: knowledge of the mathematics that are the goals of instruction, advanced mathematics beyond the instructional material, and mathematical knowledge that is specific to what…

Descriptors: Algebra, Knowledge Base for Teaching, Multidimensional Scaling, Psychometrics

Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

Peer reviewed

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores

Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities

Peer reviewed

Sabatini, John; Petscher, Yaacov; O'Reilly, Tenaha; Truckenmiller, Adrea – Grantee Submission, 2015

For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although widely used in and out of the classroom, there continues to be considerable disagreement regarding how or whether such tests have net value in the service of advancing educational progress in reading. This…

Descriptors: Middle School Students, High School Students, Reading Comprehension, Reading Tests

An Investigation of Sample Size Splitting on ATFIND and DIMTEST

Peer reviewed

Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013

Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…

Descriptors: Sample Size, Test Length, Correlation, Test Format

An Investigation of Scale Drift for Arithmetic Assessment of ACCUPLACER®. Research Report No. 2010-2

Deng, Hui; Melican, Gerald – College Board, 2010

The current study was designed to extend the current literature to study scale drift in CAT as part of improving quality control and calibration process for ACCUPLACER, a battery of large-scale adaptive placement tests. The study aims to evaluate item parameter drift using empirical data that span four years from the ACCUPLACER Arithmetic…

Descriptors: Student Placement, Adaptive Testing, Computer Assisted Testing, Mathematics Tests

Developing the Impossible Figures Task to Assess Visual-Spatial Talents among Chinese Students: A Rasch Measurement Model Analysis

Peer reviewed

Chan, David W. – Gifted Child Quarterly, 2010

Data of item responses to the Impossible Figures Task (IFT) from 492 Chinese primary, secondary, and university students were analyzed using the dichotomous Rasch measurement model. Item difficulty estimates and person ability estimates located on the same logit scale revealed that the pooled sample of Chinese students, who were relatively highly…

Descriptors: Test Items, Adaptive Testing, Scaling, Talent Identification

PE Metrics: Background, Testing Theory, and Methods

Peer reviewed

Zhu, Weimo; Rink, Judy; Placek, Judith H.; Graber, Kim C.; Fox, Connie; Fisette, Jennifer L.; Dyson, Ben; Park, Youngsik; Avery, Marybell; Franck, Marian; Raynes, De – Measurement in Physical Education and Exercise Science, 2011

New testing theories, concepts, and psychometric methods (e.g., item response theory, test equating, and item bank) developed during the past several decades have many advantages over previous theories and methods. In spite of their introduction to the field, they have not been fully accepted by physical educators. Further, the manner in which…

Descriptors: Physical Education, Quality Control, Psychometrics, Item Response Theory

Bayesian Multidimensional IRT Models with a Hierarchical Structure

Peer reviewed

Sheng, Yanyan; Wikle, Christopher K. – Educational and Psychological Measurement, 2008

As item response models gain increased popularity in large-scale educational and measurement testing situations, many studies have been conducted on the development and applications of unidimensional and multidimensional models. Recently, attention has been paid to IRT-based models with an overall ability dimension underlying several ability…

Descriptors: Test Items, Individual Testing, Item Response Theory, Evaluation Methods

Comparing the Similarities and Differences of PISA 2003 and TIMSS. OECD Education Working Papers, No. 32

Wu, Margaret – OECD Publishing (NJ1), 2010

This paper makes an in-depth comparison of the PISA (OECD) and TIMSS (IEA) mathematics assessments conducted in 2003. First, a comparison of survey methodologies is presented, followed by an examination of the mathematics frameworks in the two studies. The methodologies and the frameworks in the two studies form the basis for providing…

Descriptors: Mathematics Achievement, Foreign Countries, Gender Differences, Comparative Analysis

A Method for Comparing Test Difficulties.

Frisbie, David A. – 1981

The relative difficulty ratio (RDR) is used as a method of representing test difficulty. The RDR is the ratio of a test mean to the ideal mean, the point midway between the perfect score and the mean chance score for the test. The RDR transformation is a linear scale conversion method but not a linear equating method in the classical sense. The…

Descriptors: Comparative Testing, Difficulty Level, Evaluation Methods, Raw Scores
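
As a rough illustration of the RDR definition quoted in the abstract above, the ratio can be sketched in a few lines of Python. This is a minimal sketch, not the author's procedure: the function name, the parameter names, and the assumption that the mean chance score of a multiple-choice test equals the number of items divided by the number of options are all hypothetical choices made for the example.

```python
def relative_difficulty_ratio(test_mean, perfect_score, chance_score):
    """Ratio of the observed test mean to the 'ideal mean'.

    The ideal mean is taken, as in the abstract, to be the point midway
    between the perfect score and the mean chance score.
    """
    ideal_mean = (perfect_score + chance_score) / 2
    return test_mean / ideal_mean


# Hypothetical test: 40 four-option multiple-choice items, one point each.
# Assumption: the mean chance score is items / options = 40 / 4 = 10.
n_items, n_options = 40, 4
chance = n_items / n_options          # 10.0
rdr = relative_difficulty_ratio(test_mean=20.0,
                                perfect_score=n_items,
                                chance_score=chance)
print(rdr)  # ideal mean is (40 + 10) / 2 = 25, so the ratio is 20 / 25
```

Under these assumptions a value below 1 indicates a test mean that falls short of the ideal mean, and a value above 1 a test mean that exceeds it.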

Gathering and Analyzing Content Validity Data.

Peer reviewed

Sireci, Stephen G. – Educational Assessment, 1998

Describes content-validity theory and illustrates new and traditional approaches for conducting content-validity studies. Newer approaches are based on multidimensional scaling analysis of item-similarity ratings, while traditional approaches are based on ratings of item-objective congruence and relevance. (Author/SLD)

Descriptors: Content Validity, Data Analysis, Evaluation Methods, Multidimensional Scaling

Using Dimensionality-Based DIF Analyses to Identify and Interpret Constructs That Elicit Group Differences

Peer reviewed

Gierl, Mark J. – Educational Measurement: Issues and Practice, 2005

In this paper I describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis paradigm, with emphasis on its implication for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a…

Descriptors: Models, Test Items, Test Bias, Statistical Analysis

Evaluation of the Magnitude of Differential Item Functioning in Polytomous Items. Program Statistics Research Technical Report No. 94-2.

Zwick, Rebecca; Thayer, Dorothy T. – 1994

Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is a possible hypothesis-testing method for this purpose. The development of descriptive statistics for…

Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias

Interrater Agreement: Same Data, Different Definitions, Different Outcomes.

Micceri, Theodore; And Others – 1987

Several issues relating to agreement estimates for different types of data from performance evaluations are considered. New indices of agreement are presented for ordinal level items and for summative scores produced by nominal or ordinal level items. Two sets of empirical data illustrate the performance of the two formulas derived to estimate…

Descriptors: Correlation, Data Analysis, Educational Research, Estimation (Mathematics)
