ERIC - Search Results

Showing all 5 results

Investigating Effect of Ignoring Hierarchical Data Structures on Accuracy of Vertical Scaling Using Mixed-Effects Rasch Model

Wang, Shudong; Jiao, Hong; Jin, Ying; Thum, Yeow Meng – Online Submission, 2010

The vertical scales of large-scale achievement tests created by using item response theory (IRT) models are mostly based on clustered (or correlated) educational data in which students are usually grouped in certain settings (classrooms or schools). While such application directly violates the assumption of independent samples of persons in…

Descriptors: Scaling, Achievement Tests, Data Analysis, Item Response Theory

Different Tests, Different Answers: The Stability of Teacher Value-Added Estimates across Outcome Measures

Peer reviewed

Papay, John P. – American Educational Research Journal, 2011

Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I…

Descriptors: Urban Schools, Teacher Effectiveness, Reading Achievement, Achievement Tests

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

What Counts as Evidence of Educational Achievement? The Role of Constructs in the Pursuit of Equity in Assessment

Peer reviewed

Wiliam, Dylan – Review of Research in Education, 2010

The idea that validity should be considered a property of inferences, rather than of assessments, has developed slowly over the past century. In early writings about the validity of educational assessments, validity was defined as a property of an assessment. The most common definition was that an assessment was valid to the extent that it…

Descriptors: Educational Assessment, Validity, Inferences, Construct Validity

The Stability and Change of Language and Non-Language IQ Scores. Final Report.

Hopkins, Kenneth D. – 1971

The stability and change of verbal, non-verbal, and total IQ scores from group tests were investigated for students tested at Grades 1, 2, 4, 7, 9, and 11. Conclusions are discussed in detail, and the following recommendations are made: (1) Group intelligence tests should not be routinely administered in Grades 1 and 2 unless the users and…

Descriptors: Correlation, Educational Testing, Factor Analysis, Group Testing