ERIC - Search Results

Showing 1 to 15 of 28 results

Aggregation Bias and the Analysis of Necessary and Sufficient Conditions in fsQCA

Peer reviewed

Braumoeller, Bear F. – Sociological Methods & Research, 2017

Fuzzy-set qualitative comparative analysis (fsQCA) has become one of the most prominent methods in the social sciences for capturing causal complexity, especially for scholars with small- and medium-"N" data sets. This research note explores two key assumptions in fsQCA's methodology for testing for necessary and sufficient…

Descriptors: Qualitative Research, Comparative Analysis, Social Science Research, Research Methodology

Hidden Item Variance in Multiple Mini-Interview Scores

Peer reviewed

Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen – Advances in Health Sciences Education, 2017

The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…

Descriptors: Interviews, Scores, Generalizability Theory, Monte Carlo Methods

Multiple True-False Items: A Comparison of Scoring Algorithms

Peer reviewed

Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018

Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…

Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests

Is the Autism-Spectrum Quotient a Valid Measure of Traits Associated with the Autism Spectrum? A Rasch Validation in Adults with and without Autism Spectrum Disorders

Peer reviewed

Lundqvist, Lars-Olov; Lindner, Helen – Journal of Autism and Developmental Disorders, 2017

The Autism-Spectrum Quotient (AQ) is among the most widely used scales assessing autistic traits in the general population. However, some aspects of the AQ are questionable. To test its scale properties, the AQ was translated into Swedish, and data were collected from 349 adults, 130 with autism spectrum disorder (ASD) and 219 without ASD, and…

Descriptors: Autism, Pervasive Developmental Disorders, Adults, Comparative Analysis

DIF Analysis with Multilevel Data: A Simulation Study Using the Latent Variable Approach

Peer reviewed

Jin, Ying; Eason, Hershel – Journal of Educational Issues, 2016

The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling…

Descriptors: Test Bias, Simulation, Hierarchical Linear Modeling, Comparative Analysis

Guidelines versus Practices in Cross-Lingual Assessment: A Disconcerting Disconnect

Peer reviewed

Rios, Joseph A.; Sireci, Stephen G. – International Journal of Testing, 2014

The International Test Commission's "Guidelines for Translating and Adapting Tests" (2010) provide important guidance on developing and evaluating tests for use across languages. These guidelines are widely applauded, but the degree to which they are followed in practice is unknown. The objective of this study was to perform a…

Descriptors: Guidelines, Translation, Adaptive Testing, Second Languages

The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

Peer reviewed

Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013

The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…

Descriptors: Test Format, Test Items, Responses, Computation

Australian Indigenous Students' Performance on the PIPS-BLA Reading and Mathematics Scales: 2011-2013

Peer reviewed

Styles, Irene; Wildy, Helen; Pepper, Vivienne; Faulkner, Joanne; Berman, Ye'Elah – International Research in Early Childhood Education, 2014

The assessment of literacy and numeracy skills of students as they enter school for the first time is not yet established nation-wide in Australia. However, a large proportion of primary schools have chosen to assess their starting students on the Performance Indicators in Primary Schools-Baseline Assessment (PIPS-BLA). This series of three…

Descriptors: Foreign Countries, Indigenous Knowledge, Performance Based Assessment, Test Bias

Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

Peer reviewed

Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

Descriptors: Test Bias, Item Response Theory, Test Items, Scores

The Nature of Science Instrument-Elementary (NOSI-E): Using Rasch Principles to Develop a Theoretically Grounded Scale to Measure Elementary Student Understanding of the Nature of Science

Peoples, Shelagh – ProQuest LLC, 2012

The purpose of this study was to determine which of three competing models will provide reliable, interpretable, and responsive measures of elementary students' understanding of the nature of science (NOS). The Nature of Science Instrument-Elementary (NOSI-E), a 28-item Rasch-based instrument, was used to assess students' NOS…

Descriptors: Scientific Principles, Science Tests, Elementary School Students, Item Response Theory

Comparing Two Tests of Formal Reasoning in a College Chemistry Context

Peer reviewed

Jiang, Bo; Xu, Xiaoying; Garcia, Alicia; Lewis, Jennifer E. – Journal of Chemical Education, 2010

The Test of Logical Thinking (TOLT) and the Group Assessment of Logical Thinking (GALT) are two of the instruments most widely used by science educators and researchers to measure students' formal reasoning abilities. Based on Piaget's cognitive development theory, formal thinking ability has been shown to be essential for student achievement in…

Descriptors: Test Bias, Test Reliability, Chemistry, Logical Thinking

A Rasch-Based Validation of the Hooper Visual Organization Test in Chinese-Speaking Children

Peer reviewed

Wuang, Yee-Pay; Wang, Li-Chen; Su, Chwen-Yng – Research in Developmental Disabilities: A Multidisciplinary Journal, 2010

The aim of this study was to examine the validation of the Hooper Visual Organization Test (HVOT) for use in children by testing for item fit, unidimensionality, item hierarchy, reliability, and screening capacity. A modified scoring system was devised for the HVOT so that children received some credit for being able to describe the function of…

Descriptors: Test Bias, Down Syndrome, Scoring, Item Response Theory

Small-Sample Equating Using a Synthetic Linking Function

Peer reviewed

Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Journal of Educational Measurement, 2008

This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically,…

Descriptors: Equated Scores, Sample Size, Test Reliability, Comparative Analysis

The Development of a Digital Logic Concept Inventory

Herman, Geoffrey Lindsay – ProQuest LLC, 2011

Instructors in electrical and computer engineering and in computer science have developed innovative methods to teach digital logic circuits. These methods attempt to increase student learning, satisfaction, and retention. Although there are readily accessible and accepted means for measuring satisfaction and retention, there are no widely…

Descriptors: Grounded Theory, Delphi Technique, Concept Formation, Misconceptions

A Comparison of WISC and WISC-R Results among Black Elementary Students.

Angstadt, Al; And Others – Southern Journal of Educational Research, 1979

Seeking to compare the original Wechsler Intelligence Scale for Children (WISC) with its revised version, the WISC-R, this study compared WISC-R scores of 50 Black children with their WISC scores taken two years previously. Mean scores on the WISC-R were lower on the Verbal Scale, Performance Scale, and Full Scale. (DS)

Descriptors: Black Education, Black Students, Comparative Analysis, Elementary Education
