Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 11
Descriptor
Difficulty Level: 21
Test Bias: 21
Test Validity: 21
Test Items: 16
Item Analysis: 12
Test Construction: 8
Test Reliability: 7
Foreign Countries: 5
Models: 5
Comparative Analysis: 4
Culture Fair Tests: 4
Author
Robertson, David W.: 2
Baghaei, Purya: 1
Bob delMas: 1
Brad Hartlaub: 1
Brown, Ted: 1
Catherine Case: 1
Chien, Chi-Wen: 1
Daniel M. Bolt: 1
Douglas Whitaker: 1
Drasgow, Fritz: 1
Ercikan, Kadriye: 1
Publication Type
Reports - Research: 16
Journal Articles: 9
Speeches/Meeting Papers: 4
Reports - Descriptive: 2
Reports - Evaluative: 2
Information Analyses: 1
Opinion Papers: 1
Education Level
Elementary Education: 3
Secondary Education: 3
Grade 4: 2
Higher Education: 2
Elementary Secondary Education: 1
Grade 8: 1
Intermediate Grades: 1
Junior High Schools: 1
Middle Schools: 1
Postsecondary Education: 1
Audience
Administrators: 1
Community: 1
Parents: 1
Researchers: 1
Tim Jacobbe; Bob delMas; Brad Hartlaub; Jeff Haberstroh; Catherine Case; Steven Foti; Douglas Whitaker – Numeracy, 2023
The development of assessments as part of the funded LOCUS project is described. The assessments measure students' conceptual understanding of statistics as outlined in the GAISE PreK-12 Framework. Results are reported from a large-scale administration to 3,430 students in grades 6 through 12 in the United States. Items were designed to assess…
Descriptors: Statistics Education, Common Core State Standards, Student Evaluation, Elementary School Students
Qi Huang; Daniel M. Bolt; Weicong Lyu – Large-scale Assessments in Education, 2024
Large scale international assessments depend on invariance of measurement across countries. An important consideration when observing cross-national differential item functioning (DIF) is whether the DIF actually reflects a source of bias, or might instead be a methodological artifact reflecting item response theory (IRT) model misspecification.…
Descriptors: Test Items, Item Response Theory, Test Bias, Test Validity
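The abstract above concerns cross-national differential item functioning (DIF) within an IRT framework. As an illustration of the general DIF concept only (not the authors' IRT-based procedure), the sketch below implements a common first-pass screen, the Mantel-Haenszel common odds ratio: examinees are matched on total score, and within each score stratum the odds of answering the studied item correctly are compared across groups. The data and variable names are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_dif(scores, groups, item_correct):
    """Common odds ratio for one item, stratified by a matching score.

    scores       : matching variable (e.g. total test score) per examinee
    groups       : 0 = reference group, 1 = focal group
    item_correct : 0/1 response to the studied item
    Values far from 1.0 suggest DIF on this item.
    """
    # strata[s] is a 2x2 table: [group][wrong/right] counts
    strata = defaultdict(lambda: [[0, 0], [0, 0]])
    for s, g, x in zip(scores, groups, item_correct):
        strata[s][g][x] += 1
    num = den = 0.0
    for (r_wrong, r_right), (f_wrong, f_right) in strata.values():
        n = r_wrong + r_right + f_wrong + f_right
        if n == 0:
            continue
        num += r_right * f_wrong / n
        den += r_wrong * f_right / n
    return num / den if den else float("inf")


# Toy data: within each score stratum both groups answer the item
# correctly at the same rate, so the odds ratio is 1.0 (no DIF).
scores = [1] * 20 + [2] * 20
groups = ([0] * 10 + [1] * 10) * 2
item = ([1] * 5 + [0] * 5 + [1] * 5 + [0] * 5) * 2
print(round(mantel_haenszel_dif(scores, groups, item), 2))  # → 1.0
```

A significant chi-square or an odds ratio far from 1.0 flags an item for review; as the abstract notes, apparent DIF may also reflect model misspecification rather than bias.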
Parry, James R. – Online Submission, 2020
This paper presents research and provides a method to ensure that parallel assessments generated from a large test-item database maintain equitable difficulty and content coverage each time the assessment is presented. To maintain fairness and validity, it is important that all instances of an assessment that is intended to test the…
Descriptors: Culture Fair Tests, Difficulty Level, Test Items, Test Validity
Sabol, F. Robert – National Art Education Association, 2018
This White Paper provides a selection of some general principles of assessment or overarching ideas that may guide educators in selecting, developing, and implementing assessments of students' learning at all instructional levels or educational settings in which they are used. These principles represent a framework for understanding the nature of…
Descriptors: Visual Arts, Art Education, Educational Principles, Student Evaluation
Baghaei, Purya; Kubinger, Klaus D. – Practical Assessment, Research & Evaluation, 2015
The present paper gives a general introduction to the linear logistic test model (Fischer, 1973), an extension of the Rasch model with linear constraints on item parameters, along with eRm (an R package to estimate different types of Rasch models; Mair, Hatzinger, & Mair, 2014) functions to estimate the model and interpret its parameters. The…
Descriptors: Item Response Theory, Models, Test Validity, Hypothesis Testing
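The Baghaei and Kubinger paper uses the eRm package in R; the fragment below is only a minimal Python sketch of the underlying equations: the Rasch probability of a correct response, and the linear logistic test model's (LLTM) constraint that each item difficulty is a weighted sum of basic parameters. The operation names and parameter values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Rasch model: P(correct) for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lltm_difficulty(q_row, eta):
    """LLTM constraint: b_i = sum_k q_ik * eta_k, where q_ik counts how
    often cognitive operation k is required by item i."""
    return sum(q * e for q, e in zip(q_row, eta))


# Hypothetical item requiring one 'carry' and two 'multiply' operations,
# with basic parameters eta = (carry=0.5, multiply=0.3), so b = 1.1.
b = lltm_difficulty([1, 2], [0.5, 0.3])
print(round(b, 2), round(rasch_prob(0.0, b), 3))
```

Comparing the fit of this constrained model against the unconstrained Rasch model (as eRm does via a likelihood-ratio test) is how the linear hypothesis about item difficulties is evaluated.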
Shaw, Stuart; Imam, Helen – Language Assessment Quarterly, 2013
International assessments in a wide range of subjects are being prepared for and delivered through the medium of English in a variety of educational contexts. These assessments are taken by many candidates whose first language is not necessarily English. This raises important issues relating to assessment validity and fairness. This study…
Descriptors: English (Second Language), Test Validity, Test Bias, High Stakes Tests
Leong, Samuel; Qiu, Xue-Lan – Educational Psychology, 2013
Accurate insight into teachers' conceptions of creativity and the role of assessment in arts education would inform education policy, training programmes and the measurement of learning outcomes. Yet no study has been found that examines the relationship between teachers' conceptions of creativity and their conceptions of assessment in arts…
Descriptors: Foreign Countries, Creativity, Art Education, Teacher Attitudes
Sandilands, Debra; Oliveri, Maria Elena; Zumbo, Bruno D.; Ercikan, Kadriye – International Journal of Testing, 2013
International large-scale assessments of achievement often have a large degree of differential item functioning (DIF) between countries, which can threaten score equivalence and reduce the validity of inferences based on comparisons of group performances. It is important to understand potential sources of DIF to improve the validity of future…
Descriptors: Validity, Measures (Individuals), International Studies, Foreign Countries
Rasch Analysis of the Assessment of Children's Hand Skills in Children with and without Disabilities
Chien, Chi-Wen; Brown, Ted; McDonald, Rachael – Research in Developmental Disabilities: A Multidisciplinary Journal, 2011
The Assessment of Children's Hand Skills (ACHS) is a new assessment tool that utilizes a naturalistic observational method to capture children's real-life hand skill performance when they engage in various types of activities. The ACHS is also intended for use with both typically developing children and those presenting with disabilities. The purpose…
Descriptors: Test Items, Construct Validity, Test Bias, Disabilities
Razi, Salim – Online Submission, 2012
This study presents the processes of developing and establishing the reliability and validity of a reading test through an integrative approach, as conventional reliability and validity measures only superficially reveal the difficulty of a reading test. In this respect, analysing the vocabulary frequency of the test is regarded as a more suitable way…
Descriptors: Foreign Countries, Undergraduate Students, Reading Tests, Test Validity
Tristan, Agustin; Vidal, Rafael – Online Submission, 2007
Wright and Stone proposed three features for assessing the quality of the distribution of item difficulties in a test on the so-called "most probable response map": line, stack, and gap. Once a line is accepted as a design model for a test, gaps and stacks are practically eliminated, producing evidence of the "scale…
Descriptors: Test Validity, Models, Difficulty Level, Test Items
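The "gap" and "stack" features mentioned above can be sketched mechanically: on a Wright-style item map, a gap is a stretch of the difficulty scale with no items, and a stack is a cluster of items at nearly the same difficulty. The thresholds below are illustrative assumptions, not values from Wright and Stone.

```python
def find_gaps_and_stacks(difficulties, gap=0.5, stack=0.05):
    """Scan adjacent item difficulties (in logits) for gaps and stacks.

    Returns two lists of (lower, upper) difficulty pairs: spacings wider
    than `gap` and spacings tighter than `stack`.
    """
    d = sorted(difficulties)
    pairs = list(zip(d, d[1:]))
    gaps = [(a, b) for a, b in pairs if b - a > gap]
    stacks = [(a, b) for a, b in pairs if b - a < stack]
    return gaps, stacks


# Hypothetical item difficulties: two near-duplicates at the easy end,
# and a hole in coverage between 0.4 and 1.6 logits.
gaps, stacks = find_gaps_and_stacks([-1.2, -1.18, -0.8, -0.4, 0.0, 0.4, 1.6])
print(gaps)    # the (0.4, 1.6) hole
print(stacks)  # the (-1.2, -1.18) cluster
```

Gaps mean some ability levels are measured imprecisely; stacks mean redundant items, which is the sense in which accepting a "line" design eliminates both.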
Gonzalez-Tamayo, Eulogio – 1984
An item-content criterion for classifying items as biased, independent of the test's psychometric characteristics, is described. It was used with a sample of female adults in a training program for administrative secretaries. The minority group in the study consisted of Hispanic immigrants; the majority group was a mixture of Blacks, English-speaking…
Descriptors: Difficulty Level, Hispanic Americans, Item Analysis, Language Dominance
Linn, Robert L.; Drasgow, Fritz – Educational Measurement: Issues and Practice, 1987
This article discusses the application of the Golden Rule procedure to items of the Scholastic Aptitude Test. Using item response theory, the analyses indicate that the Golden Rule procedures are ineffective in detecting biased items and may undermine the reliability and validity of tests. (Author/JAZ)
Descriptors: College Entrance Examinations, Difficulty Level, Item Analysis, Latent Trait Theory
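The Golden Rule procedure that Linn and Drasgow critique is commonly described as an item-selection rule: among items of comparable overall difficulty, prefer those with the smallest gap in proportion-correct between groups. The sketch below illustrates that selection idea only; the item pool, field layout, and proportions are hypothetical, and (as the article argues) the rule does not distinguish true bias from real group differences.

```python
def golden_rule_select(items, n_select):
    """items: list of (item_id, p_reference, p_focal) proportions correct.

    Returns n_select item ids, ranked by smallest between-group gap
    in proportion correct.
    """
    ranked = sorted(items, key=lambda it: abs(it[1] - it[2]))
    return [item_id for item_id, _, _ in ranked[:n_select]]


# Hypothetical pool: item A has a large gap, B and C small ones.
pool = [("A", 0.70, 0.55), ("B", 0.68, 0.66), ("C", 0.72, 0.71)]
print(golden_rule_select(pool, 2))  # → ['C', 'B']
```

The IRT-based critique is that two groups can differ in proportion correct on an unbiased item simply because the groups differ in ability, so minimizing this gap can discard valid, discriminating items.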
Robertson, David W.; And Others – 1977
A comparative study of item analysis was conducted on the basis of race to determine whether alternative test construction or processing might increase the proportion of black enlisted personnel among those passing various military technical knowledge examinations. The study used data from six specialists at four grade levels and investigated item…
Descriptors: Difficulty Level, Enlisted Personnel, Item Analysis, Occupational Tests
Ironson, Gail H. – 1978
Four statistical methods for identifying biased test items were used with data from two ethnic groups (1,691 black and 1,794 white high school seniors). The data were responses to 150 items in five subtests including two traditional tests (reading and mathematics) and three nontraditional tests (picture number test of associative memory, letter…
Descriptors: Aptitude Tests, Comparative Analysis, Culture Fair Tests, Difficulty Level