Penfield, Randall D. – Applied Psychological Measurement, 2010
Crossing, or intersecting, differential item functioning (DIF) is a form of nonuniform DIF that exists when the sign of the between-group difference in expected item performance changes across the latent trait continuum. The presence of crossing DIF presents a problem for many statistics developed for evaluating DIF because positive and negative…
Descriptors: Test Bias, Test Items, Statistics, Test Theory
Baird, Jo-Anne; Black, Paul – Research Papers in Education, 2013
Much has already been written on the controversies surrounding the use of different test theories in educational assessment. Other authors have noted the prevalence of classical test theory over item response theory in practice. This Special Issue draws together articles based upon work conducted on the Reliability Programme for England's…
Descriptors: Test Theory, Foreign Countries, Test Reliability, Item Response Theory
Rao, Vasanthi – ProQuest LLC, 2012
In 1997, based on the amendments to the Individuals with Disabilities Education Act (IDEA), all states were faced with a statutory requirement to develop and implement alternate assessments for students with disabilities unable to participate in the statewide large-scale assessment. States were given the challenge of creating, implementing, and…
Descriptors: Alternative Assessment, Psychometrics, Item Response Theory, Models
Bailey, Janelle M.; Johnson, Bruce; Prather, Edward E.; Slater, Timothy F. – International Journal of Science Education, 2012
Concept inventories (CIs)--typically multiple-choice instruments that focus on a single or small subset of closely related topics--have been used in science education for more than a decade. This paper describes the development and validation of a new CI for astronomy, the "Star Properties Concept Inventory" (SPCI). Questions cover the areas of…
Descriptors: Educational Strategies, Validity, Testing, Astronomy
Kingsley, Laurie; Romine, William – European Journal of Educational Research, 2014
Schools and teacher induction programs around the world routinely assess teaching best practice to inform accreditation, tenure/promotion, and professional development decisions. Routine assessment is also necessary to ensure that teachers entering the profession get the assistance they need to develop and succeed. We introduce the Item-Level…
Descriptors: Test Construction, Test Validity, Beginning Teacher Induction, Best Practices
Development of Nonword and Irregular Word Lists for Australian Grade 3 Students Using Rasch Analysis
Callinan, Sarah; Cunningham, Everarda; Theiler, Stephen – Australian Journal of Learning Difficulties, 2014
Many tests used in educational settings to identify learning difficulties endeavour to pick up only the lowest performers. Yet these tests are generally developed within a Classical Test Theory (CTT) paradigm that assumes that data do not have significant skew. Rasch analysis is more tolerant of skew and was used to validate two newly developed…
Descriptors: Foreign Countries, Reading Tests, Item Response Theory, Elementary School Students
Royal, Kenneth D.; Gilliland, Kurt O.; Kernick, Edward T. – Anatomical Sciences Education, 2014
Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high-stakes testing arena rely on classical test theory…
Descriptors: Item Response Theory, Scoring, Evaluation Methods, Anatomy
Chen, Haiwen; Holland, Paul – Psychometrika, 2010
In this paper, we develop a new curvilinear equating for the nonequivalent groups with anchor test (NEAT) design under the assumption of the classical test theory model, that we name curvilinear Levine observed score equating. In fact, by applying both the kernel equating framework and the mean preserving linear transformation of…
Descriptors: Equated Scores, Test Theory, Test Construction, Guidelines
Brennan, Robert L. – Applied Measurement in Education, 2011
Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory
Mislevy, Robert J. – Educational Measurement: Issues and Practice, 2012
This article presents the author's observations on Neil Dorans's NCME Career Award Address: "The Contestant Perspective on Taking Tests: Emanations from the Statue within." He calls attention to some points that Dr. Dorans made in his address, and offers his thoughts in response.
Descriptors: Testing, Test Reliability, Psychometrics, Scores
Agus, Mirian; Penna, Maria Pietronilla; Peró-Cebollero, Maribel; Guàrdia-Olmos, Joan – EURASIA Journal of Mathematics, Science & Technology Education, 2016
Research on the graphical facilitation of probabilistic reasoning has been characterised by the effort expended to identify valid assessment tools. The authors developed an assessment instrument to compare reasoning performances when problems were presented in verbal-numerical and graphical-pictorial formats. A sample of undergraduate psychology…
Descriptors: Probability, Abstract Reasoning, Thinking Skills, Educational Assessment
Barbera, Jack – Journal of Chemical Education, 2013
The Chemical Concepts Inventory (CCI) is a multiple-choice instrument designed to assess the alternate conceptions of students in high school or first-semester college chemistry. The instrument was published in 2002 along with an analysis of its data from a test population. This study supports the initial analysis and expands on the psychometric…
Descriptors: Science Instruction, Secondary School Science, High Schools, College Science
Kelcey, Ben; McGinn, Daniel; Hill, Heather – Society for Research on Educational Effectiveness, 2013
Recent policy has charged schools and districts with maintaining highly qualified teachers and differentiating among teachers in terms of their effectiveness (U.S. Department of Education, 2009). This emphasis has driven the development and implementation of teacher quality measures which are increasingly being used to evaluate teachers with…
Descriptors: Teacher Effectiveness, Measures (Individuals), Observation, Teacher Evaluation
Berk, Ronald A. – Journal of Faculty Development, 2013
One of the simplest indicators of teaching or course effectiveness is student ratings on one or more global items from the entire rating scale. That approach seems intuitively sound and easy to use. Global items have even been recommended by a few researchers to get a quick-read, at-a-glance summary for summative decisions about faculty. The…
Descriptors: Rating Scales, Student Evaluation of Teacher Performance, Item Analysis, Test Items
Yelboga, Atilla; Tavsancil, Ezel – Educational Sciences: Theory and Practice, 2010
In this research, the classical test theory and generalizability theory analyses were carried out with the data obtained by a job performance scale for the years 2005 and 2006. The reliability coefficients obtained (estimated) from the classical test theory and generalizability theory analyses were compared. In classical test theory, test retest…
Descriptors: Test Theory, Generalizability Theory, Job Performance, Measures (Individuals)