Publication Date
| In 2026 | 0 |
| Since 2025 | 8 |
| Since 2022 (last 5 years) | 42 |
| Since 2017 (last 10 years) | 98 |
| Since 2007 (last 20 years) | 249 |
Descriptor
| Testing | 249 |
| Test Reliability | 164 |
| Test Validity | 124 |
| Reliability | 76 |
| Test Construction | 71 |
| Foreign Countries | 68 |
| Scoring | 64 |
| Scores | 55 |
| Language Tests | 44 |
| Validity | 39 |
| Psychometrics | 38 |
| More ▼ | |
Source
Author
| McCrimmon, Adam W. | 6 |
| Goldschmidt, Pete | 3 |
| Ackerman, Debra J. | 2 |
| Delen, Erhan | 2 |
| Dickens, Rachel H. | 2 |
| Dorans, Neil J. | 2 |
| Dunne, Michael P. | 2 |
| Hamid, M. Obaidul | 2 |
| Heritage, Margaret | 2 |
| Herman, Joan L. | 2 |
| Hurford, David P. | 2 |
| More ▼ | |
Publication Type
Education Level
Location
| New York | 10 |
| Canada | 6 |
| Illinois | 6 |
| Turkey | 6 |
| China | 5 |
| United Kingdom (England) | 5 |
| United States | 5 |
| Australia | 4 |
| Florida | 4 |
| Maryland | 4 |
| United Kingdom | 4 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, that either rely on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…
Descriptors: Testing, Academic Ability, Time on Task, Correlation
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Gökhan Iskifoglu – Turkish Online Journal of Educational Technology - TOJET, 2024
This research paper investigated the importance of conducting measurement invariance analysis in developing measurement tools for assessing differences between and among study variables. Most of the studies, which tended to develop an inventory to assess the existence of an attitude, behavior, belief, IQ, or an intuition in a person's…
Descriptors: Testing, Testing Problems, Error of Measurement, Attitude Measures
Amanda A. Wolkowitz; Russell Smith – Practical Assessment, Research & Evaluation, 2024
A decision consistency (DC) index is an estimate of the consistency of a classification decision on an exam. More specifically, DC estimates the percentage of examinees that would have the same classification decision on an exam if they were to retake the same or a parallel form of the exam again without memory of taking the exam the first time.…
Descriptors: Testing, Test Reliability, Replication (Evaluation), Decision Making
Li, Minzi; Zhang, Xian – Language Testing, 2021
This meta-analysis explores the correlation between self-assessment (SA) and language performance. Sixty-seven studies with 97 independent samples involving more than 68,500 participants were included in our analysis. It was found that the overall correlation between SA and language performance was 0.466 (p < 0.01). Moderator analysis was…
Descriptors: Meta Analysis, Self Evaluation (Individuals), Likert Scales, Research Reports
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Saenz, David Arron – Online Submission, 2023
There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability as research indicates several factors including frequency and timing play crucial roles towards ensuring inter-rater reliability. Additionally, increasing amounts research indicate possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
McLeod, Justin W.H.; McCrimmon, Adam W. – Journal of Psychoeducational Assessment, 2021
The "Raven's 2 Progressive Matrices Clinical Edition" (Raven's 2; Raven, Rust, Chan, & Zhou, 2018), published by NCS Pearson, is an individually administered nonverbal assessment of general cognitive ability developed to measure "educative abilities," defined as the ability to think clearly and solve complex problems in…
Descriptors: Test Reviews, Intelligence Tests, Testing, Test Reliability
Tingting Li; Kevin Haudek; Joseph Krajcik – Journal of Science Education and Technology, 2025
Scientific modeling is a vital educational practice that helps students apply scientific knowledge to real-world phenomena. Despite advances in AI, challenges in accurately assessing such models persist, primarily due to the complexity of cognitive constructs and data imbalances in educational settings. This study addresses these challenges by…
Descriptors: Artificial Intelligence, Scientific Concepts, Models, Automation
Susan K. Johnsen – Gifted Child Today, 2024
The author provides a checklist for educators who are selecting technically adequate tests for identifying and referring students for gifted education services and programs. The checklist includes questions related to how the test was normed, reliability and validity studies as well as questions related to types of scores, administration, and…
Descriptors: Test Selection, Academically Gifted, Gifted Education, Test Validity
Venessa F. Manna; Shuhong Li; Spiros Papageorgiou; Lixiong Gu – ETS Research Report Series, 2025
This technical manual describes the purpose and intended uses of the TOEFL iBT test, its target test-taker population, and relevant language use domains. The test design and scoring procedures are presented first, followed by a research agenda intended to support the interpretation and use of test scores. Given the updates to the test starting…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Test Construction
Rosziati Ibrahim; Mizani Mohamad Madon; Zhiang Yue Lee; Piraviendran A/L Rajendran; Jahari Abdul Wahab; Faaizah Shahbodin – International Society for Technology, Education, and Science, 2023
This paper discusses the steps involve in project development for developing the mobile application, namely Blood Bank Application and developing the convertor for software testing. The project development is important for Computer Science students for them to learn the important steps in developing the application and testing the reliability of…
Descriptors: Program Administration, Educational Technology, Computer Software, Testing
Anne Wicks; Robin Berkley – George W. Bush Institute, 2025
Assessments are one of the most important--and often misunderstood--elements of education. In most cases, tests are administered by the state as well as by districts and schools. Assessments at each of these levels have distinct purposes, yield different information, and are part of a powerful, coordinated approach to improving student outcomes.…
Descriptors: Student Evaluation, Testing, Tests, Standardized Tests
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Hurford, David P.; Wines, Autumn – Australian Journal of Learning Difficulties, 2022
The purpose of the present study was to examine the potential that parents could effectively administer an online dyslexia evaluation tool (ODET) to their children. To this end, four groups consisting of parents and trained staff were compared. Sixty-three children (36 females and 27 males) participated. The children in each group were assessed…
Descriptors: Test Reliability, Computer Assisted Testing, Dyslexia, Screening Tests

Peer reviewed
Direct link
