Publication Date
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 13 |
| Since 2017 (last 10 years) | 39 |
| Since 2007 (last 20 years) | 61 |
Descriptor
| Scoring | 104 |
| Test Items | 104 |
| Test Validity | 104 |
| Test Reliability | 70 |
| Test Construction | 55 |
| Psychometrics | 27 |
| Testing | 24 |
| Item Analysis | 21 |
| Item Response Theory | 20 |
| English (Second Language) | 16 |
| Language Tests | 16 |
| More ▼ | |
Source
Author
Publication Type
Education Level
Audience
| Practitioners | 2 |
| Researchers | 1 |
| Teachers | 1 |
Location
| Nebraska | 5 |
| New Mexico | 3 |
| Alabama | 2 |
| California | 2 |
| New York | 2 |
| Canada | 1 |
| China | 1 |
| Europe | 1 |
| Idaho | 1 |
| Iran | 1 |
| Israel | 1 |
| More ▼ | |
Laws, Policies, & Programs
| Comprehensive Education… | 1 |
| Individuals with Disabilities… | 1 |
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Venessa F. Manna; Shuhong Li; Spiros Papageorgiou; Lixiong Gu – ETS Research Report Series, 2025
This technical manual describes the purpose and intended uses of the TOEFL iBT test, its target test-taker population, and relevant language use domains. The test design and scoring procedures are presented first, followed by a research agenda intended to support the interpretation and use of test scores. Given the updates to the test starting…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Test Construction
Fergadiotis, Gerasimos; Casilio, Marianne; Dickey, Michael Walsh; Steel, Stacey; Nicholson, Hannele; Fleegle, Mikala; Swiderski, Alexander; Hula, William D. – Journal of Speech, Language, and Hearing Research, 2023
Purpose: Item response theory (IRT) is a modern psychometric framework with several advantageous properties as compared with classical test theory. IRT has been successfully used to model performance on anomia tests in individuals with aphasia; however, all efforts to date have focused on noun production accuracy. The purpose of this study is to…
Descriptors: Item Response Theory, Psychometrics, Verbs, Naming
Hilliard, Airlie; Kazim, Emre; Bitsakis, Theodoros; Leutner, Franziska – Journal of Intelligence, 2022
Selection methods are commonly used in talent acquisition to predict future job performance and to find the best candidates, but questionnaire-based assessments can be lengthy and lead to candidate fatigue and poor engagement, affecting completion rates and producing poor data. Gamification can mitigate some of these issues through greater…
Descriptors: Personality Measures, Personality Traits, Gamification, Imagery
Sachin Nedungadi; Corina E. Brown; Sue Hyeon Paek – Journal of Chemical Education, 2022
The Fundamental Concepts for Organic Reaction Mechanisms Inventory (FC-ORMI) is a concept inventory with most items in a two-tier design in which an answer tier is followed by a reasoning tier. Statistical results provided strong evidence for the validity and reliability of the data obtained using the FC-ORMI. In this study, differential item…
Descriptors: Test Bias, Test Validity, Test Reliability, Gender Differences
Laila El-Hamamsy; María Zapata-Cáceres; Estefanía Martín-Barroso; Francesco Mondada; Jessica Dehler Zufferey; Barbara Bruno; Marcos Román-González – Technology, Knowledge and Learning, 2025
The introduction of computing education into curricula worldwide requires multi-year assessments to evaluate the long-term impact on learning. However, no single Computational Thinking (CT) assessment spans primary school, and no group of CT assessments provides a means of transitioning between instruments. This study therefore investigated…
Descriptors: Cognitive Tests, Computation, Thinking Skills, Test Validity
Sara T. Cushing – ETS Research Report Series, 2025
This report provides an in-depth comparison of TOEFL iBT® and the Duolingo English Test (DET) in terms of the degree to which both tests assess academic language proficiency in listening, reading, writing, and speaking. The analysis is based on publicly available documentation on both tests, including sample test questions available on the test…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Academic Language
Güntay Tasçi – Science Insights Education Frontiers, 2024
The present study has aimed to develop and validate a protein concept inventory (PCI) consisting of 25 multiple-choice (MC) questions to assess students' understanding of protein, which is a fundamental concept across different biology disciplines. The development process of the PCI involved a literature review to identify protein-related content,…
Descriptors: Science Instruction, Science Tests, Multiple Choice Tests, Biology
Guo, Hongwen; Ling, Guangming; Frankel, Lois – ETS Research Report Series, 2020
With advances in technology, researchers and test developers are developing new item types to measure complex skills like problem solving and critical thinking. Analyzing such items is often challenging because of their complicated response patterns, and thus it is important to develop psychometric methods for practitioners and researchers to…
Descriptors: Test Construction, Test Items, Item Analysis, Psychometrics
Selcuk Acar; Denis Dumas; Peter Organisciak; Kelly Berthiaume – Grantee Submission, 2024
Creativity is highly valued in both education and the workforce, but assessing and developing creativity can be difficult without psychometrically robust and affordable tools. The open-ended nature of creativity assessments has made them difficult to score, expensive, often imprecise, and therefore impractical for school- or district-wide use. To…
Descriptors: Thinking Skills, Elementary School Students, Artificial Intelligence, Measurement Techniques
Lynch, Sarah – Practical Assessment, Research & Evaluation, 2022
In today's digital age, tests are increasingly being delivered on computers. Many of these computer-based tests (CBTs) have been adapted from paper-based tests (PBTs). However, this change in mode of test administration has the potential to introduce construct-irrelevant variance, affecting the validity of score interpretations. Because of this,…
Descriptors: Computer Assisted Testing, Tests, Scores, Scoring
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
International Journal of Testing, 2018
The second edition of the International Test Commission Guidelines for Translating and Adapting Tests was prepared between 2005 and 2015 to improve upon the first edition, and to respond to advances in testing technology and practices. The 18 guidelines are organized into six categories to facilitate their use: pre-condition (3), test development…
Descriptors: Translation, Test Construction, Testing, Scoring
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Ji-young Shin – ProQuest LLC, 2021
The present dissertation investigated the impact of scales/scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales/scoring methods are an important feature for the validity and reliability of L2 EI test, but less is known (Yan et al., 2016). Prompt linguistic features are also known…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Semantics
Warsono; Nursuhud, Puji Iman; Darma, Rio Sandhika; Supahar – International Journal of Instruction, 2020
The study was conducted to analyze the items about the ability of high school students diagram representation and obtain Item Curve Characteristic. Grid test instruments are compiled based on competencies and indicators of diagram representation which are then used to compile items. The test instrument consisted of five items and was validated by…
Descriptors: High School Students, Problem Solving, Visual Aids, Scoring

Peer reviewed
Direct link
