Camilla M. McMahon; Maryellen Brunson McClain; Savannah Wells; Sophia Thompson; Jeffrey D. Shahidullah – Journal of Autism and Developmental Disorders, 2025
Purpose: The goal of the current study was to conduct a substantive validity review of four autism knowledge assessments with prior psychometric support (Gillespie-Lynch in J Autism and Dev Disord 45(8):2553-2566, 2015; Harrison in J Autism and Dev Disord 47(10):3281-3295, 2017; McClain in J Autism and Dev Disord 50(3):998-1006, 2020; McMahon…
Descriptors: Measures (Individuals), Psychometrics, Test Items, Accuracy
Ntumi, Simon; Agbenyo, Sheilla; Bulala, Tapela – Shanlax International Journal of Education, 2023
There is no need or point to testing of knowledge, attributes, traits, behaviours or abilities of an individual if information obtained from the test is inaccurate. However, by and large, it seems the estimation of psychometric properties of test items in classrooms has been completely ignored otherwise dying slowly in most testing environments. In…
Descriptors: Psychometrics, Accuracy, Test Validity, Factor Analysis
Vitello, Sylvia; Crisp, Victoria; Ireland, Jo – Research Matters, 2023
Assessment materials must be checked for errors before they are presented to candidates. Any errors have the potential to reduce validity. For example, in the most extreme cases, an error may turn an otherwise well-designed exam question into one that is impossible to answer. In Cambridge University Press & Assessment, assessment materials are…
Descriptors: Check Lists, Test Validity, Error Correction, Test Construction
Apichat Khamboonruang – Language Testing in Asia, 2025
Chulalongkorn University Language Institute (CULI) test was developed as a local standardised test of English for professional and international communication. To ensure that the CULI test fulfils its intended purposes, this study employed Kane's argument-based validation and Rasch measurement approaches to construct the validity argument for the…
Descriptors: Universities, Second Language Learning, Second Language Instruction, Language Tests
Pasquale Anselmi; Jürgen Heller; Luca Stefanutti; Egidio Robusto; Giulia Barillari – Education and Information Technologies, 2025
Competence-based test development (CbTD) is a novel method for constructing tests that are as informative as possible about the competence state (the set of skills an individual masters) underlying the item responses. If desired, the tests can also be minimal, meaning that no item can be eliminated without reducing their informativeness. To…
Descriptors: Competency Based Education, Test Construction, Test Length, Usability
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have been often reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Yasuda, Jun-ichiro; Hull, Michael M.; Mae, Naohiro – Physical Review Physics Education Research, 2022
This paper presents improvements made to a computerized adaptive testing (CAT)-based version of the FCI (FCI-CAT) in regards to test security and test efficiency. First, we will discuss measures to enhance test security by controlling for item overexposure, decreasing the risk that respondents may (i) memorize the content of a pretest for use on…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Risk Management
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
Bokander, Lars; Bylund, Emanuel – Language Learning, 2020
Over the past decade, the LLAMA language aptitude test battery has come to play an increasingly important role as an instrument in research on individual differences in language development. However, a potentially serious problem that has been pointed out by several scholars is that the LLAMA has not yet been carefully validated. We addressed this…
Descriptors: Item Analysis, Language Tests, Test Items, Individual Differences
Rachel A. Gross – ProQuest LLC, 2020
The present study was motivated by the theory-method mismatch between heterotypic continuity (aspects of development that manifest differently across the lifespan thus cannot be measured the same way over time) and longitudinal measurement equivalence (the statistical assumption that the developmental phenomenon studied is measured on the same…
Descriptors: Robustness (Statistics), Structural Equation Models, Longitudinal Studies, Error of Measurement
Thapelo Ncube Whitfield – ProQuest LLC, 2021
Student Experience surveys are used to measure student attitudes towards their campus as well as to initiate conversations for institutional change. Validity evidence to support the interpretations of these surveys' results, however, is lacking. The first purpose of this study was to compare three Differential Item Functioning (DIF) methods on…
Descriptors: College Students, Student Surveys, Student Experience, Student Attitudes
Lukácsi, Zoltán – Language Testing, 2021
In second language writing assessment, rating scales and scores from human-mediated assessment have been criticized for a number of shortcomings including problems with adequacy, relevance, and reliability (Hamp-Lyons, 1990; McNamara, 1996; Weigle, 2002). In its testing practice, Euroexam International also detected that the rating scales for…
Descriptors: Test Construction, Test Validity, Test Items, Check Lists
Kilic, Abdullah Faruk; Dogan, Nuri – International Journal of Assessment Tools in Education, 2021
Weighted least squares (WLS), weighted least squares mean-and-variance-adjusted (WLSMV), unweighted least squares mean-and-variance-adjusted (ULSMV), maximum likelihood (ML), robust maximum likelihood (MLR) and Bayesian estimation methods were compared in mixed item response type data via Monte Carlo simulation. The percentage of polytomous items,…
Descriptors: Factor Analysis, Computation, Least Squares Statistics, Maximum Likelihood Statistics
Boori, Ali Akbar; Ghazanfari, Mohammad; Ghonsooly, Behzad; Baghaei, Purya – International Journal of Language Testing, 2023
Cognitive diagnostic models (CDMs) have received sustained attention in educational settings because they can be used to operationalize formative assessment to provide diagnostic feedback and inform instruction. A large number of CDMs have been developed over the past few years. An important component of all CDMs is a Q-matrix that specifies a…
Descriptors: Reading Comprehension, Reading Tests, English (Second Language), Islam
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
