Publication Date
In 2025: 3
Since 2024: 4
Since 2021 (last 5 years): 6
Since 2016 (last 10 years): 8
Since 2006 (last 20 years): 10
Descriptor
Test Items: 36
Test Validity: 36
Test Construction: 16
Test Reliability: 14
Achievement Tests: 8
Psychometrics: 8
Elementary Secondary Education: 7
Item Analysis: 7
Literature Reviews: 7
Scores: 7
Test Bias: 6
Author
Diamond, Esther E.: 2
Aiken, Lewis R.: 1
Barry, Margot: 1
Ben-Porath, Yossef S.: 1
Benson, Jeri: 1
Bin Tan: 1
Bracken, Bruce A.: 1
Buser, Karen: 1
Butler, Des: 1
Camilla M. McMahon: 1
Collado, Silvia: 1
Publication Type
Information Analyses: 36
Journal Articles: 21
Speeches/Meeting Papers: 9
Reports - Research: 8
Opinion Papers: 5
Reports - Evaluative: 3
Guides - Non-Classroom: 1
Reports - Descriptive: 1
Education Level
Adult Education: 1
Higher Education: 1
Audience
Researchers: 6
Practitioners: 3
Teachers: 1
Assessments and Surveys
General Educational…: 1
Graduate Record Examinations: 1
Minnesota Multiphasic…: 1
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Camilla M. McMahon; Maryellen Brunson McClain; Savannah Wells; Sophia Thompson; Jeffrey D. Shahidullah – Journal of Autism and Developmental Disorders, 2025
Purpose: The goal of the current study was to conduct a substantive validity review of four autism knowledge assessments with prior psychometric support (Gillespie-Lynch in J Autism and Dev Disord 45(8):2553-2566, 2015; Harrison in J Autism and Dev Disord 47(10):3281-3295, 2017; McClain in J Autism and Dev Disord 50(3):998-1006, 2020; McMahon…
Descriptors: Measures (Individuals), Psychometrics, Test Items, Accuracy
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
Ella Anghel; Lale Khorramdel; Matthias von Davier – Large-scale Assessments in Education, 2024
As the use of process data in large-scale educational assessments is becoming more common, it is clear that data on examinees' test-taking behaviors can illuminate their performance, and can have crucial ramifications concerning assessments' validity. A thorough review of the literature in the field may inform researchers and practitioners of…
Descriptors: Educational Assessment, Test Validity, Test Items, Reaction Time
Ekaterina Sudina – Studies in Second Language Acquisition, 2023
As survey research in second language acquisition grows in popularity, the adherence to best practices associated with questionnaire quality is critical for a better understanding of factors that influence second language (L2) development. To ensure that a self-report scale targets the construct of interest and does it consistently and accurately,…
Descriptors: Second Language Learning, Language Acquisition, Measures (Individuals), Test Reliability
Rosa, Claudio D.; Collado, Silvia; Larson, Lincoln R. – Journal of Environmental Education, 2022
The New Ecological Paradigm (NEP) scale adapted for use with children (NEP-C) is one of the most frequently used measures of children's environmental beliefs. Though widely utilized, the limitations of the NEP-C instrument are often overlooked. Based on a systematic synthesis of existing literature examining the NEP-C, we argue that the scale…
Descriptors: Attitude Measures, Children, Environment, Beliefs
Villarreal, Victor – Journal of Psychoeducational Assessment, 2019
The "Rating Scale of Impairment" (RSI; Goldstein & Naglieri, 2016b) is a norm-referenced measure of functional impairment. The RSI measures impairment in six domains, as well as overall impairment, based in part on the International Classification of Functioning, Disability, and Health. Functional impairment, as defined by the ICF…
Descriptors: Rating Scales, Norm Referenced Tests, Disabilities, Test Construction
Barry, Margot; Egan, Arlene – International Review of Education, 2018
Adult learners are attracted to learning opportunities (e.g. course offers) which seem promising in terms of allowing them to match their choices to their own perceived predispositions. To find out more about their personal learning style, some adult learners may fill in a questionnaire designed by researchers who aim (and claim) to enable both…
Descriptors: Adult Learning, Cognitive Style, Adult Education, Interest Research
Walsh, Kerryann; Rassafiani, Mehdi; Mathews, Ben; Farrell, Ann; Butler, Des – Journal of Child Sexual Abuse, 2010
This paper details a systematic literature review identifying problems in extant research relating to teachers' attitudes toward reporting child sexual abuse and offers a model for new attitude scale development and testing. Scale development comprised a five-phase process grounded in contemporary attitude theories, including (a) developing the…
Descriptors: Sexual Abuse, Child Abuse, Focus Groups, Content Validity
Forbey, Johnathan D.; Ben-Porath, Yossef S. – Psychological Assessment, 2007
Computerized adaptive testing in personality assessment can improve efficiency by significantly reducing the number of items administered to answer an assessment question. Two approaches have been explored for adaptive testing in computerized personality assessment: item response theory and the countdown method. In this article, the authors…
Descriptors: Personality Traits, Computer Assisted Testing, Test Validity, Personality Assessment
Diamond, Esther E. – 1981
As test standards and research literature in general indicate, definitions of test bias and item bias vary considerably, as do the results of existing methods of identifying biased items. The situation is further complicated by issues of content, context, construct, and criterion. In achievement tests, for example, content validity may impose…
Descriptors: Achievement Tests, Aptitude Tests, Psychometrics, Test Bias
Shepard, Lorrie A. – New Directions for Testing and Measurement, 1981
The test-item bias literature is summarized, emphasizing the conceptual basis for bias detection methods and the technical issues involved in choosing among methods. It describes both judgmental and statistical methods for identifying biased items, and discusses the reconciliation of these two types of evidence. (Author/BW)
Descriptors: Evaluation Methods, Latent Trait Theory, Statistical Analysis, Test Bias
Whitney, Douglas R.; And Others – 1985
This research brief summarizes the reliability and validity data available in, but spread throughout, a number of General Educational Development (GED) Testing Service publications. A section on reliability discusses how to determine the reliability of a test's scores and two ways of assessing the reliability of a test: internal consistency…
Descriptors: Adult Education, High School Equivalency Programs, Item Analysis, Scores
Mead, Ronald J. – 1981
The central idea in building and maintaining an item bank is to calibrate all the items onto a "common variable." The arithmetic involved in the calibration process is presented. It is recommended that an analysis of fit be done in every application to verify that the estimates of item difficulties are in fact sample-free. These…
Descriptors: Equated Scores, Goodness of Fit, Item Banks, Latent Trait Theory

Aiken, Lewis R. – Journal of Research and Development in Education, 1987
A critical review is presented of research conducted during the past 20 years on multiple-choice tests of achievement and aptitude. The design and use of multiple-choice tests are emphasized, but information concerning the socioeducational implications of relying on such tests is also included. (Author/CB)
Descriptors: Academic Achievement, Academic Aptitude, Educational Sociology, Multiple Choice Tests