Publication Date
In 2025: 154
Since 2024: 586
Since 2021 (last 5 years): 2094
Author
Kuhfeld, Megan: 14
Megan Kuhfeld: 14
Jerrim, John: 10
Matthias von Davier: 8
Robitzsch, Alexander: 8
Borgonovi, Francesca: 7
John Jerrim: 7
Kajsa Yang Hansen: 7
Lewis, Karyn: 7
Lüdtke, Oliver: 7
Braeken, Johan: 6
Audience
Policymakers: 13
Teachers: 12
Researchers: 9
Administrators: 8
Practitioners: 2
Community: 1
Parents: 1
Students: 1
Location
Turkey: 145
Texas: 143
Australia: 96
United States: 72
South Korea: 71
Singapore: 66
China: 64
Finland: 59
Hong Kong: 52
Sweden: 52
Canada: 51
What Works Clearinghouse Rating
Meets WWC Standards without Reservations: 1
Meets WWC Standards with or without Reservations: 4
Does not meet standards: 4
Marta Siedlecka; Piotr Litwin; Paulina Szyszka; Boryslaw Paulewicz – European Journal of Psychology of Education, 2025
Students change their responses during tests, and these revisions are often correct. Some studies have suggested that decisions regarding revisions are informed by metacognitive monitoring. We investigated whether assessing and reporting response confidence increases the accuracy of revisions and the final test score, and whether confidence in a…
Descriptors: Student Evaluation, Decision Making, Responses, Achievement Tests
Anne Traynor; Sara C. Christopherson – Applied Measurement in Education, 2024
Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…
Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction
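The two families of indices the abstract distinguishes can be illustrated with toy versions: a per-item relevance index computed from expert ratings, and a whole-test representativeness index computed from item-to-objective mappings. These are generic stand-ins, not the specific indices the authors compare, and all data below are hypothetical.

```python
import numpy as np

def item_relevance(ratings):
    """Item-level index: mean expert rating (e.g., on a 1-4 scale) of how
    relevant each item is to the domain it is meant to measure.
    ratings: experts x items array."""
    return ratings.mean(axis=0)

def test_representativeness(item_objectives, objectives):
    """Test-level index: share of the content objectives covered by at
    least one item on the test."""
    covered = {obj for objs in item_objectives for obj in objs}
    return len(covered & set(objectives)) / len(objectives)

# Hypothetical data: 3 experts rate 4 items; each item maps to objectives.
ratings = np.array([[4, 3, 2, 4],
                    [4, 4, 2, 3],
                    [3, 4, 1, 4]])
item_objectives = [{"A"}, {"A", "B"}, {"C"}, {"B"}]
print(item_relevance(ratings))                                        # per-item relevance
print(test_representativeness(item_objectives, {"A", "B", "C", "D"}))  # 0.75: "D" uncovered
```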
Jeanne Sinclair – Critical Inquiry in Language Studies, 2025
In this paper, the White listening subject takes the form of a standardized high-stakes reading test, the State of Texas Assessment of Academic Readiness (STAAR). Although the test does not actually listen, it 'hears' and evaluates children's responses to its questions. I present the results of the 2017 Grade 8 reading exams, from the March, May,…
Descriptors: High Stakes Tests, Standardized Tests, Reading Tests, Achievement Tests
Anne H. Davidson – National Assessment Governing Board, 2025
The purpose of this National Assessment of Educational Progress (NAEP) Achievement Levels Validity Argument Report is to synthesize evidence currently available to address the validity of the interpretations and uses of the NAEP Achievement Levels. Validity is the extent to which theory and evidence support or refute proposed and enacted test…
Descriptors: National Competency Tests, Academic Achievement, Test Validity, College Entrance Examinations
Hasibe Yahsi Sari; Hulya Kelecioglu – International Journal of Assessment Tools in Education, 2025
The aim of this simulation-based study is to examine the effect of the polytomous item ratio on ability estimation under different conditions in multistage tests (MST) that use mixed-format tests. Drawing on the PISA 2018 administration, the individuals' ability parameters and the item pool were generated using the item parameters estimated from…
Descriptors: Test Items, Test Format, Accuracy, Test Length
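A minimal sketch of this kind of simulation, assuming the 2PL for dichotomous items, the generalized partial credit model (GPCM) for polytomous items, and EAP scoring; the item parameters below are illustrative draws rather than PISA 2018 estimates, and the MST routing stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(7)
QUAD = np.linspace(-4, 4, 81)           # quadrature grid for EAP
PRIOR = np.exp(-0.5 * QUAD**2)          # standard-normal prior (unnormalized)

def p_2pl(theta, a, b):
    """P(correct) under the two-parameter logistic model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def gpcm_probs(theta, a, steps):
    """Category probabilities 0..m under the generalized partial credit model."""
    z = np.concatenate(([0.0], np.cumsum(a * (theta - steps))))
    e = np.exp(z - z.max())
    return e / e.sum()

def make_items(n_items, poly_ratio):
    """Mixed pool: a share of 3-category GPCM items, the rest 2PL."""
    items = []
    for i in range(n_items):
        a = rng.uniform(0.8, 2.0)
        if i < round(n_items * poly_ratio):
            items.append(("gpcm", a, np.sort(rng.normal(0, 1, 2))))
        else:
            items.append(("2pl", a, rng.normal(0, 1)))
    return items

def simulate_response(theta, item):
    kind, a, prm = item
    if kind == "2pl":
        return int(rng.random() < p_2pl(theta, a, prm))
    return int(rng.choice(len(prm) + 1, p=gpcm_probs(theta, a, prm)))

def eap(resp, items):
    """Posterior-mean ability given one person's responses."""
    loglik = np.zeros_like(QUAD)
    for x, (kind, a, prm) in zip(resp, items):
        if kind == "2pl":
            p = p_2pl(QUAD, a, prm)
            loglik += np.log(p if x == 1 else 1.0 - p)
        else:
            loglik += np.log([gpcm_probs(t, a, prm)[x] for t in QUAD])
    post = PRIOR * np.exp(loglik - loglik.max())
    return float((QUAD * post).sum() / post.sum())

# Recovery error of EAP estimates as the polytomous ratio varies
for ratio in (0.0, 0.3, 0.6):
    items = make_items(30, ratio)
    thetas = rng.normal(0, 1, 300)
    est = [eap([simulate_response(t, it) for it in items], items) for t in thetas]
    rmse = np.sqrt(np.mean((np.array(est) - thetas) ** 2))
    print(f"poly ratio {ratio:.1f}: RMSE {rmse:.3f}")
```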
B. Goecke; S. Weiss; B. Barbot – Journal of Creative Behavior, 2025
The present paper questions the content validity of the eight creativity-related self-report scales available in PISA 2022's context questionnaire and provides a set of considerations for researchers interested in using these indexes. Specifically, we point out some threats to the content validity of these scales (e.g., "creative thinking…
Descriptors: Creativity, Creativity Tests, Questionnaires, Content Validity
Chunyan Shi – SAGE Open, 2025
The National Matriculation English Test (NMET), also known as the Gaokao English Examination, is a high-stakes, large-scale selection test for tertiary education in China, with numerous provinces and regions adopting the standardized national test papers. The validity of the NMET has garnered extensive attention. To verify the NMET's validity, it…
Descriptors: Foreign Countries, English (Second Language), Language Tests, High Stakes Tests
Zeynep Uzun; Tuncay Ögretmen – Large-scale Assessments in Education, 2025
This study aimed to evaluate item-model fit by equating forms of the PISA 2018 mathematics subtest through concurrent common-item equating in samples from Türkiye, the UK, and Italy. The responses to mathematics subtest Forms 2, 8, and 12 were used in this context. Analyses were performed using the Dichotomous Rasch Model in the WINSTEPS…
Descriptors: Item Response Theory, Test Items, Foreign Countries, Mathematics Tests
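Concurrent common-item equating can be sketched by stacking all forms into one person-by-item matrix (NaN where a form did not include an item) and calibrating everything in a single Rasch run, so the shared items anchor the forms to one scale. The joint-maximum-likelihood sketch below is a toy under those assumptions, not the WINSTEPS implementation, and it glosses over JMLE's bias and perfect-score issues.

```python
import numpy as np

def rasch_jml(X, n_iter=100):
    """Dichotomous Rasch calibration by joint maximum likelihood.
    X: persons x items, entries 0/1, np.nan = item not administered."""
    mask = ~np.isnan(X)
    x = np.nan_to_num(X)
    theta = np.zeros(X.shape[0])        # person abilities
    beta = np.zeros(X.shape[1])         # item difficulties
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
        theta += ((x - p) * mask).sum(1) / np.maximum((p * (1 - p) * mask).sum(1), 1e-9)
        theta = np.clip(theta, -6, 6)   # crude guard against perfect scores
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
        beta -= ((x - p) * mask).sum(0) / np.maximum((p * (1 - p) * mask).sum(0), 1e-9)
        beta -= beta.mean()             # identify the scale
    return theta, beta

# Two forms, 5 common + 5 unique items each; the groups differ in ability.
rng = np.random.default_rng(1)
b_true = rng.normal(0, 1, 15)           # items 0-4 common, 5-9 form A, 10-14 form B
th = np.concatenate([rng.normal(-0.5, 1, 200), rng.normal(0.5, 1, 200)])
X = np.full((400, 15), np.nan)
for i, t in enumerate(th):
    cols = list(range(10)) if i < 200 else list(range(5)) + list(range(10, 15))
    X[i, cols] = (rng.random(10) < 1 / (1 + np.exp(-(t - b_true[cols])))).astype(float)
theta_hat, beta_hat = rasch_jml(X)
print(np.corrcoef(beta_hat, b_true)[0, 1])   # difficulty-recovery check
```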
David Meechan; Zeta Williams-Brown; Tracy Whatmore; Simon Halfhead – Education 3-13, 2024
The paper focuses on findings from research that investigated teachers' and key stakeholders' perspectives on the use of the Reception Baseline Assessment. Data collection was carried out in 2021-2022, the year this assessment was introduced into Reception classes in England. In total, 70 teachers and key stakeholders from 47 Local…
Descriptors: Foreign Countries, Preschool Education, Preschool Teachers, Achievement Tests
Selcuk Acar; Yuyang Shen – Journal of Creative Behavior, 2025
Creativity tests, like creativity itself, vary widely in their structure and use. These differences include instructions, test duration, environments, prompt and response modalities, and the structure of test items. A key factor is task structure, referring to the specificity of the number of responses requested for a given prompt. Classic…
Descriptors: Creativity, Creative Thinking, Creativity Tests, Task Analysis
Lawrence T. DeCarlo – Educational and Psychological Measurement, 2024
A psychological framework for different types of items commonly used with mixed-format exams is proposed. A choice model based on signal detection theory (SDT) is used for multiple-choice (MC) items, whereas an item response theory (IRT) model is used for open-ended (OE) items. The SDT and IRT models are shown to share a common conceptualization…
Descriptors: Test Format, Multiple Choice Tests, Item Response Theory, Models
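The SDT side of such a framework can be illustrated with the standard m-alternative choice probability, in which the examinee picks the option with the highest familiarity: the correct option's familiarity is drawn from N(d, 1) and each distractor's from N(0, 1). This is a textbook version rather than the paper's exact parameterization, and the link from ability to d used below, d = a(theta - b), is an assumption.

```python
import numpy as np
from scipy.stats import norm

def p_correct_mafc(d, m=4):
    """P(correct) for an m-alternative MC item under signal detection theory:
    integrate P(correct option has familiarity x) * P(all m-1 distractors < x)."""
    x = np.linspace(-8, 8, 4001)
    fx = norm.pdf(x - d) * norm.cdf(x) ** (m - 1)
    return float(fx.sum() * (x[1] - x[0]))   # simple Riemann sum

assert abs(p_correct_mafc(0.0, m=4) - 0.25) < 1e-3   # d = 0 reduces to guessing, 1/m

# Assumed IRT-style link: detection grows linearly with ability.
a, b = 1.2, 0.0
for theta in (-2, 0, 2):
    print(theta, round(p_correct_mafc(a * (theta - b)), 3))
```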
Mahmood Ul Hassan; Frank Miller – Journal of Educational Measurement, 2024
Multidimensional achievement tests are gaining importance in educational and psychological measurement. For example, multidimensional diagnostic tests can help students determine which particular domains of knowledge they need to improve for better performance. To estimate the characteristics of candidate items (calibration) for…
Descriptors: Multidimensional Scaling, Achievement Tests, Test Items, Test Construction
Ebru Dogruöz; Hülya Kelecioglu – International Journal of Assessment Tools in Education, 2024
In this research, multistage adaptive tests (MSTs) were compared according to sample size, panel pattern, and module length for top-down and bottom-up test assembly methods. Data from PISA 2015 were used, and simulation studies were conducted with the parameters estimated from these data. Analysis results for…
Descriptors: Adaptive Testing, Test Construction, Foreign Countries, Achievement Tests
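A bottom-up assembly rule, building each module from the pool toward its own ability target, can be caricatured with a greedy pick on Rasch item information; operational assembly uses full test-information targets and mixed-integer programming, and every value below is made up.

```python
import numpy as np

def rasch_info(theta, b):
    """Item information at ability theta for a Rasch item of difficulty b."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

def greedy_module(pool_b, available, theta_target, size):
    """Pick `size` items from `available` that maximize information at the
    module's target ability point (a toy bottom-up assembly rule)."""
    chosen, avail = [], set(available)
    for _ in range(size):
        best = max(avail, key=lambda j: rasch_info(theta_target, pool_b[j]))
        chosen.append(best)
        avail.remove(best)
    return chosen

rng = np.random.default_rng(0)
pool_b = rng.normal(0, 1.2, 120)
avail = set(range(len(pool_b)))
# 1-3 panel: one routing module, then easy/medium/hard second-stage modules
modules = {}
for name, target in [("routing", 0.0), ("easy", -1.0), ("medium", 0.0), ("hard", 1.0)]:
    modules[name] = greedy_module(pool_b, avail, target, size=10)
    avail -= set(modules[name])          # no item overlap across modules
print({k: round(float(pool_b[v].mean()), 2) for k, v in modules.items()})
```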
Emma Walland – Research Matters, 2024
GCSE examinations (taken by students aged 16 years in England) are not intended to be speeded (i.e., to be partly a test of how quickly students can answer questions). However, there has been little research exploring whether they are. The aim of this research was to explore the speededness of past GCSE written examinations, using only the data from scored…
Descriptors: Educational Change, Test Items, Item Analysis, Scoring
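With scored response data only, one common proxy for speededness is the "not reached" rate by item position: an item counts as not reached when it falls in the trailing run of omissions at the end of a candidate's script, and a rate that climbs over the final positions suggests speededness. A minimal sketch, with an assumed persons-by-items layout:

```python
import numpy as np

def not_reached_rate(scores):
    """scores: persons x items array of item scores in test order, with
    np.nan for omitted responses. An item counts as 'not reached' for a
    person when it and every later item were omitted."""
    omitted = np.isnan(scores)
    # reversed cumulative product: stays 1 while all later items are omitted
    trailing = np.flip(np.cumprod(np.flip(omitted, 1), 1), 1).astype(bool)
    return trailing.mean(0)

# Toy scripts: 5 candidates, 6 items; trailing NaNs mimic running out of time
scores = np.array([[1, 1, 0, 1, np.nan, np.nan],
                   [1, 0, 1, 1, 1, 0],
                   [0, 1, np.nan, 1, 0, np.nan],
                   [1, 1, 1, np.nan, np.nan, np.nan],
                   [1, 0, 1, 1, 0, 1]])
print(not_reached_rate(scores))   # rises over the last positions
```

Note that the mid-test omission in the third row is counted as an ordinary omit, not as "not reached"; only the unbroken run at the end of a script is treated as evidence of running out of time.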
Liqun Yin; Ummugul Bezirhan; Matthias von Davier – International Electronic Journal of Elementary Education, 2025
This paper introduces an approach that uses latent class analysis to identify cut scores (LCA-CS) and categorize respondents based on context scales derived from large-scale assessments like PIRLS, TIMSS, and NAEP. Context scales use Likert-scale items to measure latent constructs of interest and classify respondents into meaningful ordered…
Descriptors: Multivariate Analysis, Cutting Scores, Achievement Tests, Foreign Countries
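A minimal sketch of the general idea, assuming a basic EM fit of a latent class model to Likert items, classes ordered by their model-implied expected raw score, and a cut placed at the lowest observed total score where the next-higher class becomes the modal assignment; the published LCA-CS procedure and its data are more involved than this.

```python
import numpy as np

def lca_em(X, n_classes=3, n_cats=4, n_iter=300, seed=0):
    """EM for a latent class model of categorical items.
    X: persons x items with integer categories 0..n_cats-1."""
    n, J = X.shape
    r = np.random.default_rng(seed).dirichlet(np.ones(n_classes), size=n)
    for _ in range(n_iter):
        pi = r.mean(0)                                   # class sizes
        rho = np.stack([r.T @ (X == k) for k in range(n_cats)], axis=2)
        rho /= rho.sum(2, keepdims=True)                 # category probs per class/item
        rho = np.clip(rho, 1e-6, 1.0)
        ll = np.stack([np.log(rho[c, np.arange(J)[None, :], X]).sum(1)
                       for c in range(n_classes)], 1) + np.log(pi)
        r = np.exp(ll - ll.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)                     # posterior memberships
    return pi, rho, r

def cut_scores(X, rho, r, n_cats=4):
    """Order classes by expected raw score, then take the lowest observed
    total score at which the next-higher class is the modal assignment."""
    exp_score = (rho * np.arange(n_cats)).sum(2).sum(1)  # E[total | class]
    rank = np.argsort(np.argsort(exp_score))             # class -> ordered level
    level = rank[r.argmax(1)]
    total = X.sum(1)
    cuts = []
    for c in range(len(exp_score) - 1):
        above = total[level > c]
        cuts.append(int(above.min()) if above.size else None)
    return cuts

# Toy data: three classes with increasingly high Likert responses on 6 items
rng = np.random.default_rng(4)
true = rng.integers(0, 3, 600)
probs = {0: [.6, .3, .1, .0], 1: [.2, .4, .3, .1], 2: [.0, .1, .4, .5]}
X = np.stack([rng.choice(4, size=6, p=probs[c]) for c in true])
pi, rho, r = lca_em(X)
print(cut_scores(X, rho, r))   # raw-score thresholds between ordered levels
```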