Publication Date
In 2025 | 1 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 19 |
Since 2016 (last 10 years) | 40 |
Since 2006 (last 20 years) | 69 |
Descriptor
Foreign Countries | 103 |
Scoring | 103 |
Test Items | 103 |
Test Construction | 27 |
Comparative Analysis | 22 |
Item Analysis | 21 |
Item Response Theory | 20 |
Achievement Tests | 19 |
Language Tests | 19 |
Scores | 17 |
Computer Assisted Testing | 16 |
Author
Donovan, Jenny | 3 |
Ellington, Henry | 3 |
Lennon, Melissa | 3 |
Hutton, Penny | 2 |
Morrissey, Noni | 2 |
Nadas, Rita | 2 |
O'Connor, Gayl | 2 |
Suto, Irenka | 2 |
Xin, Tao | 2 |
Yamamoto, Kentaro | 2 |
von Davier, Matthias | 2 |
Education Level
Secondary Education | 18 |
Higher Education | 15 |
Elementary Education | 12 |
Elementary Secondary Education | 10 |
Postsecondary Education | 10 |
Grade 6 | 5 |
Grade 8 | 4 |
Junior High Schools | 4 |
Middle Schools | 4 |
Grade 4 | 3 |
High Schools | 3 |
Audience
Teachers | 11 |
Practitioners | 10 |
Administrators | 4 |
Students | 2 |
Location
Canada | 14 |
China | 12 |
Australia | 10 |
United Kingdom | 9 |
Japan | 6 |
United States | 6 |
Turkey | 5 |
Netherlands | 4 |
Denmark | 3 |
France | 3 |
Germany | 3 |
Deschênes, Marie-France; Dionne, Éric; Dorion, Michelle; Grondin, Julie – Practical Assessment, Research & Evaluation, 2023
The use of the aggregate scoring method for scoring concordance tests requires the weighting of test items to be derived from the performance of a group of experts who take the test under the same conditions as the examinees. However, the average score of experts constituting the reference panel remains a critical issue in the use of these tests.…
Descriptors: Scoring, Tests, Evaluation Methods, Test Items
Pearson, Christopher; Penna, Nigel – Assessment & Evaluation in Higher Education, 2023
E-assessments are becoming increasingly common and progressively more complex. Consequently, how these longer, more complex questions are designed and marked is imperative. This article uses the NUMBAS e-assessment tool to investigate the best practice for creating longer questions and their mark schemes on surveying modules taken by engineering…
Descriptors: Automation, Scoring, Engineering Education, Foreign Countries
Harrison, Scott; Kroehne, Ulf; Goldhammer, Frank; Lüdtke, Oliver; Robitzsch, Alexander – Large-scale Assessments in Education, 2023
Background: Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research around test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone to the interpretation of the results of the PISA test scores. However, an…
Descriptors: Scoring, Test Items, Difficulty Level, Foreign Countries
Gustafsson, Martin; Barakat, Bilal Fouad – Comparative Education Review, 2023
International assessments inform education policy debates, yet little is known about their floor effects: To what extent do they fail to differentiate between the lowest performers, and what are the implications of this? TIMSS, SACMEQ, and LLECE data are analyzed to answer this question. In TIMSS, floor effects have been reduced through the…
Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries
Qiwei He – International Journal of Assessment Tools in Education, 2023
Collaborative problem solving (CPS) is inherently an interactive, conjoint, dual-strand process that considers how a student reasons about a problem as well as how s/he interacts with others to regulate social processes and exchange information (OECD, 2013). Measuring CPS skills presents a challenge for obtaining consistent, accurate, and reliable…
Descriptors: Cooperative Learning, Problem Solving, Test Items, International Assessment
Emma Walland – Research Matters, 2024
GCSE examinations (taken by students aged 16 years in England) are not intended to be speeded (i.e. to be partly a test of how quickly students can answer questions). However, there has been little research exploring this. The aim of this research was to explore the speededness of past GCSE written examinations, using only the data from scored…
Descriptors: Educational Change, Test Items, Item Analysis, Scoring
Kunal Sareen – Innovations in Education and Teaching International, 2024
This study examines the proficiency of ChatGPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…
Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software
Gao, Xuliang; Ma, Wenchao; Wang, Daxun; Cai, Yan; Tu, Dongbo – Journal of Educational and Behavioral Statistics, 2021
This article proposes a class of cognitive diagnosis models (CDMs) for polytomously scored items with different link functions. Many existing polytomous CDMs can be considered as special cases of the proposed class of polytomous CDMs. Simulation studies were carried out to investigate the feasibility of the proposed CDMs and the performance of…
Descriptors: Cognitive Measurement, Models, Test Items, Scoring
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-20 reliability and its extension (coefficient α) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
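For readers unfamiliar with the Kuder-Richardson coefficients named in this abstract, here is a minimal sketch of the textbook formulas, not of the article's own derivations. KR-20 uses each item's proportion correct, while KR-21 needs only the mean and variance of the summed scores. The function names and the response matrix are hypothetical, and population variance is assumed throughout.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for a matrix of 0/1 item scores (rows = examinees)."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    # Proportion correct for each item.
    p = [mean(row[i] for row in responses) for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


def kr21(responses):
    """KR-21: uses only the mean and variance of summed scores."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


# Hypothetical data: five examinees, four dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(kr20(data), kr21(data))
```

KR-21 never exceeds KR-20 on the same data; the two coincide only when all items have the same proportion correct.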
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
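The sentence-level scoring described in this abstract can be pictured with a short sketch: each gap is marked right or wrong, and the marks are summed within each sentence so that the sentence becomes one polytomous item. The data layout and the function name below are assumptions made for illustration; the study's actual data preparation is not described in the truncated abstract.

```python
def sentence_scores(gap_correct, gap_sentence):
    """Aggregate gap-level marks into one polytomous score per sentence.

    gap_correct:  0/1 mark for each gap, in text order
    gap_sentence: index of the sentence each gap belongs to
    Returns the scores ordered by sentence index.
    """
    scores = {}
    for correct, sentence in zip(gap_correct, gap_sentence):
        scores[sentence] = scores.get(sentence, 0) + int(correct)
    return [scores[s] for s in sorted(scores)]


# Hypothetical passage: six gaps spread over three sentences.
marks = [1, 0, 1, 1, 1, 0]
sentences = [0, 0, 0, 1, 1, 2]
print(sentence_scores(marks, sentences))
```

Each resulting score ranges from zero to the number of gaps in that sentence, so sentences with different numbers of gaps yield items with different numbers of score categories.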
Lai, Kelly Y. C.; Yuen, Emily C. W.; Hung, Se Fong; Leung, Patrick W. L. – Journal of Autism and Developmental Disorders, 2022
This study examines the psychometric properties of the Autism Diagnostic Interview-Revised (ADI-R) in the context of DSM-5 in a sample of Chinese children. Using re-mapped ADI-R items and algorithms matched to DSM-5 criteria, and administering to children with autism spectrum disorder (ASD) with and without intellectual disability,…
Descriptors: Autism, Pervasive Developmental Disorders, Diagnostic Tests, Observation
Dirlik, Ezgi Mor – International Journal of Progressive Education, 2020
Mokken models have recently started to become the preferred method of researchers from different fields in studies of nonparametric item response theory (NIRT). Despite increasing application of these models, some features of this type of modelling need further study and explanation. Invariant item ordering (IIO) is one of these areas, which the…
Descriptors: Item Response Theory, Test Items, Nonparametric Statistics, Scoring
Care, Esther; Vista, Alvin; Kim, Helyn – UNESCO Bangkok, 2019
UNESCO's Asia-Pacific Regional Bureau for Education has been working on education quality under the name of 'transversal competencies' (TVC) since 2013. Many of these competencies have been included in national education policy and curricula of countries in the region, but now the importance accorded them is increasingly gaining attention. As…
Descriptors: Foreign Countries, Educational Quality, 21st Century Skills, Competence
Zhou, Shao-Na; Liu, Qiao-Yi; Koenig, Kathleen; Xiao, Qiu-ye Li-Yang; Bao, Lei – Journal of Baltic Science Education, 2021
Lawson's Classroom Test of Scientific Reasoning (LCTSR) is a popular instrument that measures the development of students' scientific reasoning skills. The instrument has a two-tier question design, which has led to multiple ways of scoring and interpretation. In this research, a method of pattern analysis was proposed and applied to analyze…
Descriptors: Science Tests, Science Process Skills, Logical Thinking, Multiple Choice Tests