Amanda A. Wolkowitz; Russell Smith – Practical Assessment, Research & Evaluation, 2024
A decision consistency (DC) index is an estimate of the consistency of a classification decision on an exam. More specifically, DC estimates the percentage of examinees that would have the same classification decision on an exam if they were to retake the same or a parallel form of the exam again without memory of taking the exam the first time.…
Descriptors: Testing, Test Reliability, Replication (Evaluation), Decision Making
Eren Can Aybek; Serkan Arikan; Günes Ertas – International Journal of Assessment Tools in Education, 2024
When it is required to estimate item parameters of a large item bank, Multiple Matrix Sampling (MMS) design provides an efficient way while minimizing the test burden on students. The current study exemplifies how to calibrate a large item pool using MMS design for various purposes, such as developing a CAT administration. The purpose of the…
Descriptors: Elementary School Mathematics, Elementary School Students, Grade 4, Item Banks
Vinay Kumar Yadav; Shakti Prasad – Measurement: Interdisciplinary Research and Perspectives, 2024
In sample survey analysis, accurate population mean estimation is an important task, but traditional approaches frequently ignore the intricacies of real-world data, leading to biased results. In order to handle uncertainties, indeterminacies, and ambiguity, this work presents an innovative approach based on neutrosophic statistics. We proposed…
Descriptors: Sampling, Statistical Bias, Predictor Variables, Predictive Measurement
Tom Benton – Research Matters, 2024
Educational assessment is used throughout the world for a range of different formative and summative purposes. Wherever an assessment is developed, whether by a teacher creating a quiz for their class, or by a testing company creating a high stakes assessment, it is necessary to decide how long the test should be. Specifically, how many questions…
Descriptors: Foreign Countries, High Stakes Tests, Test Length, Test Construction
Tülin Otbiçer Acar – Measurement: Interdisciplinary Research and Perspectives, 2024
The aim of this study is to compare the results of correlation coefficient estimation of reliability with those obtained through the Bland-Altman plot technique. The scale was first divided into two halves using three different approaches. A linear and high-level relationship was found between the scale scores obtained from the halved forms.…
Descriptors: High School Students, Measurement Techniques, Psychometrics, Comparative Testing
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Catherine Mata; Katharine Meyer; Lindsay Page – Annenberg Institute for School Reform at Brown University, 2024
This article examines the risk of crossover contamination in individual-level randomization, a common concern in experimental research, in the context of a large-enrollment college course. While individual-level randomization is more efficient for assessing program effectiveness, it also increases the potential for control group students to cross…
Descriptors: Chemistry, Science Instruction, Undergraduate Students, Large Group Instruction
Casey J. Metoyer; Katherine Sullivan; Lee J. Winchester; Mark T. Richardson; Michael R. Esco; Michael V. Fedewa – Measurement in Physical Education and Exercise Science, 2025
Relative adiposity (%Fat) was measured using a smartphone-based application in a convenience sample of adults aged 20-52 years (n = 32, 68.7% female, 84.3% White/Caucasian, 26.7 ± 3.5 kg/m²) across different body positions (Anterior versus Posterior) on consecutive days (Day 1 versus Day 2). A reference photo was obtained from the posterior view…
Descriptors: Adults, Body Composition, Handheld Devices, Computer Assisted Instruction
Elizabeth B. Vaughan; Saraswathi Tummuru; Jack Barbera – Chemistry Education Research and Practice, 2025
Students' expectations for their laboratory coursework are theorized to have an impact on their learning experiences and behaviors, such as engagement. Before students' expectations and engagement can be explored in different types of undergraduate chemistry laboratory courses, appropriate measures of these constructs must be identified, and…
Descriptors: Undergraduate Students, Organic Chemistry, Chemistry, Science Instruction
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability
Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024
Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores was criticized to yield biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…
Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
Bijsterbosch, Erik – Geographical Education, 2018
Geography teachers' school-based (internal) examinations in pre-vocational geography education in the Netherlands appear to be in line with the findings in the literature, namely that teachers' assessment practices tend to focus on the recall of knowledge. These practices are strongly influenced by national (external) examinations. This paper…
Descriptors: Foreign Countries, Instructional Effectiveness, National Competency Tests, Geography Instruction
Ward, Samantha L.; Sullivan, Karen A.; Gilmore, Linda – Educational and Developmental Psychologist, 2016
Objective: Limited time and resources necessitate the availability of accurate, inexpensive and rapid diagnostic aids for Autism Spectrum Disorder (ASD). The Autistic Behavioural Indicators Instrument (ABII) was developed for this purpose, but its psychometric properties have not yet been fully established. Method: The clinician-rated ABII, the…
Descriptors: Autism, Pervasive Developmental Disorders, Psychometrics, Diagnostic Tests
Murray, Keith B.; Zdravkovic, Srdan – Journal of Education for Business, 2016
Considerable debate continues regarding the efficacy of the website RateMyProfessors.com (RMP). To date, however, virtually no direct, experimental research has been reported which directly bears on questions relating to sampling adequacy or item adequacy in producing what favorable correlations have been reported. The authors compare the data…
Descriptors: Computer Assisted Testing, Computer Software Evaluation, Student Evaluation of Teacher Performance, Item Analysis