Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 6
Since 2016 (last 10 years): 14
Since 2006 (last 20 years): 39
Descriptor
Correlation: 49
Evaluation Methods: 49
Test Items: 49
Scores: 16
Item Response Theory: 13
Difficulty Level: 9
Factor Analysis: 9
Item Analysis: 9
Test Construction: 9
Comparative Analysis: 8
Foreign Countries: 8
Location
Canada: 2
United Kingdom: 2
United States: 2
Alabama: 1
Arizona: 1
Australia: 1
California: 1
China: 1
Germany: 1
Hong Kong: 1
India: 1
Assessments and Surveys
Graduate Record Examinations: 2
Armed Services Vocational Aptitude Battery: 1
Center for Epidemiologic Studies Depression Scale: 1
Graduate Management Admission Test: 1
National Assessment of Educational Progress: 1
Test of English as a Foreign Language: 1
Guo, Wenjing; Choi, Youn-Jeng – Educational and Psychological Measurement, 2023
Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been…
Descriptors: Item Response Theory, Evaluation Methods, Factor Analysis, Guidelines
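The abstract stops short of the procedure itself; for orientation, traditional (Horn's) parallel analysis retains as many dimensions as there are leading observed eigenvalues exceeding those of comparable random data. A minimal sketch in Python, assuming numpy and Pearson correlations (the IRT-specific variants studied in the paper would swap in other ingredients):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Horn's parallel analysis: keep leading components whose observed
    eigenvalues exceed the chosen percentile of eigenvalues obtained
    from random data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(sims, percentile, axis=0)
    n_dims = 0
    for o, t in zip(obs, threshold):  # stop at the first failure
        if o <= t:
            break
        n_dims += 1
    return n_dims

# Toy check: two correlated item blocks should yield two dimensions.
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 2))
data = np.hstack([f[:, [0]] + 0.5 * rng.standard_normal((500, 4)),
                  f[:, [1]] + 0.5 * rng.standard_normal((500, 4))])
print(parallel_analysis(data))  # expected: 2
```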
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ_2, λ_4, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
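For context, the two simplest coefficients in that list have closed forms over the item covariance matrix; a sketch assuming numpy (the ω and GLB variants need factor-analytic or optimization machinery and are omitted):

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha from an (n_persons, n_items) score matrix."""
    k = x.shape[1]
    cov = np.cov(x, rowvar=False)
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def guttman_lambda2(x):
    """Guttman's lambda_2, a lower bound at least as large as alpha."""
    k = x.shape[1]
    cov = np.cov(x, rowvar=False)
    total = cov.sum()
    off_sq = (cov ** 2).sum() - (np.diag(cov) ** 2).sum()
    return 1 - np.trace(cov) / total + np.sqrt(k / (k - 1) * off_sq) / total

# Six congeneric items: one common factor plus unit noise.
rng = np.random.default_rng(0)
x = rng.standard_normal((300, 1)) + rng.standard_normal((300, 6))
print(round(cronbach_alpha(x), 3), round(guttman_lambda2(x), 3))
```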
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
A new index of item discrimination power (IDP), dimension-corrected Somers' D (D2), is proposed. Somers' D is one of the superior alternatives to the item-total (Rit) and item-rest (Rir) correlations in reflecting the real IDP of items with scales 0/1 and 0/1/2, that is, up to three categories. D also reaches the extreme values +1 and -1 correctly…
Descriptors: Item Analysis, Correlation, Test Items, Simulation
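Plain Somers' D, the starting point for the proposed D2, is available off the shelf; a small illustration assuming scipy (scipy.stats.somersd treats its first argument as the independent variable; the dimension correction proposed in the paper is not reproduced here):

```python
import numpy as np
from scipy.stats import somersd

rng = np.random.default_rng(0)
theta = rng.standard_normal(400)
item = (theta + rng.standard_normal(400) > 0).astype(int)  # 0/1 item
total = theta + 0.5 * rng.standard_normal(400)             # stand-in total score

# Somers' D of the item given the total score, alongside the usual
# item-total correlation (Rit) it is meant to improve upon.
print(round(somersd(total, item).statistic, 3),
      round(np.corrcoef(item, total)[0, 1], 3))
```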
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and rank them according to which they believe display the best quality. When scripts from different examination sessions are included, the results of these exercises can be used to help maintain standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
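The abstract does not name the scaling model, but CJ judgements are commonly fitted with a Bradley-Terry model, so the sketch below makes that assumption (numpy only, maximum likelihood by plain gradient ascent):

```python
import numpy as np

def fit_bradley_terry(judgements, n_scripts, n_iter=500, lr=0.1):
    """Bradley-Terry quality estimates from a list of
    (winner, loser) script-index pairs."""
    beta = np.zeros(n_scripts)
    for _ in range(n_iter):
        grad = np.zeros(n_scripts)
        for w, l in judgements:
            p_w = 1.0 / (1.0 + np.exp(beta[l] - beta[w]))  # P(w beats l)
            grad[w] += 1.0 - p_w
            grad[l] -= 1.0 - p_w
        beta += lr * grad
        beta -= beta.mean()  # fix the scale origin for identifiability
    return beta

# Script 0 is usually preferred to 1, and 1 to 2.
judgements = [(0, 1), (0, 1), (1, 2), (1, 2), (0, 2), (2, 1)]
print(fit_bradley_terry(judgements, 3).round(2))
```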
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Smith, Trevor I.; Bendjilali, Nasrine – Physical Review Physics Education Research, 2022
Several recent studies have employed item response theory (IRT) to rank incorrect responses to commonly used research-based multiple-choice assessments. These studies use Bock's nominal response model (NRM) for applying IRT to categorical (nondichotomous) data, but the response rankings only utilize half of the parameters estimated by the model.…
Descriptors: Item Response Theory, Test Items, Multiple Choice Tests, Science Tests
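Bock's NRM gives every response category its own slope and intercept, with category probabilities formed by a softmax; a minimal sketch with made-up item parameters (identification constraints such as zero-sum slopes are glossed over):

```python
import numpy as np

def nrm_probs(theta, slopes, intercepts):
    """Bock's nominal response model: P(category k | theta) is a softmax
    over a_k * theta + c_k across an item's response categories."""
    z = np.outer(theta, slopes) + intercepts
    ez = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return ez / ez.sum(axis=1, keepdims=True)

# Hypothetical 4-option item; option 2 (largest slope) plays the key.
slopes = np.array([-0.8, -0.2, 1.3, -0.3])
intercepts = np.array([0.2, 0.5, 0.0, -0.7])
print(nrm_probs(np.array([-2.0, 0.0, 2.0]), slopes, intercepts).round(3))
```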
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
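The synthesis step, pooling a per-study DIF effect size across studies, can be illustrated with a generic inverse-variance scheme (an assumption for illustration; the paper's actual estimator and its MGCFA stage are not reproduced here):

```python
import numpy as np

def pool_fixed_effect(effects, ses):
    """Inverse-variance (fixed-effect) pooling of per-study effect
    sizes, given their standard errors."""
    w = 1.0 / np.asarray(ses) ** 2
    est = np.sum(w * np.asarray(effects)) / np.sum(w)
    return est, np.sqrt(1.0 / np.sum(w))  # pooled estimate, pooled SE

# Hypothetical DIF effects from three studies:
print(pool_fixed_effect([0.12, 0.05, 0.20], [0.04, 0.06, 0.05]))
```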
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnostic purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
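One of the factors studied, the number of items per subscale, acts on subscore reliability in the way the Spearman-Brown formula makes concrete; a small sketch with a hypothetical single-item reliability (whether a subscore is then worth reporting is typically judged with Haberman's PRMSE criterion, which the value-added framing suggests but the abstract does not spell out):

```python
def spearman_brown(rho_single, k):
    """Reliability of a subscale of k items, given the reliability
    of a single item (all items assumed parallel)."""
    return k * rho_single / (1 + (k - 1) * rho_single)

# Hypothetical single-item reliability of 0.30:
for k in (4, 8, 16, 32):
    print(k, round(spearman_brown(0.30, k), 3))
```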
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
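Of the methods listed, the point-biserial correlation is the most compact to illustrate: each answer option is coded 1 if chosen and 0 otherwise, then correlated with the total score. A sketch with made-up data; well-behaved distractors should come out negative, the key positive:

```python
import numpy as np

def option_point_biserials(choices, totals, options=("A", "B", "C", "D")):
    """Correlate each option's 0/1 choice indicator with the total score."""
    choices = np.asarray(choices)
    return {opt: np.corrcoef(choices == opt, totals)[0, 1] for opt in options}

# Made-up data; "C" is the keyed answer, chosen mostly by high scorers.
choices = ["A", "C", "C", "B", "C", "D", "C", "A", "C", "B"]
totals = [12, 30, 27, 15, 25, 10, 29, 14, 26, 18]
print({k: round(v, 2)
       for k, v in option_point_biserials(choices, totals).items()})
```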
Yasar, Metin – European Journal of Educational Sciences, 2017
In this study, a 19-item multiple-choice test, prepared for the Measurement and Evaluation in Education course, was administered as an interim exam to 207 teacher candidates studying at the Faculty of Education. The difficulty levels of the items in the test were calculated…
Descriptors: Test Items, Difficulty Level, Preservice Teachers, Teacher Education
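In classical item analysis, the difficulty level computed in studies like this one is simply the proportion of examinees answering an item correctly; a minimal sketch with simulated 0/1 responses, including the common upper-lower discrimination index for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated 0/1 responses: 207 examinees by 19 items of varying easiness.
responses = (rng.random((207, 19)) < np.linspace(0.3, 0.9, 19)).astype(int)

difficulty = responses.mean(axis=0)  # p-value: proportion answering correctly
totals = responses.sum(axis=1)
upper = responses[totals >= np.quantile(totals, 0.73)].mean(axis=0)
lower = responses[totals <= np.quantile(totals, 0.27)].mean(axis=0)
print(difficulty.round(2))
print((upper - lower).round(2))      # upper-lower 27% discrimination index
```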
Cox, Troy L.; Bown, Jennifer; Burdis, Jacob – Foreign Language Annals, 2015
This study investigates the effect of proficiency- vs. performance-based elicited imitation (EI) assessment. EI requires test-takers to repeat sentences in the target language. The accuracy with which test-takers repeat sentences correlates highly with their language proficiency. However, in EI, the factors that render an item…
Descriptors: Language Proficiency, Imitation, Sentences, Correlation
Avsec, Stanislav; Jamšek, Janez – International Journal of Technology and Design Education, 2016
Technological literacy is identified as a vital outcome of technology- and engineering-intensive education. It guides the design of the technology and engineering components of educational systems and underpins competitive employment in a technological society. Existing methods for measuring technological literacy are incomplete or complicated,…
Descriptors: Technological Literacy, Elementary School Students, Secondary School Students, Evaluation Methods
Gaillard, Stéphanie; Tremblay, Annie – Language Learning, 2016
This study investigated the elicited imitation task (EIT) as a tool for measuring linguistic proficiency in a second/foreign (L2) language, focusing on French. Nonnative French speakers (n = 94) and native French speakers (n = 6) completed an EIT that included 50 sentences varying in length and complexity. Three raters evaluated productions on…
Descriptors: Language Proficiency, Cloze Procedure, Questionnaires, Language Tests
Thummaphan, Phonraphee – ProQuest LLC, 2017
The present study aimed to represent innovative assessments that support students' learning in STEM education using an integrative framework for Cognitive Diagnostic Modeling (CDM). This framework is based on three components: cognition, observation, and interpretation (National Research Council, 2001). Specifically, this dissertation…
Descriptors: STEM Education, Cognitive Processes, Observation, Psychometrics
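The abstract does not name the specific diagnostic model, so as an assumption the sketch below uses the DINA model, the most common entry point to CDM: an examinee answers correctly with high probability only when mastering every attribute the item's Q-matrix row requires.

```python
import numpy as np

def dina_correct_prob(profiles, q_row, guess, slip):
    """DINA: probability of a correct answer is 1 - slip for examinees
    mastering every attribute the item requires, and guess otherwise."""
    eta = np.all(profiles >= q_row, axis=-1)  # all required attributes?
    return np.where(eta, 1.0 - slip, guess)

q_row = np.array([1, 0, 1])        # item requires attributes 1 and 3
profiles = np.array([[1, 0, 1],    # masters both        -> 1 - slip
                     [1, 1, 0],    # lacks attribute 3   -> guess
                     [0, 0, 0]])   # masters neither     -> guess
print(dina_correct_prob(profiles, q_row, guess=0.2, slip=0.1))
```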
Andjelic, Svetlana; Cekerevac, Zoran – Education and Information Technologies, 2014
This article presents an original model of computer adaptive testing and grade formation, based on scientifically recognized theories. At the core of the model is a personalized algorithm that selects each question depending on the accuracy of the answer to the previous question. The test is divided into three basic levels of difficulty, and the…
Descriptors: Computer Assisted Testing, Educational Technology, Grades (Scholastic), Test Construction
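The selection rule described (three difficulty levels, movement driven by the correctness of the previous answer) reduces to a few lines; a minimal sketch in which the pool, names, and step logic are all hypothetical:

```python
import random

# Hypothetical item pool bucketed into the three difficulty levels.
POOL = {1: ["e1", "e2", "e3"], 2: ["m1", "m2", "m3"], 3: ["h1", "h2", "h3"]}

def next_level(level, answered_correctly):
    """Step up after a correct answer, down after an incorrect one,
    clamped to the three-level range."""
    return min(3, max(1, level + (1 if answered_correctly else -1)))

def run_test(answer_pattern, start_level=2):
    level, administered = start_level, []
    for correct in answer_pattern:
        administered.append((level, random.choice(POOL[level])))
        level = next_level(level, correct)
    return administered

print(run_test([True, True, False, True]))
```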