Publication Date
| In 2026 | 0 |
| Since 2025 | 200 |
| Since 2022 (last 5 years) | 1070 |
| Since 2017 (last 10 years) | 2580 |
| Since 2007 (last 20 years) | 4941 |
Audience
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Turkey | 225 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 65 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
A Hybrid Approach for Automatic Generation of Named Entity Distractors for Multiple Choice Questions
Patra, Rakesh; Saha, Sujan Kumar – Education and Information Technologies, 2019
Assessment plays an important role in learning and Multiple Choice Questions (MCQs) are quite popular in large-scale evaluations. Technology-enabled learning necessitates a smart assessment. Therefore, automatic MCQ generation became increasingly popular in the last two decades. Despite a large amount of research effort, system generated MCQs are…
Descriptors: Multiple Choice Tests, High Stakes Tests, Semantics, Evaluation Methods
Wu, Qian; De Laet, Tinne; Janssen, Rianne – Journal of Educational Measurement, 2019
Single-best answers to multiple-choice items are commonly dichotomized into correct and incorrect responses, and modeled using either a dichotomous item response theory (IRT) model or a polytomous one if differences among all response options are to be retained. The current study presents an alternative IRT-based modeling approach to…
Descriptors: Multiple Choice Tests, Item Response Theory, Test Items, Responses
Winchip, Emily; Stevenson, Howard; Milner, Alison – Educational Review, 2019
As the Global Education Reform Movement (GERM) spreads, key questions that attempt to identify both the nature and the increasing scope and scale of this phenomenon become empirically significant. The concern of this article is to highlight some of the complexities of measuring one key element of the GERM: the privatisation of public education…
Descriptors: Privatization, Foreign Countries, Item Response Theory, Probability
Care, Esther; Vista, Alvin; Kim, Helyn – UNESCO Bangkok, 2019
UNESCO's Asia-Pacific Regional Bureau for Education has been working on education quality under the name of 'transversal competencies' (TVC) since 2013. Many of these competencies have been included in national education policy and curricula of countries in the region, but now the importance accorded them is increasingly gaining attention. As…
Descriptors: Foreign Countries, Educational Quality, 21st Century Skills, Competence
Nazaretsky, Tanya; Hershkovitz, Sara; Alexandron, Giora – International Educational Data Mining Society, 2019
Sequencing items in adaptive learning systems typically relies on a large pool of interactive question items that are analyzed into a hierarchy of skills, also known as Knowledge Components (KCs). Educational data mining techniques can be used to analyze students' response data in order to optimize the mapping of items to KCs, with similarity-based…
Descriptors: Intelligent Tutoring Systems, Item Response Theory, Measurement, Testing
Bramley, Tom; Crisp, Victoria; Shaw, Stuart – Research Matters, 2019
In the traditional approach to constructing a GCSE or A Level examination paper, a single person writes the whole paper. In some other contexts, tests are constructed by selecting questions from a bank of questions. In this research, we asked experts to evaluate the quality of Physics exam papers constructed in the traditional way, constructed by…
Descriptors: Physics, Science Tests, Science Instruction, Test Construction
Sen, Sedat; Terzi, Ragip; Yildirim, Ibrahim; Cohen, Allan S. – Turkish Journal of Education, 2018
The purpose of this study was to examine the effect of equated and non-equated data on value-added assessment analyses. Several models have been proposed in the literature to apply the value-added assessment approach. This study compared two different value-added models: the unadjusted hierarchical linear model and the generalized persistence…
Descriptors: Equated Scores, Value Added Models, Hierarchical Linear Modeling, Persistence
Atalmis, Erkan Hasan; Kingston, Neal Martin – SAGE Open, 2018
This study explored the impact of homogeneity of answer choices on item difficulty and discrimination. Twenty-two matched pairs of elementary and secondary mathematics items were administered to randomly equivalent samples of students. Each item pair comparison was treated as a separate study with the set of effect sizes analyzed using…
Descriptors: Test Items, Difficulty Level, Multiple Choice Tests, Mathematics Tests
Marcoulides, Katerina M. – Measurement: Interdisciplinary Research and Perspectives, 2018
This study examined the use of Bayesian analysis methods for the estimation of item parameters in a two-parameter logistic item response theory model. Using simulated data under various design conditions with both informative and non-informative priors, the parameter recovery of Bayesian analysis methods was examined. Overall results showed that…
Descriptors: Bayesian Statistics, Item Response Theory, Probability, Difficulty Level
Sunbul, Onder; Yormaz, Seha – International Journal of Evaluation and Research in Education, 2018
In this study, Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths with a 10,000-examinee sample size under several test-level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
Descriptors: Difficulty Level, Cheating, Duplication, Test Length
Sünbül, Seçil Ömür – International Journal of Evaluation and Research in Education, 2018
This study aimed to investigate the impact of different missing data handling methods on DINA model parameter estimation and classification accuracy. Simulated data were used, generated by manipulating the number of items and sample size. In the generated data, two different missing data mechanisms…
Descriptors: Data, Test Items, Sample Size, Statistical Analysis
Sunbul, Onder; Yormaz, Seha – Eurasian Journal of Educational Research, 2018
Purpose: Several studies can be found in the literature that investigate the performance of ω under various conditions. However, no study of the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω for the conditions given below.…
Descriptors: Test Items, Difficulty Level, Ability, Cheating
Choe, Edison M.; Kern, Justin L.; Chang, Hua-Hua – Journal of Educational and Behavioral Statistics, 2018
Despite common operationalization, measurement efficiency of computerized adaptive testing should not only be assessed in terms of the number of items administered but also the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response…
Descriptors: Computer Assisted Testing, Reaction Time, Item Response Theory, Test Items
Kim, Sooyeon; Lu, Ru – ETS Research Report Series, 2018
The purpose of this study was to evaluate the effectiveness of linking test scores by using test takers' background data to form pseudo-equivalent groups (PEG) of test takers. Using 4 operational test forms that each included 100 items and were taken by more than 30,000 test takers, we created 2 half-length research forms that had either 20…
Descriptors: Test Items, Item Banks, Difficulty Level, Comparative Analysis

