Publication Date
In 2025: 0
Since 2024: 4
Since 2021 (last 5 years): 16
Since 2016 (last 10 years): 42
Since 2006 (last 20 years): 92
Descriptor
Models: 158
Test Construction: 158
Test Items: 158
Test Validity: 46
Item Response Theory: 44
Difficulty Level: 34
Foreign Countries: 30
Item Analysis: 28
Psychometrics: 28
Test Reliability: 25
Computer Assisted Testing: 23
Author
Gierl, Mark J.: 5
Bejar, Isaac I.: 4
Huff, Kristen: 4
Lai, Hollis: 4
Graf, Edith Aurora: 3
Hendrickson, Amy: 3
Berger, Martijn P. F.: 2
Champagne, Zachary M.: 2
Farina, Kristy: 2
Futagi, Yoko: 2
Hambleton, Ronald K.: 2
Location
Canada: 5
Germany: 4
Indonesia: 3
United Kingdom: 3
Australia: 2
Georgia: 2
United Kingdom (England): 2
Belgium: 1
California: 1
China: 1
Colombia: 1
Becker, Benjamin; Weirich, Sebastian; Goldhammer, Frank; Debeer, Dries – Journal of Educational Measurement, 2023
When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe…
Descriptors: Test Construction, Automation, Models, Test Items
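As context for the response-time model named above, here is a minimal sketch of van der Linden's two-parameter lognormal model, in which a person's speed parameter tau and an item's time-discrimination alpha and time-intensity beta determine the distribution of log response time. The item parameters, the 1800-second limit, and the helper names are hypothetical, and the mixed integer programming step of the actual ATA approach is not shown.

```python
import numpy as np

def expected_time(alpha, beta, tau):
    """Expected response time under the two-parameter lognormal model:
    ln T ~ Normal(beta - tau, 1 / alpha**2), so E[T] = exp(beta - tau + 1 / (2 * alpha**2))."""
    return np.exp(beta - tau + 1.0 / (2.0 * alpha ** 2))

def expected_test_time(alphas, betas, tau):
    """Total expected time on a candidate test form for one test taker."""
    return expected_time(np.asarray(alphas), np.asarray(betas), tau).sum()

# Crude speededness check for a three-item form and a relatively slow
# test taker (tau = -0.2) against a hypothetical 1800-second time limit.
alphas = [1.5, 2.0, 1.2]   # time-discrimination parameters (hypothetical)
betas = [4.0, 3.5, 4.2]    # time-intensity parameters in log seconds (hypothetical)
print(expected_test_time(alphas, betas, tau=-0.2) <= 1800)
```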
A Method for Generating Course Test Questions Based on Natural Language Processing and Deep Learning
Hei-Chia Wang; Yu-Hung Chiang; I-Fan Chen – Education and Information Technologies, 2024
Assessment is viewed as an important means to understand learners' performance in the learning process. A good assessment method is based on high-quality examination questions. However, manually generating high-quality examination questions is time-consuming for teachers, and it is not easy for students to obtain question banks. To solve…
Descriptors: Natural Language Processing, Test Construction, Test Items, Models
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e., questions) which measure various types of skills equitably for students at different levels is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
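As a rough illustration of the difficulty-prediction task addressed above, the sketch below fits a linear predictor of item difficulty from two surface features of the item text. The items, difficulty labels, and feature choices are all hypothetical and far simpler than the linguistic features such work typically uses.

```python
import numpy as np

def text_features(item_text):
    """Two crude surface features of an item stem: word count and mean word length."""
    words = item_text.split()
    return [len(words), sum(len(w) for w in words) / len(words)]

# Hypothetical items and difficulty labels, used only to show the fitting step.
items = [
    "What is 2 + 2?",
    "Explain why the sample mean is an unbiased estimator of the population mean.",
    "Name the capital of France.",
]
difficulty = np.array([0.1, 0.8, 0.2])
X = np.column_stack([np.ones(len(items)), [text_features(t) for t in items]])
coef, *_ = np.linalg.lstsq(X, difficulty, rcond=None)
print(coef)   # intercept and feature weights
```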
Güler Yavuz Temel – Journal of Educational Measurement, 2024
The purpose of this study was to investigate multidimensional DIF with simple and nonsimple structures in the context of the multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald tests using MML-EM and MHRM estimation approaches with different test factors and test structures in…
Descriptors: Computation, Multidimensional Scaling, Item Response Theory, Models
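For reference, the graded response model referred to above gives category probabilities as differences of adjacent cumulative logistic curves; the sketch below writes this in a multidimensional form where the discrimination is a vector. The parameter values are purely illustrative.

```python
import numpy as np

def mgrm_category_probs(theta, a, b):
    """Category probabilities for one polytomous item under a multidimensional
    graded response model: P(X >= k) = logistic(a . theta - b_k) for ordered
    thresholds b_1 < ... < b_{K-1}; P(X = k) is the difference of adjacent
    cumulative probabilities."""
    z = np.dot(a, theta) - np.asarray(b, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-z))            # P(X >= 1), ..., P(X >= K-1)
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]                 # P(X = 0), ..., P(X = K-1)

# Illustrative two-dimensional item: a simple structure loads on one dimension,
# a nonsimple structure loads on both.
theta = np.array([0.5, -1.0])
print(mgrm_category_probs(theta, a=[1.2, 0.0], b=[-1.0, 0.0, 1.0]))
print(mgrm_category_probs(theta, a=[1.2, 0.8], b=[-1.0, 0.0, 1.0]))
```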
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown with advances in technology and its flexibility. Online examinations measure students' knowledge and skills. Traditional question papers suffer from inconsistent difficulty levels, arbitrary question allocation, and poor grading. The suggested model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
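The entry above describes calibrating question papers against student performance. As a point of reference only (not the authors' model), the sketch below computes the classical difficulty index for each item from a 0/1 score matrix and greedily picks a form near a target difficulty; all data and the target value are hypothetical.

```python
import numpy as np

def item_difficulty(responses):
    """Classical difficulty index: proportion of examinees answering each item
    correctly, from an examinee-by-item matrix of 0/1 scores."""
    return np.asarray(responses).mean(axis=0)

def pick_balanced_form(p_values, n_items, target=0.6):
    """Greedily select the n_items whose difficulty indices lie closest to the
    target: a crude stand-in for difficulty-balanced paper assembly."""
    order = np.argsort(np.abs(np.asarray(p_values) - target))
    return sorted(order[:n_items].tolist())

# Simulated 0/1 responses from 200 examinees on 10 items.
scores = np.random.default_rng(0).integers(0, 2, size=(200, 10))
print(pick_balanced_form(item_difficulty(scores), n_items=4))
```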
Olney, Andrew M. – Grantee Submission, 2022
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple choice question generation and evaluate its performance on three angles that…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Models
Luan, Lin; Liang, Jyh-Chong; Chai, Ching Sing; Lin, Tzu-Bin; Dong, Yan – Interactive Learning Environments, 2023
The emergence of new media technologies has empowered individuals to not merely consume but also create, share and critique media contents. Such activities are dependent on new media literacy (NML) necessary for living and working in the participatory culture of the twenty-first century. Although a burgeoning body of research has focused on the…
Descriptors: Foreign Countries, Media Literacy, Test Construction, English (Second Language)
Andrew M. Olney – Grantee Submission, 2023
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully-controlled…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Algorithms
Abdullah Abdul Wahab Alsayar – ProQuest LLC, 2021
Testlets offer several advantages in the development and administration of tests, such as 1) the construction of meaningful test items, 2) the avoidance of exposure to non-relevant context, 3) improved testing efficiency, and 4) the progression of testlet items that require higher-order thinking skills. Thus, the inclusion of testlets in educational…
Descriptors: Test Construction, Testing, Test Items, Efficiency
Alpizar, David; Li, Tongyun; Norris, John M.; Gu, Lixiong – Language Testing, 2023
The C-test is a type of gap-filling test designed to efficiently measure second language proficiency. The typical C-test consists of several short paragraphs with the second half of every second word deleted. The words with deleted parts are treated as items nested within the corresponding paragraph. Given this testlet structure, it is commonly…
Descriptors: Psychometrics, Language Tests, Second Language Learning, Test Items
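The local dependence among gaps in the same C-test paragraph is exactly what testlet models are built to absorb. Below is a minimal sketch of one common formulation in which a person-by-paragraph effect gamma enters the logit alongside ability; all parameter values are hypothetical.

```python
import numpy as np

def testlet_prob(theta, a, b, gamma):
    """Probability of filling a gap correctly under a simple testlet model:
    logit P = a * (theta - b - gamma), where gamma is a person-specific effect
    shared by all gaps in the same paragraph (the source of local dependence)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b - gamma)))

# Two gaps nested in the same paragraph for one examinee (theta = 0.3); the
# shared gamma induces the within-paragraph dependence a plain 2PL ignores.
gamma_paragraph = 0.4
for a, b in [(1.0, -0.5), (1.3, 0.2)]:
    print(testlet_prob(0.3, a, b, gamma_paragraph))
```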
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
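To make the knowledge-versus-guessing decomposition mentioned above concrete, here is a minimal sketch that treats a correct answer as either known or guessed uniformly among the options; the blunder component the authors also model is omitted, and the parameter values are illustrative.

```python
def p_correct(knowledge, n_options):
    """Probability of a correct response when an examinee either knows the
    answer or guesses uniformly among the options: P = k + (1 - k) / m."""
    return knowledge + (1.0 - knowledge) / n_options

def knowledge_from_p(p, n_options):
    """Invert the same relation to recover the knowledge component from an
    observed proportion correct (the classical correction for guessing)."""
    chance = 1.0 / n_options
    return (p - chance) / (1.0 - chance)

print(p_correct(0.6, 4))          # 0.7
print(knowledge_from_p(0.7, 4))   # 0.6 again
```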
Ketabi, Somaye; Alavi, Seyyed Mohammed; Ravand, Hamdollah – International Journal of Language Testing, 2021
Although Diagnostic Classification Models (DCMs) were introduced to education systems decades ago, these models have not been employed for the aims for which they were originally designed. DCMs have mostly been used to analyze large-scale non-diagnostic tests and have rarely been used in developing Cognitive…
Descriptors: Diagnostic Tests, Test Construction, Goodness of Fit, Classification
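For readers unfamiliar with DCMs, the sketch below shows the DINA model, one common diagnostic classification model, in which the Q-matrix determines whether an attribute profile should produce a correct response and guessing/slipping parameters perturb that ideal. The Q-matrix, profile, and parameter values are hypothetical.

```python
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """Correct-response probability under the DINA model: the ideal response is 1
    only when the examinee's attribute profile alpha covers every attribute the
    Q-matrix row requires; guessing and slipping perturb that ideal."""
    eta = np.all(np.asarray(alpha) >= np.asarray(q), axis=-1)
    return np.where(eta, 1.0 - slip, guess)

q_matrix = np.array([[1, 0, 1],   # item 1 requires attributes 1 and 3
                     [0, 1, 0]])  # item 2 requires attribute 2
profile = np.array([1, 0, 1])     # examinee masters attributes 1 and 3
print(dina_prob(profile, q_matrix, guess=0.2, slip=0.1))   # [0.9 0.2]
```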
Mateja Ploj Virtic; Andre Du Plessis; Andrej Šorgo – Center for Educational Policy Studies Journal, 2023
In the context of improving the quality of teacher education, the focus of the present work was to adapt the Mentoring for Effective Primary Science Teaching instrument so that it is more universal and can be used beyond the elementary science mentoring context. The adapted instrument was renamed the Mentoring for Effective Teaching…
Descriptors: Test Construction, Test Validity, Test Reliability, Measures (Individuals)
Rao, Dhawaleswar; Saha, Sujan Kumar – IEEE Transactions on Learning Technologies, 2020
Automatic multiple choice question (MCQ) generation from text is a popular research area. MCQs are widely accepted for large-scale assessment in various domains and applications. However, manual generation of MCQs is expensive and time-consuming. Therefore, researchers have been drawn to automatic MCQ generation since the late 1990s.…
Descriptors: Multiple Choice Tests, Test Construction, Automation, Computer Software
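As a toy illustration of automatic MCQ generation (far simpler than the NLP pipelines the review above surveys), the sketch below blanks a target word out of a sentence and samples distractors from a caller-supplied pool; real systems rank distractors by semantic similarity, and every name and example here is hypothetical.

```python
import random

def cloze_mcq(sentence, answer, distractor_pool, n_distractors=3, seed=0):
    """Toy cloze-style MCQ generator: blank the answer out of the stem and mix it
    with distractors drawn from a supplied pool (standing in for the distractor
    selection step of a real system)."""
    stem = sentence.replace(answer, "_____", 1)
    rng = random.Random(seed)
    options = rng.sample([d for d in distractor_pool if d != answer], n_distractors)
    options.append(answer)
    rng.shuffle(options)
    return stem, options, answer

stem, options, key = cloze_mcq(
    "The mitochondrion is the powerhouse of the cell.",
    "mitochondrion",
    ["ribosome", "nucleus", "chloroplast", "vacuole"],
)
print(stem)
print(options, "key:", key)
```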
Liu, Ren – Educational and Psychological Measurement, 2018
Attribute structure is an explicit way of presenting the relationship between attributes in diagnostic measurement. The specification of attribute structures directly affects the classification accuracy resulting from psychometric modeling. This study provides a conceptual framework for understanding misspecifications of attribute structures. Under…
Descriptors: Diagnostic Tests, Classification, Test Construction, Relationship
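To illustrate why misspecifying an attribute structure matters, the sketch below enumerates the attribute profiles a hierarchical structure permits: under a linear hierarchy over three attributes, only 4 of the 8 unconstrained profiles remain, so classifying examinees under the wrong structure forces some of them into impermissible or missing profiles. The hierarchy and helper name are hypothetical.

```python
from itertools import product

def permissible_profiles(n_attrs, prerequisites):
    """Enumerate attribute mastery profiles allowed by a hierarchical attribute
    structure; `prerequisites` maps an attribute to the attribute that must be
    mastered before it."""
    allowed = []
    for p in product([0, 1], repeat=n_attrs):
        if all(p[child] <= p[parent] for child, parent in prerequisites.items()):
            allowed.append(p)
    return allowed

# Linear hierarchy: attribute 1 -> 2 -> 3 (0-based indices 0 -> 1 -> 2).
print(permissible_profiles(3, {1: 0, 2: 1}))
# [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
```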