Publication Date
In 2025 | 0 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 22 |
Descriptor
Comparative Testing | 66 |
Item Response Theory | 66 |
Test Items | 26 |
Computer Assisted Testing | 17 |
Higher Education | 14 |
Mathematical Models | 13 |
Test Construction | 13 |
Test Format | 13 |
Item Bias | 12 |
Adaptive Testing | 11 |
Test Validity | 11 |
More ▼ |
Source
Author
Lunz, Mary E. | 3 |
Sykes, Robert C. | 3 |
Wise, Steven L. | 3 |
Clauser, Brian E. | 2 |
De Ayala, R. J. | 2 |
Ellis, Barbara B. | 2 |
Lissitz, Robert W. | 2 |
Yamamoto, Kentaro | 2 |
Agus Santoso | 1 |
Almond, Russell G. | 1 |
Ang, Cheng | 1 |
More ▼ |
Publication Type
Education Level
Higher Education | 7 |
Elementary Education | 3 |
Middle Schools | 3 |
Elementary Secondary Education | 2 |
Grade 4 | 2 |
Grade 7 | 2 |
High Schools | 2 |
Postsecondary Education | 2 |
Early Childhood Education | 1 |
Grade 10 | 1 |
Grade 3 | 1 |
More ▼ |
Audience
Location
United States | 4 |
Australia | 2 |
Germany | 2 |
California | 1 |
Canada | 1 |
China | 1 |
France | 1 |
Indonesia | 1 |
New Zealand | 1 |
Taiwan (Taipei) | 1 |
United Kingdom | 1 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Peter F. Halpin – Society for Research on Educational Effectiveness, 2024
Background: Meta-analyses of educational interventions have consistently documented the importance of methodological factors related to the choice of outcome measures. In particular, when interventions are evaluated using measures developed by researchers involved with the intervention or its evaluation, the effect sizes tend to be larger than…
Descriptors: College Students, College Faculty, STEM Education, Item Response Theory
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability
Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development, where it guides the test item writer in creating test items according to the desired objectives and specifications or characteristics (so-called a priori item characteristics), such as the level of item difficulty in the category and the distribution of items based on their difficulty level.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
Megan Kuhfeld; James Soland – Annenberg Institute for School Reform at Brown University, 2020
A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a…
Descriptors: Elementary School Students, Middle School Students, Social Emotional Learning, Measurement Techniques
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
Wang, Jianjun – School Science and Mathematics, 2011
As the largest international study ever taken in history, the Trend in Mathematics and Science Study (TIMSS) has been held as a benchmark to measure U.S. student performance in the global context. In-depth analyses of the TIMSS project are conducted in this study to examine key issues of the comparative investigation: (1) item flaws in mathematics…
Descriptors: Test Items, Figurative Language, Item Response Theory, Benchmarking
Schulz, Wolfram; Fraillon, Julian – Educational Research and Evaluation, 2011
When comparing data derived from tests or questionnaires in cross-national studies, researchers commonly assume measurement invariance in their underlying scaling models. However, different cultural contexts, languages, and curricula can have powerful effects on how students respond in different countries. This article illustrates how the…
Descriptors: Citizenship Education, International Studies, Item Response Theory, International Education
Wendt, Heike; Bos, Wilfried; Goy, Martin – Educational Research and Evaluation, 2011
Several current international comparative large-scale assessments of educational achievement (ICLSA) make use of "Rasch models", to address functions essential for valid cross-cultural comparisons. From a historical perspective, ICLSA and Georg Rasch's "models for measurement" emerged at about the same time, half a century ago. However, the…
Descriptors: Measures (Individuals), Test Theory, Group Testing, Educational Testing
Monahan, Patrick O.; Lee, Won-Chan; Ankenmann, Robert D. – Journal of Educational Measurement, 2007
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta…
Descriptors: True Scores, Psychometrics, Monte Carlo Methods, Correlation
Liow, Jong-Leng – European Journal of Engineering Education, 2008
Peer assessment has been studied in various situations and actively pursued as a means by which students are given more control over their learning and assessment achievement. This study investigated the reliability of staff and student assessments in two oral presentations with limited feedback for a school-based thesis course in engineering…
Descriptors: Feedback (Response), Student Evaluation, Grade Point Average, Peer Evaluation
Crisp, Victoria – Research Papers in Education, 2008
This research set out to compare the quality, length and nature of (1) exam responses in combined question and answer booklets, with (2) responses in separate answer booklets in order to inform choices about response format. Combined booklets are thought to support candidates by giving more information on what is expected of them. Anecdotal…
Descriptors: Geography Instruction, High School Students, Test Format, Test Construction
Wildy, Helen; Styles, Irene – Australian Journal of Early Childhood, 2008
This paper reports analysis of 2006-2007 on-entry assessment data from the Performance Indicators in Primary Schools Baseline Assessment (PIPS-BLA) of random samples of students in England, Scotland, New Zealand and Australia. The analysis aimed, first, to investigate the validity and reliability of that instrument across countries and sexes, and,…
Descriptors: National Competency Tests, Foreign Countries, Student Evaluation, Comparative Education
Walt, Nancy; Atwood, Kristin; Mann, Alex – Journal of Technology, Learning, and Assessment, 2008
The purpose of this study was to determine whether or not survey medium (electronic versus paper format) has a significant effect on the results achieved. To compare survey media, responses from elementary students to British Columbia's Satisfaction Survey were analyzed. Although this study was not experimental in design, the data set served as a…
Descriptors: Student Attitudes, Factor Analysis, Foreign Countries, Elementary School Students