Publication Date
| Date Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 220 |
| Since 2022 (last 5 years) | 1089 |
| Since 2017 (last 10 years) | 2599 |
| Since 2007 (last 20 years) | 4960 |
Audience
| Audience | Records |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Location | Records |
| --- | --- |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Peer reviewed
Chang, Lei; And Others – Applied Measurement in Education, 1996
The influence of judges' knowledge on standard setting for competency tests was studied with 17 judges who took an economics teacher certification test while setting competency standards using the Angoff procedure. Judges tended to set higher standards for items they answered correctly and lower standards for items they answered incorrectly. (SLD)
Descriptors: Competence, Difficulty Level, Economics, Judges
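
The Chang et al. entry above concerns the Angoff procedure, in which each judge estimates, for every item, the probability that a minimally competent examinee would answer correctly, and a cut score is aggregated from those estimates. A minimal sketch of that aggregation step only, with hypothetical ratings rather than anything from the study:

```python
import numpy as np

# Hypothetical Angoff ratings: each judge's estimated probability that a
# minimally competent examinee answers each item correctly.
# Shape: (n_judges, n_items); all values are illustrative only.
ratings = np.array([
    [0.60, 0.75, 0.40, 0.85, 0.55],   # judge 1
    [0.65, 0.70, 0.50, 0.80, 0.60],   # judge 2
    [0.55, 0.80, 0.45, 0.90, 0.50],   # judge 3
])

# Each judge's recommended cut score is the sum of their item probabilities;
# the panel standard is commonly taken as the mean of those sums.
judge_cut_scores = ratings.sum(axis=1)
panel_cut_score = judge_cut_scores.mean()

print(judge_cut_scores)   # e.g. [3.15 3.25 3.2 ]
print(panel_cut_score)    # e.g. 3.2 out of 5 items
```
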
Finch, Holmes; Habing, Brian – Journal of Educational Measurement, 2005
This study examines the performance of a new method for assessing and characterizing dimensionality in test data using the NOHARM model and compares it with DETECT. Dimensionality assessment is carried out using two goodness-of-fit statistics that are compared to reference chi-square distributions. A Monte Carlo study is used with item parameters…
Descriptors: Program Effectiveness, Monte Carlo Methods, Item Response Theory, Comparative Analysis
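
The dimensionality check described in the Finch and Habing entry rests on comparing a model-fit statistic with a reference chi-square distribution. The sketch below shows only that comparison step, with a hypothetical statistic and degrees of freedom, not the NOHARM-based statistics from the study:

```python
from scipy.stats import chi2

# Illustrative only: comparing a goodness-of-fit statistic to a reference
# chi-square distribution (statistic and degrees of freedom are hypothetical).
fit_statistic = 38.4
df = 27
p_value = chi2.sf(fit_statistic, df)   # upper-tail probability
print(round(p_value, 3))               # a small p suggests misfit of the fitted dimensionality
```
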
Graham, James M. – Educational and Psychological Measurement, 2006
Coefficient alpha, the most commonly used estimate of internal consistency, is often considered a lower bound estimate of reliability, though the extent of its underestimation is not typically known. Many researchers are unaware that coefficient alpha is based on the essentially tau-equivalent measurement model. It is the violation of the…
Descriptors: Models, Test Theory, Reliability, Structural Equation Models
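
For readers unfamiliar with the estimate Graham discusses, coefficient alpha can be computed directly from an examinee-by-item score matrix. A minimal sketch with illustrative data; it shows the standard formula only, not the tau-equivalence analysis in the article:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an examinee-by-item score matrix."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative 0/1 item scores for five examinees on four items.
X = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(X), 3))   # ≈ 0.79 for this toy matrix
```
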
Sotaridona, Leonardo S.; van der Linden, Wim J.; Meijer, Rob R. – Applied Psychological Measurement, 2006
A statistical test for detecting answer copying on multiple-choice tests based on Cohen's kappa is proposed. The test is free of any assumptions on the response processes of the examinees suspected of copying and having served as the source, except for the usual assumption that these processes are probabilistic. Because the asymptotic null and…
Descriptors: Cheating, Test Items, Simulation, Statistical Analysis
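
The copying index in the Sotaridona, van der Linden, and Meijer entry builds on Cohen's kappa, the chance-corrected agreement between two examinees' answer strings. The sketch below computes plain kappa for two hypothetical answer strings; it is not the authors' full statistical test or its null distribution:

```python
from collections import Counter

def cohens_kappa(responses_a: str, responses_b: str) -> float:
    """Chance-corrected agreement between two examinees' answer strings."""
    n = len(responses_a)
    p_obs = sum(a == b for a, b in zip(responses_a, responses_b)) / n
    freq_a, freq_b = Counter(responses_a), Counter(responses_b)
    # Expected agreement from each examinee's own option-choice proportions.
    p_exp = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical multiple-choice answer strings (options A-D) for a suspected
# copier and a source examinee.
suspect = "ABCDABCDAB"
source = "ABCDABCDAC"
print(round(cohens_kappa(suspect, source), 3))   # ≈ 0.867
```
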
Bedore, Lisa M.; Pena, Elizabeth D.; Garcia, Melissa; Cortez, Celina – Language, Speech, and Hearing Services in Schools, 2005
Purpose: This study evaluates the extent to which bilingual children produce the same or overlapping responses on tasks assessing semantic skills in each of their languages and whether classification analysis based on monolingual or conceptual scoring can accurately classify the semantic development of typically developing (TD) bilingual children.…
Descriptors: Monolingualism, Semantics, Skill Development, Young Children
van der Meij, Jan; de Jong, Ton – Learning and Instruction, 2006
In this study, the effects of different types of support for learning from multiple representations in a simulation-based learning environment were examined. The study extends known research by examining the use of dynamic representations instead of static representations and it examines the role of the complexity of the domain and the learning…
Descriptors: Educational Environment, Computer Assisted Instruction, Computer Simulation, Educational Technology
Liu, Xiufeng; MacIsaac, Dan – Journal of Science Education and Technology, 2005
This study investigates factors affecting the degree of novice physics students' application of the naive impetus theory. Six hundred and fourteen first-year university engineering physics students answered the Force Concept Inventory as a pre-test for their calculus-based course. We examined the degree to which students consistently applied the…
Descriptors: Prediction, Familiarity, Physics, College Freshmen
Wang, Wen-Chung; Cheng, Ying-Yao; Wilson, Mark – Educational and Psychological Measurement, 2005
A parallel design, in which items across different scales within an instrument share common stimuli and subjects respond to the common stimulus for each scale, is sometimes used in questionnaires or inventories. Because the items across scales share the same stimuli, the assumption of local item independence may not hold, thereby violating the…
Descriptors: Stimuli, Psychometrics, Test Items, Item Response Theory
Peer reviewed
Young, Ellie L.; Sudweeks, Richard R. – Measurement and Evaluation in Counseling and Development, 2005
Differential item functioning (DIF) in the Multidimensional Self Concept Scale (B. A. Bracken, 1992) was evaluated using 2 different methods to identify and describe DIF. Of 149 items from the Multidimensional Self Concept Scale, 42% exhibited gender DIF. Nonuniform, crossover DIF was evident in items throughout the instrument.
Descriptors: Early Adolescents, Measures (Individuals), Self Concept, Test Items
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different formats? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
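
Burton's starting point, the standard error of measurement, follows from classical test theory as SEM = SD·√(1 − reliability), and it yields the confidence limits the abstract mentions. A minimal sketch with illustrative values only:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

# Illustrative values: a test with score SD 8 and reliability 0.84.
sem = standard_error_of_measurement(sd=8.0, reliability=0.84)        # 3.2
lower, upper = 50 - 1.96 * sem, 50 + 1.96 * sem   # ~95% band around an observed score of 50
print(round(sem, 2), (round(lower, 1), round(upper, 1)))
```
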
Pommerich, Mary – Journal of Technology, Learning, and Assessment, 2004
As testing moves from paper-and-pencil administration toward computerized administration, how to present tests on a computer screen becomes an important concern. Of particular concern are tests that contain necessary information that cannot be displayed on screen all at once for an item. Ideally, the method of presentation should not interfere…
Descriptors: Test Content, Computer Assisted Testing, Multiple Choice Tests, Computer Interfaces
Powers, Robert A.; Allison, Dean E.; Grassl, Richard M. – International Journal for Technology in Mathematics Education, 2005
This study investigated the impact of the TI-92 handheld Computer Algebra System (CAS) on student achievement in a discrete mathematics course. Specifically, the researchers examined the differences between a CAS section and a control section of discrete mathematics on students' in-class examinations. Additionally, they analysed student approaches…
Descriptors: Control Groups, Mathematics Education, Test Items, Investigations
Van den Noortgate, Wim; De Boeck, Paul – Journal of Educational and Behavioral Statistics, 2005
Although differential item functioning (DIF) theory traditionally focuses on the behavior of individual items in two (or a few) specific groups, in educational measurement contexts, it is often plausible to regard the set of items as a random sample from a broader category. This article presents logistic mixed models that can be used to model…
Descriptors: Test Bias, Item Response Theory, Educational Assessment, Mathematical Models
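
One common way to write a logistic mixed model with item-level random DIF, shown only as an illustration and not necessarily the exact parameterization used in the Van den Noortgate and De Boeck article:

```latex
% Person p with ability \theta_p answers item i with difficulty b_i; G_p is a
% 0/1 focal-group indicator. The item-specific DIF effects u_i are treated as
% random draws, so their variance summarizes DIF across the item set as a whole.
\operatorname{logit}\Pr(Y_{pi}=1) = \theta_p - b_i + (\beta + u_i)\,G_p,
\qquad u_i \sim N\!\left(0,\ \sigma^2_{\mathrm{DIF}}\right)
```
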
Oliver, Renee; Williams, Robert L. – Journal of Behavioral Education, 2005
Students in four sections of an undergraduate educational course (two large and two small sections) took out-of-class practice exams prior to actual exams for each of five course units. Each course unit consisted of five class sessions focusing on a specific developmental theme. Some sections received practice-exam credit based on the number of…
Descriptors: Undergraduate Students, Predictor Variables, Contingency Management, Education Courses
von Schrader, Sarah; Ansley, Timothy – Applied Measurement in Education, 2006
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
Descriptors: Gender Differences, Multiple Choice Tests, Achievement Tests, Grade 3
