Showing 1 to 15 of 28 results
Peer reviewed
Haixiang Zhang – Structural Equation Modeling: A Multidisciplinary Journal, 2025
Mediation analysis is an important statistical tool in many research fields, where the joint significance test is widely utilized for examining mediation effects. Nevertheless, the limitation of this mediation testing method stems from its conservative Type I error, which reduces its statistical power and imposes certain constraints on its…
Descriptors: Structural Equation Models, Statistical Significance, Robustness (Statistics), Comparative Testing
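The joint significance ("both paths") test discussed in this abstract is simple to state: declare the mediated effect a·b significant only when both the exposure→mediator path a and the mediator→outcome path b are individually significant. A minimal sketch under that definition (the function name and interface are illustrative, not taken from the paper):

```python
from statistics import NormalDist

def joint_significance_test(a, se_a, b, se_b, alpha=0.05):
    """Joint significance test for a mediated effect a*b.

    a, b      : estimated path coefficients
    se_a, se_b: their standard errors
    Returns True only when BOTH paths are individually
    significant at level alpha (two-sided z tests).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(a / se_a) > z_crit and abs(b / se_b) > z_crit

# Both paths clearly nonzero -> mediation flagged
print(joint_significance_test(0.5, 0.1, 0.4, 0.1))
# First path not significant (z = 1) -> no mediation flagged
print(joint_significance_test(0.1, 0.1, 0.4, 0.1))
```

The conservatism the abstract refers to follows from this construction: under the complete null (a = b = 0) the chance that both z tests reject is roughly α² rather than α, so the test's actual Type I error sits below the nominal level in most null configurations.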
Peer reviewed
Wiliam, Dylan – Assessment in Education: Principles, Policy & Practice, 2008
While international comparisons such as those provided by PISA may be meaningful in terms of overall judgements about the performance of educational systems, caution is needed for more fine-grained judgements. In particular, it is argued that using the results of PISA to draw conclusions about the quality of instruction in different systems is…
Descriptors: Test Bias, Test Construction, Comparative Testing, Evaluation
Peer reviewed
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Peer reviewed
Kim, Do-Hong; Huynh, Huynh – Educational and Psychological Measurement, 2008
The current study compared student performance between paper-and-pencil testing (PPT) and computer-based testing (CBT) on a large-scale statewide end-of-course English examination. Analyses were conducted at both the item and test levels. The overall results suggest that scores obtained from PPT and CBT were comparable. However, at the content…
Descriptors: Reading Comprehension, Computer Assisted Testing, Factor Analysis, Comparative Testing
Peer reviewed
Wallach, P. M.; Crespo, L. M.; Holtzman, K. Z.; Galbraith, R. M.; Swanson, D. B. – Advances in Health Sciences Education, 2006
Purpose: In conjunction with curricular changes, a process to develop integrated examinations was implemented. Pre-established guidelines were provided favoring vignettes, clinically relevant material, and application of knowledge rather than simple recall. Questions were read aloud in a committee including all course directors, and a reviewer…
Descriptors: Test Items, Rating Scales, Examiners, Guidelines
Clauser, Brian E.; And Others – 1991
Item bias has been a major concern for test developers during recent years. The Mantel-Haenszel statistic has been among the preferred methods for identifying biased items. The statistic's performance in identifying uniform bias in simulated data modeled by producing various levels of difference in the (item difficulty) b-parameter for reference…
Descriptors: Comparative Testing, Difficulty Level, Item Bias, Item Response Theory
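The Mantel-Haenszel statistic mentioned here pools one 2×2 (group × correct/incorrect) table per matched ability stratum into a common odds-ratio estimate for the studied item; an estimate far from 1.0 flags potential uniform DIF. A minimal sketch of that estimator (variable names are illustrative; operational DIF screening would also compute the accompanying MH chi-square test):

```python
def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio across 2x2 tables.

    Each table is a tuple for one matched score group:
        (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    Values near 1.0 suggest no uniform DIF on the item;
    values well above or below 1.0 favor one group.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two strata where both groups answer at the same rate -> ratio near 1.0
balanced = [(30, 10, 30, 10), (20, 20, 20, 20)]
print(mantel_haenszel_odds_ratio(balanced))

# Reference group outperforms matched focal examinees -> ratio above 1.0
favors_ref = [(35, 5, 25, 15), (30, 10, 20, 20)]
print(mantel_haenszel_odds_ratio(favors_ref))
```

Stratifying on total score before pooling is what lets the statistic separate genuine group differences in ability from item-level bias, which is the "uniform bias" scenario the simulation in this entry manipulates through the b-parameter.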
Peer reviewed
Stocking, Martha L.; And Others – Applied Psychological Measurement, 1993
A method of automatically selecting items for inclusion in a test with constraints on item content and statistical properties was applied to real data. Tests constructed manually from the same data and constraints were compared to tests constructed automatically. Results show areas in which automated assembly can improve test construction. (SLD)
Descriptors: Algorithms, Automation, Comparative Testing, Computer Assisted Testing
Ang, Cheng; Miller, M. David – 1993
The power of the procedure of W. Stout to detect deviations from essential unidimensionality in two-dimensional data was investigated for minor, moderate, and large deviations from unidimensionality using criteria for deviations from unidimensionality based on prior research. Test lengths of 20 and 40 items and sample sizes of 700 and 1,500 were…
Descriptors: Ability, Comparative Testing, Correlation, Item Response Theory
Peer reviewed
Squires, David; Trevisan, Michael S.; Canney, George F. – Studies in Educational Evaluation, 2006
The Idaho Comprehensive Literacy Assessment (ICLA) is a faculty-developed, state-wide, high-stakes assessment of pre-service teachers' knowledge and application of research based literacy practices. The literacy faculty control all aspects of the test, including construction, refinement, administration, scoring and reporting. The test development…
Descriptors: Test Construction, Comparative Testing, Investigations, Test Reliability
Barr, James E.; Rasor, Richard A.; Grill, Cathie – 2002
This document addresses how well ARC's computerized placement tests (Compass) assist individuals in reaching informed decisions about enrolling in selected courses, including English composition, reading, mathematics, and ESL. The document addresses the question of whether Compass scores add any relevant information in the decision-making process…
Descriptors: Academic Standards, Cognitive Processes, Community Colleges, Comparative Testing
Peer reviewed
Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Peer reviewed
Crehan, Kevin D.; And Others – Educational and Psychological Measurement, 1993
Studies with 220 college students found that multiple-choice test items with three options are more difficult than those with four options, and items with a none-of-these option are more difficult than those without it. Neither format manipulation affected item discrimination. Implications for test construction are discussed. (SLD)
Descriptors: College Students, Comparative Testing, Difficulty Level, Distractors (Tests)
Nandakumar, Ratna – 1992
The performance of the following four methodologies for assessing unidimensionality was examined: (1) DIMTEST; (2) the approach of P. W. Holland and P. R. Rosenbaum; (3) linear factor analysis; and (4) non-linear factor analysis. Each method is examined and compared with other methods using simulated data sets and real data sets. Seven data sets,…
Descriptors: Ability, Comparative Testing, Correlation, Equations (Mathematics)
Wiggins, Grant – Executive Educator, 1994
Instead of relying on standardized test scores and interdistrict comparisons, school systems must develop a more powerful, timely, and local approach to accountability that is truly client-centered and focused on results. Accountability requires giving successful teachers the freedom and opportunity to take effective ideas beyond their own…
Descriptors: Accountability, Comparative Testing, Elementary Secondary Education, Feedback
Babcock, Judith L.; And Others – 1992
This study used multiple methods to assess basic community needs and attributes of community atmosphere (cohesion, religious involvement, and recreational activities) in two psychometric studies. Part 1 revised self-report community assessment measures, developed multi-item scales for each construct, and tested reliabilities and factor structures…
Descriptors: Community Needs, Community Organizations, Community Programs, Comparative Testing