Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 3 |
Since 2006 (last 20 years) | 5 |
Descriptor
Probability | 16 |
Testing Problems | 16 |
Statistical Analysis | 5 |
Mathematical Models | 4 |
Evaluation Methods | 3 |
Item Response Theory | 3 |
Multiple Choice Tests | 3 |
Scores | 3 |
Test Construction | 3 |
Test Items | 3 |
Test Validity | 3 |
Author
Wilcox, Rand R. | 3 |
An, Chen | 1 |
Braun, Henry | 1 |
Camilla Rjosk | 1 |
Chen, Yuguo | 1 |
Choi, Seung W. | 1 |
Choppin, Bruce | 1 |
Dirkzwager, A. | 1 |
Green, D. R. | 1 |
Haberman, Shelby J. | 1 |
Hambleton, Ronald K. | 1 |
Publication Type
Reports - Research | 16 |
Journal Articles | 9 |
Collected Works - General | 1 |
Reports - Evaluative | 1 |
Speeches/Meeting Papers | 1 |
Education Level
Middle Schools | 1 |
Location
Germany | 1 |
Assessments and Surveys
Indiana Statewide Testing for… | 1 |
Sachse, Karoline A.; Weirich, Sebastian; Mahler, Nicole; Rjosk, Camilla – International Journal of Testing, 2024
In order to ensure content validity by covering a broad range of content domains, the testing times of some educational large-scale assessments last up to a total of two hours or more. Performance decline over the course of taking the test has been extensively documented in the literature. It can occur due to increases in the numbers of: (a)…
Descriptors: Test Wiseness, Test Score Decline, Testing Problems, Foreign Countries
Haberman, Shelby J.; Lee, Yi-Hsuan – ETS Research Report Series, 2017
In investigations of unusual testing behavior, a common question is whether a specific pattern of responses occurs unusually often within a group of examinees. In many current tests, modern communication techniques can permit quite large numbers of examinees to share keys, or common response patterns, to the entire test. To address this issue,…
Descriptors: Student Evaluation, Testing, Item Response Theory, Maximum Likelihood Statistics
An, Chen; Braun, Henry; Walsh, Mary E. – Educational Measurement: Issues and Practice, 2018
Making causal inferences from a quasi-experiment is difficult. Sensitivity analysis approaches to address hidden selection bias thus have gained popularity. This study serves as an introduction to a simple but practical form of sensitivity analysis using Monte Carlo simulation procedures. We examine estimated treatment effects for a school-based…
Descriptors: Statistical Inference, Intervention, Program Effectiveness, Quasiexperimental Design
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Chen, Yuguo; Small, Dylan – Psychometrika, 2005
Rasch proposed an exact conditional inference approach to testing his model but never implemented it because it involves the calculation of a complicated probability. This paper furthers Rasch's approach by (1) providing an efficient Monte Carlo methodology for accurately approximating the required probability and (2) illustrating the usefulness…
Descriptors: Testing Problems, Probability, Methods, Testing
Wilcox, Rand R. – 1979
Three separate papers are included in this report. The first describes a two-stage procedure for choosing from among several instructional programs the one which maximizes the probability of passing the test. The second gives the exact sample sizes required to determine whether a squared multiple correlation coefficient is above or below a known…
Descriptors: Bayesian Statistics, Correlation, Hypothesis Testing, Mathematical Models

Wilcox, Rand R. – Educational and Psychological Measurement, 1979
A problem of considerable importance in certain educational settings is determining how many items to include on a mastery test. Applying ranking and selection procedures, a solution is given which includes as a special case all existing single-stage, non-Bayesian solutions based on a strong true-score model. (Author/JKS)
Descriptors: Criterion Referenced Tests, Mastery Tests, Nonparametric Statistics, Probability
Petersen, Nancy S.; Novick, Melvin R. – 1975
Models proposed by Cleary, Thorndike, Cole, Linn, Einhorn and Bass, Darlington, and Gross and Su for analyzing bias in the use of tests in a selection strategy are surveyed. Several additional models are also introduced. The purpose is to describe, compare, contrast, and evaluate these models while extracting such useful ideas as may be found in…
Descriptors: Comparative Analysis, Culture Fair Tests, Models, Personnel Selection

Green, D. R.; Tomlinson, M. – Journal of Research in Reading, 1983
Confirms that in cloze testing, it is unnecessary to use standard size spaces and reveals a high correlation between synonymic scoring and verbatim scoring. Indicates also that a specific probability concepts test is comprehensible and readable by the great majority of students for whom it was devised. (FL)
Descriptors: Cloze Procedure, Elementary Secondary Education, Listening Skills, Probability
Suhadolnik, Debra; Weiss, David J. – 1983
The present study was an attempt to alleviate some of the difficulties inherent in multiple-choice items by having examinees respond to multiple-choice items in a probabilistic manner. Using this format, examinees are able to respond to each alternative and to provide indications of any partial knowledge they may possess concerning the item. The…
Descriptors: Confidence Testing, Multiple Choice Tests, Probability, Response Style (Tests)
Wilcox, Rand R. – 1978
Two fundamental problems in mental test theory are to estimate true score and to estimate the amount of error when testing an examinee. In this report, three probability models which characterize a single test item in terms of a population of examinees are described. How these models may be modified to characterize a single examinee in terms of an…
Descriptors: Achievement Tests, Comparative Analysis, Error of Measurement, Mathematical Models

Dirkzwager, A. – Educational and Psychological Measurement, 1996
Testing with personal probabilities eliminates guessing if the subjects are well calibrated. A probability testing study with 47 Dutch elementary school children who used an interactive computer program shows that even 11-year-olds can estimate their personal probabilities correctly. (SLD)
Descriptors: Computer Assisted Testing, Elementary Education, Elementary School Students, Estimation (Mathematics)
Choppin, Bruce – 1982
On well-constructed multiple-choice tests, the most serious threat to measurement is not variation in item discrimination, but the guessing behavior that may be adopted by some students. Ways of ameliorating the effects of guessing are discussed, especially for problems in latent trait models. A new item response model, including an item parameter…
Descriptors: Ability, Algorithms, Guessing (Tests), Item Analysis
Madsen, Harold S. – 1987
A study investigated the effectiveness of the Rasch procedure in measuring response appropriateness, especially for the detection of cheating on multiple-choice language tests. The report gives background information on appropriateness measurement and its potential uses, reviews recent research on cheating and its detection, and describes three…
Descriptors: Cheating, English (Second Language), Evaluation Methods, Language Tests