Showing 1 to 15 of 29 results
Peer reviewed
von Davier, Matthias; Bezirhan, Ummugul – Educational and Psychological Measurement, 2023
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population…
Descriptors: Robustness (Statistics), Test Items, Item Analysis, Goodness of Fit
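As a point of reference for the kind of DIF screening discussed here, the classical Mantel-Haenszel procedure, whose significance test relies on a limiting chi-square distribution of the sort the abstract mentions, can be sketched in a few lines; this is a standard method, not the robust approach proposed in the article, and the responses, group labels, and matching scores below are simulated.

import numpy as np

def mantel_haenszel_dif(resp, group, total):
    # Classical Mantel-Haenszel common odds ratio for one studied item.
    # resp: 0/1 item responses, group: 0 = reference / 1 = focal,
    # total: matching variable (e.g., total test score) used for stratification.
    resp, group, total = map(np.asarray, (resp, group, total))
    num = den = 0.0
    for k in np.unique(total):                        # one 2x2 table per score level
        m = total == k
        a = np.sum((group[m] == 0) & (resp[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (resp[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (resp[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (resp[m] == 0))  # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    alpha_mh = num / den                              # common odds ratio
    return alpha_mh, -2.35 * np.log(alpha_mh)         # MH D-DIF on the ETS delta scale

# Illustrative call on simulated data.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 500)
total = rng.integers(0, 21, 500)
resp = rng.binomial(1, 0.3 + 0.02 * total)
print(mantel_haenszel_dif(resp, group, total))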
Peer reviewed
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2019
Lord's (1980) equity theorem claims that observed-score equating is possible only when two test forms are either perfectly reliable or strictly parallel. An analysis of its proof reveals the use of an incorrect statistical assumption. The assumption does not invalidate the theorem itself, though, which can be shown to follow directly from the discrete nature of…
Descriptors: Equated Scores, Testing Problems, Item Response Theory, Evaluation Methods
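For reference, Lord's equity requirement is usually stated as follows (a standard textbook formulation, not a quotation from the article): an equating transformation \varphi carrying scores on form Y to the scale of form X should satisfy

\Pr\{\varphi(Y) \le x \mid \theta\} = \Pr\{X \le x \mid \theta\} \quad \text{for every score } x \text{ and every ability } \theta,

that is, equated and original scores should have identical conditional distributions at every ability level; the theorem concerns when this can hold exactly.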
Peer reviewed
Sánchez Sánchez, Ernesto; García Rios, Víctor N.; Silvestre Castro, Eleazar; Licea, Guadalupe Carrasco – North American Chapter of the International Group for the Psychology of Mathematics Education, 2020
In this paper, we address the following questions: What misconceptions do high school students exhibit in their first encounter with significance test problems through a repeated sampling approach? Which theory or framework could explain the presence and features of such patterns? With brief prior instruction on the use of Fathom software to…
Descriptors: High School Students, Misconceptions, Statistical Significance, Testing
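The repeated-sampling approach to significance testing that the study describes (implemented there in Fathom) can be illustrated with a short simulation; the null proportion, sample size, and observed count below are hypothetical and are not taken from the study.

import numpy as np

# Simulation-based (repeated sampling) significance test for a proportion.
# H0: p = 0.5; suppose 36 successes were observed in 50 trials (hypothetical data).
rng = np.random.default_rng(1)
p_null, n, observed = 0.5, 50, 36

# Draw many samples under H0 and record the simulated success counts.
simulated = rng.binomial(n, p_null, size=10_000)

# One-sided p-value: how often does chance alone match or exceed the observation?
p_value = np.mean(simulated >= observed)
print(f"estimated p-value: {p_value:.4f}")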
Peer reviewed
Sinharay, Sandip; Duong, Minh Q.; Wood, Scott W. – Journal of Educational Measurement, 2017
As noted by Fremer and Olson, analysis of answer changes is often used to investigate testing irregularities because the analysis is readily performed and has proven its value in practice. Researchers such as Belov, Sinharay and Johnson, van der Linden and Jeon, van der Linden and Lewis, and Wollack, Cohen, and Eckerly have suggested several…
Descriptors: Identification, Statistics, Change, Tests
Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2018
Wollack, Cohen, and Eckerly suggested the "erasure detection index" (EDI) to detect fraudulent erasures for individual examinees. Wollack and Eckerly extended the EDI to detect fraudulent erasures at the group level. The EDI at the group level was found to be slightly conservative. This article suggests two modifications of the EDI for…
Descriptors: Deception, Identification, Testing Problems, Cheating
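Without reproducing the exact EDI formula, the flavor of such indices can be conveyed by a standardized wrong-to-right (WTR) erasure count; the item-level WTR probabilities are assumed to come from some fitted model, and the function below is a generic sketch, not the statistic proposed in the article.

import numpy as np

def standardized_wtr_count(observed_wtr, p_wtr):
    # Standardize an observed wrong-to-right erasure count against
    # model-implied item probabilities (generic sketch, not the exact EDI).
    p = np.asarray(p_wtr, dtype=float)
    expected = p.sum()                 # mean of a sum of independent Bernoullis
    variance = np.sum(p * (1 - p))     # corresponding variance
    # Continuity-corrected z statistic; large positive values are suspicious.
    return (observed_wtr - expected - 0.5) / np.sqrt(variance)

# Illustrative call: 9 WTR erasures on a 40-item test with a 5% WTR probability per item.
print(standardized_wtr_count(9, np.full(40, 0.05)))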
Peer reviewed
Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2017
An increasing concern of producers of educational assessments is fraudulent behavior during the assessment (van der Linden, 2009). Benefiting from item preknowledge (e.g., Eckerly, 2017; McLeod, Lewis, & Thissen, 2003) is one type of fraudulent behavior. This article suggests two new test statistics for detecting individuals who may have…
Descriptors: Test Items, Cheating, Testing Problems, Identification
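The article's two statistics are not reproduced here, but the underlying idea of contrasting performance on possibly compromised items with performance on secure items can be sketched generically; the item sets and responses below are hypothetical, and the z contrast is a plain two-proportion statistic rather than anything from the article.

import numpy as np

def compromised_vs_secure_z(resp, compromised):
    # Two-proportion z contrast between possibly compromised and secure items
    # for one examinee (a generic sketch, not the article's statistics).
    resp = np.asarray(resp, dtype=float)
    mask = np.asarray(compromised, dtype=bool)
    p1, n1 = resp[mask].mean(), mask.sum()        # proportion correct, flagged items
    p0, n0 = resp[~mask].mean(), (~mask).sum()    # proportion correct, secure items
    pooled = resp.mean()
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    return (p1 - p0) / se                         # large positive values suggest preknowledge

# Illustrative call: 20 flagged items answered very well, 40 secure items near chance.
resp = np.r_[np.ones(18), np.zeros(2), np.ones(20), np.zeros(20)]
flagged = np.r_[np.ones(20, dtype=bool), np.zeros(40, dtype=bool)]
print(compromised_vs_secure_z(resp, flagged))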
Sinharay, Sandip – Grantee Submission, 2017
Wollack, Cohen, and Eckerly (2015) suggested the "erasure detection index" (EDI) to detect fraudulent erasures for individual examinees. Wollack and Eckerly (2017) extended the EDI to detect fraudulent erasures at the group level. The EDI at the group level was found to be slightly conservative. This paper suggests two modifications of…
Descriptors: Deception, Identification, Testing Problems, Cheating
Peer reviewed
Klufa, Jindrich – Journal on Efficiency and Responsibility in Education and Science, 2016
The paper analyzes the differences in the number of points scored on the mathematics test across the test variants used in the entrance examinations at the Faculty of Business Administration at the University of Economics in Prague in 2015. The differences may arise from the varying difficulty of the variants for students, but also…
Descriptors: Foreign Countries, College Students, Business Administration Education, College Entrance Examinations
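A comparison of point totals across test variants of the kind described here is commonly carried out with a one-way ANOVA and a nonparametric counterpart; the per-variant scores below are simulated and are not the Prague data, and the tests shown are standard ones, not necessarily those used in the paper.

import numpy as np
from scipy import stats

# Simulated point totals for three hypothetical test variants.
rng = np.random.default_rng(2)
variant_a = rng.normal(60, 12, 200)
variant_b = rng.normal(58, 12, 200)
variant_c = rng.normal(63, 12, 200)

# Parametric check: do the variant means differ?
f_stat, p_anova = stats.f_oneway(variant_a, variant_b, variant_c)

# Nonparametric check on the whole distributions.
h_stat, p_kw = stats.kruskal(variant_a, variant_b, variant_c)

print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")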
Feinberg, Richard A. – ProQuest LLC, 2012
Subscores, also known as domain scores, diagnostic scores, or trait scores, can help determine test-takers' relative strengths and weaknesses and appropriately focus remediation. However, subscores often have poor psychometric properties, particularly reliability and distinctiveness (Folske, Gessaroli, & Swanson, 1999; Monaghan, 2006;…
Descriptors: Simulation, Tests, Testing, Scores
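Two of the properties named here, reliability and distinctiveness, are often screened with coefficient alpha and disattenuated subscore correlations; the following minimal sketch uses simulated item-score matrices, and the helper names are ours.

import numpy as np

def cronbach_alpha(items):
    # Coefficient alpha for an examinee-by-item score matrix.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def disattenuated_corr(sub1, sub2, rel1, rel2):
    # Subscore correlation corrected for unreliability;
    # values near 1 suggest the subscores are not distinct.
    r = np.corrcoef(sub1, sub2)[0, 1]
    return r / np.sqrt(rel1 * rel2)

# Illustrative call: two 10-item subdomains driven by the same ability, 300 examinees.
rng = np.random.default_rng(3)
theta = rng.normal(size=300)
prob = 1 / (1 + np.exp(-theta[:, None]))
domain1 = (rng.random((300, 10)) < prob).astype(int)
domain2 = (rng.random((300, 10)) < prob).astype(int)

a1, a2 = cronbach_alpha(domain1), cronbach_alpha(domain2)
s1, s2 = domain1.sum(axis=1), domain2.sum(axis=1)
print(a1, a2, disattenuated_corr(s1, s2, a1, a2))

Because both simulated domains are driven by the same ability, the disattenuated correlation should come out near 1, i.e., the subscores carry little distinct information.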
Peer reviewed
Hanson, Bradley A. – Applied Measurement in Education, 1996
Whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored using three statistical tests based on loglinear models. Examples are presented of applying these tests of distribution differences to decide whether equating is needed for alternative forms of a test. (SLD)
Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format
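One of the simpler checks in this spirit is a likelihood-ratio (G²) test of independence between test form and score category, which corresponds to a loglinear model with no form-by-score association; the contingency table below is hypothetical, and the test is a generic one, not necessarily one of the three examined in the article.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical form-by-score-category counts
# (rows: forms X and Y; columns: low / middle / high score ranges).
table = np.array([[120, 310, 170],
                  [140, 295, 165]])

# G^2 likelihood-ratio test of the loglinear independence model.
g2, p, dof, expected = chi2_contingency(table, lambda_="log-likelihood")
print(f"G^2 = {g2:.2f}, df = {dof}, p = {p:.3f}")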
Stout, William – 1984
An important problem in psychological test theory is the development of a sound method for determining whether a test which purports to measure the level of a certain ability is, in reality, significantly contaminated by one or more other abilities displayed by persons taking the test. Because of the large number of private and governmental…
Descriptors: Latent Trait Theory, Statistical Analysis, Statistical Distributions, Test Validity
Peer reviewed
Burket, George R. – Journal of Educational Measurement, 1987
This response to the Baglin paper (1986) points out the fallacy in inferring that inappropriate scaling procedures cause apparent discrepancies between medians and means and between means calculated using different units. (LMO)
Descriptors: Norm Referenced Tests, Scaling, Scoring, Statistical Distributions
Peer reviewed
Walberg, Herbert J.; And Others – Review of Educational Research, 1984
This paper demonstrates the variety of positive-skew phenomena and discusses their theoretical, research, and practical implications in education. (PN)
Descriptors: Academic Achievement, Data Analysis, Research Problems, Scores
Peer reviewed
Roberts, Dennis M. – Journal of Educational Measurement, 1987
This study examines a score-difference model for the detection of cheating based on the difference between two scores for an examinee: one obtained with the appropriate scoring key and another with an alternative, inappropriate key. It argues that the score-difference method could falsely accuse students of cheating. (Author/JAZ)
Descriptors: Answer Keys, Cheating, Mathematical Models, Multiple Choice Tests
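The score-difference idea under discussion can be sketched as follows: each examinee is scored once with the operational key and once with an alternative, inappropriate key, and the difference between the two scores is examined; the keys and responses below are hypothetical.

import numpy as np

def score_difference(responses, correct_key, alternative_key):
    # Score each examinee with the correct key and with an alternative,
    # inappropriate key, and return the per-examinee difference.
    responses = np.asarray(responses)
    score_correct = (responses == np.asarray(correct_key)).sum(axis=1)
    score_alternative = (responses == np.asarray(alternative_key)).sum(axis=1)
    return score_correct - score_alternative

# Illustrative call: 3 examinees, 5 multiple-choice items (options A-D).
responses = np.array([list("ABCDA"), list("ABCCA"), list("DBCCB")])
correct_key = list("ABCDA")
alternative_key = list("ABCCB")   # e.g., a key an examinee might have copied from
print(score_difference(responses, correct_key, alternative_key))

A strongly negative difference (scoring better on the inappropriate key than on the correct one) is what such a model would flag; the article's point is that honest examinees can also produce such patterns.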
Peer reviewed
Huberty, Carl J. – Educational Researcher, 1987
Two approaches to statistical testing are critically reviewed. A new approach, which is a hybrid of the two, is proposed. The new approach requires the researcher to think about the two types of potential inferential errors and about an explicit alternative hypothesis of interest. (VM)
Descriptors: Educational Assessment, Instruction, Multivariate Analysis, Researchers
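The explicit attention to both types of inferential error that this hybrid approach calls for can be illustrated with a simple power computation for a two-sided, two-sample z-test; the alpha level, standardized effect size, and per-group sample size below are hypothetical design choices.

from scipy.stats import norm

# Hypothetical design choices: Type I error risk (alpha), a specific
# alternative (standardized effect size d), and the per-group sample size n.
alpha, d, n = 0.05, 0.4, 64

# Two-sided two-sample z-test: power = P(reject H0 | the alternative d holds).
z_crit = norm.ppf(1 - alpha / 2)
noncentrality = d * (n / 2) ** 0.5          # d * sqrt(n / 2) for equal group sizes
power = 1 - norm.cdf(z_crit - noncentrality) + norm.cdf(-z_crit - noncentrality)
beta = 1 - power                            # Type II error risk at this alternative

print(f"power = {power:.3f}, Type II error risk = {beta:.3f}")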