Showing 1 to 15 of 59 results
Peer reviewed
Direct link
Yusuf Kara; Akihito Kamata; Xin Qiao; Cornelis J. Potgieter; Joseph F. T. Nese – Educational and Psychological Measurement, 2024
Words read correctly per minute (WCPM) is the reporting score metric in oral reading fluency (ORF) assessments and is widely used in curriculum-based measurement to screen at-risk readers and to monitor the progress of students receiving interventions. Just like other types of assessments with multiple forms, equating would be…
Descriptors: Oral Reading, Reading Fluency, Models, Reading Rate
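As background for the equating question this abstract raises, linking alternate ORF passages can start with simple linear equating of WCPM scores; the sketch below is generic and the scores invented, not the authors' method or data.

import numpy as np

# Hypothetical WCPM scores from the same students on two alternate passages.
form_a = np.array([92.0, 110.0, 75.0, 130.0, 88.0, 101.0])
form_b = np.array([85.0, 103.0, 70.0, 121.0, 80.0, 95.0])

# Linear equating: rescale form B to match form A's mean and SD.
slope = form_a.std(ddof=1) / form_b.std(ddof=1)
intercept = form_a.mean() - slope * form_b.mean()
form_b_equated = intercept + slope * form_b

print(form_b_equated.round(1))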
Peer reviewed
Direct link
Liu, Ivy; Suesse, Thomas; Harvey, Samuel; Gu, Peter Yongqi; Fernández, Daniel; Randal, John – Educational and Psychological Measurement, 2023
The Mantel-Haenszel estimator is one of the most popular techniques for measuring differential item functioning (DIF). A generalization of this estimator is applied to the DIF context to compare items while taking into account the covariance of odds ratio estimators between dependent items. Unlike item response theory, the method does not rely…
Descriptors: Test Bias, Computation, Statistical Analysis, Achievement Tests
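The classical Mantel-Haenszel common odds ratio for a single studied item, stratified by total score, is simple to compute; a minimal sketch with invented counts (the article's generalization additionally models covariances between dependent items, which this plain version ignores).

import numpy as np

# Stratified 2x2 tables: for each total-score stratum, counts of
# [ref_correct, ref_incorrect, focal_correct, focal_incorrect].
tables = np.array([
    [30, 20, 25, 25],   # stratum 1
    [45, 15, 38, 22],   # stratum 2
    [60, 10, 50, 20],   # stratum 3
])

A, B, C, D = tables.T          # unpack the four cell counts per stratum
N = tables.sum(axis=1)         # stratum sizes

# MH common odds ratio and the usual ETS delta-scale DIF statistic.
alpha_mh = (A * D / N).sum() / (B * C / N).sum()
delta_mh = -2.35 * np.log(alpha_mh)

print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {delta_mh:.3f}")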
Peer reviewed
Direct link
Jana Welling; Timo Gnambs; Claus H. Carstensen – Educational and Psychological Measurement, 2024
Disengaged responding poses a severe threat to the validity of educational large-scale assessments, because item responses from unmotivated test-takers do not reflect their actual ability. Existing identification approaches rely primarily on item response times, which bears the risk of misclassifying fast engaged or slow disengaged responses.…
Descriptors: Foreign Countries, College Students, Guessing (Tests), Multiple Choice Tests
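The response-time baseline the authors build on flags responses faster than an item-level threshold as rapid guesses; a minimal sketch of one common heuristic (threshold = 10% of the item's median response time), with invented data.

import numpy as np

# Hypothetical response times (seconds): rows = test-takers, cols = items.
rt = np.array([
    [12.1, 30.5,  2.0, 25.0],
    [ 1.5,  2.2,  1.8,  2.5],   # a consistently rapid responder
    [18.0, 22.3, 15.1, 28.4],
])

# Normative-threshold heuristic: anything under 10% of an item's
# median response time is flagged as a likely rapid guess.
threshold = 0.10 * np.median(rt, axis=0)
rapid = rt < threshold

print(rapid)
print("flagged per person:", rapid.sum(axis=1))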
Peer reviewed
Direct link
Wyse, Adam E. – Educational and Psychological Measurement, 2021
An essential question when computing test-retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests impacts alternate forms reliability coefficients. Results suggest that the highest…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Reliability, Reading Tests
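The underlying computation is an alternate-forms correlation within each retest gap; a minimal sketch that groups invented score pairs by days between administrations.

import numpy as np
from collections import defaultdict

# (days_between, score_form1, score_form2) for hypothetical examinees.
records = [
    (7, 210, 214), (7, 195, 190), (7, 230, 228),
    (30, 210, 205), (30, 195, 201), (30, 230, 221),
]

by_gap = defaultdict(list)
for days, s1, s2 in records:
    by_gap[days].append((s1, s2))

for days, pairs in sorted(by_gap.items()):
    x, y = np.array(pairs).T
    r = np.corrcoef(x, y)[0, 1]
    print(f"{days:>3} days between forms: r = {r:.3f}")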
Peer reviewed
Direct link
van Dijk, Wilhelmina; Schatschneider, Christopher; Al Otaiba, Stephanie; Hart, Sara A. – Educational and Psychological Measurement, 2022
Complex research questions often need large samples to obtain accurate estimates of parameters and adequate power. Combining extant data sets into a large, pooled data set is one way this can be accomplished without expending resources. Measurement invariance (MI) modeling is an established approach to ensure participant scores are on the same…
Descriptors: Sample Size, Data Analysis, Goodness of Fit, Measurement
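Full MI testing fits multigroup CFA models under increasingly strict equality constraints; as a rough, assumption-laden stand-in, one can at least compare one-factor loadings (here, first principal components) across data sets before pooling. Simulated data; this is not the MI modeling the article describes.

import numpy as np

rng = np.random.default_rng(0)

def pc_loadings(data):
    # First-principal-component loadings of the correlation matrix:
    # a crude one-factor stand-in, not a fitted CFA.
    corr = np.corrcoef(data, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)
    load = vecs[:, -1] * np.sqrt(vals[-1])
    return load * np.sign(load.sum())   # fix the arbitrary sign

# Two hypothetical extant data sets measuring the same four items.
true_load = np.array([[0.8, 0.7, 0.6, 0.5]])
set_a = rng.normal(size=(300, 1)) @ true_load + rng.normal(scale=0.5, size=(300, 4))
set_b = rng.normal(size=(300, 1)) @ true_load + rng.normal(scale=0.5, size=(300, 4))

print("set A loadings:", pc_loadings(set_a).round(2))
print("set B loadings:", pc_loadings(set_b).round(2))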
Peer reviewed
Direct link
Andrich, David; Marais, Ida; Humphry, Stephen Mark – Educational and Psychological Measurement, 2016
Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…
Descriptors: Guessing (Tests), Statistical Bias, Item Response Theory, Multiple Choice Tests
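The bias mechanism is easy to see in miniature: guessing inflates observed success probabilities, and a guessing-free Rasch model converts that inflation into difficulty estimates that shift by different amounts at different ability levels. A toy numerical illustration, not the authors' estimation method.

import numpy as np

b = 1.0                 # true item difficulty
c = 0.25                # guessing floor (4-option multiple choice)
thetas = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

p_rasch = 1 / (1 + np.exp(-(thetas - b)))   # guessing-free probability
p_obs = c + (1 - c) * p_rasch               # 3PL-style observed probability

# Difficulty implied by a Rasch model that ignores guessing:
# invert p_obs = 1 / (1 + exp(-(theta - b_hat))) for b_hat.
b_hat = thetas - np.log(p_obs / (1 - p_obs))

for t, bh in zip(thetas, b_hat):
    print(f"theta = {t:+.1f}: implied difficulty = {bh:+.2f} (true {b:+.2f})")

The implied difficulty differs by ability level, which is the nonlinear distortion of the unit of scale that the abstract refers to.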
Peer reviewed
Direct link
Sideridis, Georgios D. – Educational and Psychological Measurement, 2016
The purpose of the present studies was to test the hypothesis that the psychometric characteristics of ability scales may be significantly distorted if one accounts for emotional factors during test taking. Specifically, the present studies evaluate the effects of anxiety and motivation on the item difficulties of the Rasch model. In Study 1, the…
Descriptors: Learning Disabilities, Test Validity, Measures (Individuals), Hierarchical Linear Modeling
Peer reviewed
Direct link
Lin, Pei-Ying; Lin, Yu-Cheng – Educational and Psychological Measurement, 2014
This exploratory study investigated potential sources of setting accommodation resulting in differential item functioning (DIF) on math and reading assessments for examinees with varied learning characteristics. The examinees were those who participated in large-scale assessments and were tested in either standardized or accommodated testing…
Descriptors: Test Bias, Multivariate Analysis, Testing Accommodations, Mathematics Tests
Peer reviewed
Direct link
Wiley, Edward W.; Shavelson, Richard J.; Kurpius, Amy A. – Educational and Psychological Measurement, 2014
The name "SAT" has become synonymous with college admissions testing; it has been dubbed "the gold standard." Numerous studies on its reliability and predictive validity show that the SAT predicts college performance beyond high school grade point average. Surprisingly, studies of the factorial structure of the current version…
Descriptors: College Readiness, College Admission, College Entrance Examinations, Factor Analysis
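Examining the factorial structure of section scores can be sketched with an ordinary exploratory factor analysis; a minimal example on simulated two-factor data (illustrative only, not the study's data or model).

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Simulated scores on four hypothetical sections loading on two factors
# (verbal-like and math-like); purely illustrative.
n = 500
verbal = rng.normal(size=(n, 1))
math = rng.normal(size=(n, 1))
scores = np.hstack([
    verbal * 0.8, verbal * 0.7,   # two verbal-leaning sections
    math * 0.8, math * 0.7,       # two math-leaning sections
]) + rng.normal(scale=0.5, size=(n, 4))

fa = FactorAnalysis(n_components=2, random_state=0).fit(scores)
print(fa.components_.round(2))   # factor loadings (factors x sections)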
Peer reviewed
Direct link
Pohl, Steffi; Gräfe, Linda; Rose, Norman – Educational and Psychological Measurement, 2014
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
Descriptors: Test Items, Achievement Tests, Item Response Theory, Models
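The distinction the authors model is positional: missing responses after the last answered item are "not reached," while interior missings are "omitted." A minimal sketch of that classification, with NaN marking a missing response (invented data).

import numpy as np

# Response matrix: 1/0 scored items, NaN = no response.
resp = np.array([
    [1, 0, np.nan, 1, np.nan, np.nan],   # one omitted, two not reached
    [1, 1, 0, 1, 0, 1],                  # complete
    [np.nan, 1, 1, np.nan, np.nan, np.nan],
])

for row in resp:
    answered = np.flatnonzero(~np.isnan(row))
    last = answered.max() if answered.size else -1
    missing = np.isnan(row)
    omitted = missing & (np.arange(row.size) < last)
    not_reached = missing & (np.arange(row.size) > last)
    print(f"omitted: {omitted.sum()}, not reached: {not_reached.sum()}")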
Peer reviewed
Direct link
Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012
The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…
Descriptors: Item Response Theory, Models, Methods, Computation
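The LLTM decomposes Rasch item difficulties into weighted task characteristics, b_i = sum_k q_ik * eta_k; the simplest of the three compared strategies amounts to regressing estimated difficulties on the Q matrix. A toy sketch with invented values.

import numpy as np

# Q matrix: rows = items, cols = task characteristics (e.g., text
# length, inference demand). Values are hypothetical.
Q = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
    [2, 1],
    [2, 0],
], dtype=float)

# Rasch difficulty estimates for the five items (invented).
b = np.array([-0.5, 0.4, 0.8, 1.3, 0.1])

# Least-squares estimate of the characteristic weights (eta).
eta, *_ = np.linalg.lstsq(Q, b, rcond=None)
print("eta:", eta.round(2))
print("predicted difficulties:", (Q @ eta).round(2))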
Peer reviewed
Direct link
Skaggs, Gary; Hein, Serge F. – Educational and Psychological Measurement, 2011
Judgmental standard setting methods have been criticized for the cognitive complexity of the judgment task that panelists are asked to complete. This study compared two methods designed to reduce this complexity: the yes/no method and the single-passage bookmark method. Two mock standard setting panel meetings were convened, one for each method,…
Descriptors: Standard Setting (Scoring), Methods, Cutting Scores, Experienced Teachers
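The yes/no method reduces each judgment to whether a borderline examinee would answer the item correctly; the raw cut score is then the panel's average count of "yes" items. A minimal sketch with invented judgments.

import numpy as np

# Yes/No method: each panelist marks 1 if a borderline examinee would
# answer the item correctly, else 0. Rows = panelists, cols = items.
judgments = np.array([
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 1, 0, 0],
])

# Each panelist's recommended raw cut = number of "yes" items;
# the panel cut score = mean across panelists.
cut = judgments.sum(axis=1).mean()
print(f"recommended raw cut score: {cut:.1f} of {judgments.shape[1]}")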
Peer reviewed
Direct link
Humphry, Stephen M. – Educational and Psychological Measurement, 2010
Discrimination has traditionally been parameterized for items but not other empirical factors. Consequently, if person factors affect discrimination they cause misfit. However, by explicitly formulating the relationship between discrimination and the unit of a metric, it is possible to parameterize discrimination for person groups. This article…
Descriptors: Discriminant Analysis, Models, Simulation, Reading Tests
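The core idea can be sketched as a Rasch-style model whose discrimination parameter attaches to the person's group rather than to the item; the formulation below is a generic illustration, not the article's exact parameterization.

import numpy as np

def p_correct(theta, b, a_group):
    # Rasch-style response probability with discrimination indexed by
    # the person's group rather than by the item.
    return 1 / (1 + np.exp(-a_group * (theta - b)))

b = 0.0
thetas = np.array([-1.0, 0.0, 1.0])
for group, a in (("group 1", 1.0), ("group 2", 1.4)):
    print(group, p_correct(thetas, b, a).round(3))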
Peer reviewed
Direct link
Wyse, Adam E. – Educational and Psychological Measurement, 2011
Standard setting is a method used to set cut scores on large-scale assessments. One of the most popular standard setting methods is the Bookmark method, in which panelists are asked to envision a response probability (RP) criterion and move through a booklet of ordered items based on that criterion. This study investigates whether…
Descriptors: Testing Programs, Standard Setting (Scoring), Cutting Scores, Probability
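Under a Rasch model, the ability at which a borderline examinee answers an item with probability RP is theta = b + ln(RP / (1 - RP)), so the chosen RP criterion shifts every item's ordered location. A small worked sketch with invented difficulties.

import numpy as np

b = np.array([-1.2, -0.3, 0.5, 1.1, 1.8])   # Rasch item difficulties

def rp_location(b, rp):
    # Ability at which P(correct) = rp under the Rasch model.
    return b + np.log(rp / (1 - rp))

for rp in (0.50, 0.67, 0.80):
    print(f"RP{int(rp * 100)} locations:", rp_location(b, rp).round(2))

With the common RP67 criterion, every location sits ln(2) ≈ 0.71 logits above the item's difficulty; RP50 places it exactly at b.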
Peer reviewed
Direct link
Ferdous, Abdullah A.; Plake, Barbara S. – Educational and Psychological Measurement, 2007
In an Angoff standard setting procedure, judges estimate the probability that a hypothetical, randomly selected, minimally competent candidate will answer each item in the test correctly. In many cases, these item performance estimates are made twice, with information shared with the panelists between estimates. Especially for long tests, this…
Descriptors: Test Items, Probability, Item Analysis, Standard Setting (Scoring)
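The Angoff cut score is the sum over items of the panel's mean probability estimates; a minimal sketch with invented ratings for two estimation rounds, mirroring the shared-information design the abstract mentions.

import numpy as np

# Angoff ratings: probability that a minimally competent candidate
# answers each item correctly. Rows = panelists, cols = items.
round1 = np.array([
    [0.60, 0.80, 0.40, 0.70],
    [0.50, 0.90, 0.30, 0.75],
    [0.65, 0.85, 0.35, 0.60],
])
# Hypothetical round-2 ratings after information is shared.
round2 = round1 + np.array([
    [0.00, -0.05, 0.05, 0.00],
    [0.05, -0.10, 0.10, -0.05],
    [0.00,  0.00, 0.05, 0.05],
])

for name, r in (("round 1", round1), ("round 2", round2)):
    cut = r.mean(axis=0).sum()   # expected raw score of the borderline candidate
    print(f"{name}: cut = {cut:.2f} of {r.shape[1]}")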