Showing 166 to 180 of 3,974 results
Peer reviewed
Wise, Steven L. – Educational Measurement: Issues and Practice, 2017
The rise of computer-based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple-choice items. In particular, very short response…
Descriptors: Guessing (Tests), Multiple Choice Tests, Test Items, Reaction Time
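One common way to operationalize this idea is to flag responses faster than some threshold as rapid guesses and summarize each examinee's effort as the proportion of non-rapid responses. A minimal sketch in Python; the fixed 3-second threshold and the response times are assumptions for illustration, not values from the article:

# Hypothetical illustration: flagging rapid-guessing behavior with a fixed
# response-time threshold, in the spirit of the work described above.
def response_time_effort(response_times, threshold=3.0):
    """Return the proportion of items answered with solution behavior,
    i.e., response times at or above the rapid-guessing threshold."""
    solution = [t >= threshold for t in response_times]
    return sum(solution) / len(solution)

# One simulated test taker's per-item response times in seconds.
times = [12.4, 0.8, 45.1, 1.2, 30.0, 22.7, 0.9, 18.3]
print(f"Response-time effort: {response_time_effort(times):.2f}")  # 0.62 -> three rapid guesses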
Chamoy, Waritsa – ProQuest LLC, 2018
The main purpose of this study was to conduct a validation analysis of student surveys of teaching effectiveness implemented at Bangkok University, Thailand. The study included three phases: survey development, a pilot study, and a full implementation study. Four sources of validity evidence were collected to support intended interpretations and…
Descriptors: Foreign Countries, Psychometrics, Student Surveys, College Students
Peer reviewed
Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017
This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.
Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation
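For reference, the dichotomous Rasch model at the center of such introductions has the standard form (the notation here is generic, not taken from the article):

$$P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}$$

where \theta_p is the ability of person p and b_i the difficulty of item i, both expressed on the same logit scale; when \theta_p = b_i, the probability of a correct response is exactly 0.5.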
Peer reviewed
Faulkner-Bond, Molly; Wolf, Mikyung Kim; Wells, Craig S.; Sireci, Stephen G. – Language Assessment Quarterly, 2018
In this study we investigated the internal factor structure of a large-scale K--12 assessment of English language proficiency (ELP) using samples of fourth- and eighth-grade English learners (ELs) in one state. While U.S. schools are mandated to measure students' ELP in four language domains (listening, reading, speaking, and writing), some ELP…
Descriptors: Factor Structure, Language Tests, Language Proficiency, Grade 4
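A common starting point for such an internal-structure analysis is a single-factor model in which each of the four domain scores loads on one overall proficiency factor; this specification is a generic illustration, not necessarily the model the authors retain:

$$x_d = \lambda_d \,\eta + \varepsilon_d, \qquad d \in \{\text{listening, reading, speaking, writing}\}$$

where \eta is overall ELP, \lambda_d the loading of domain d, and \varepsilon_d domain-specific error; the usual competing hypothesis is a correlated multi-factor structure with one factor per domain.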
Hopfenbeck, Therese N.; Lenkeit, Jenny – International Association for the Evaluation of Educational Achievement, 2018
International large-scale assessments (ILSAs) have had an increasing influence on the discourse surrounding education systems around the world. However, the results of these studies tend to have less impact on pedagogy in the classroom than would be expected. For example, a recent review of 114 published peer-reviewed articles on the IEA's…
Descriptors: Foreign Countries, Achievement Tests, Grade 4, Reading Achievement
Peer reviewed
Cui, Ying; Gierl, Mark; Guo, Qi – Educational Psychology, 2016
The purpose of the current investigation was to describe how artificial neural networks (ANNs) can be used to interpret student performance on cognitive diagnostic assessments (CDAs) and to evaluate the performance of ANNs using simulation results. CDAs are designed to measure student performance on problem-solving tasks and provide useful…
Descriptors: Cognitive Tests, Diagnostic Tests, Classification, Artificial Intelligence
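A toy version of the idea maps simulated item-response patterns to attribute-mastery profiles with a small feedforward network. The data-generating rules, Q-matrix, slip/guess rates, and architecture below are assumptions for illustration, not the authors' setup:

# Illustrative sketch only: a small multi-label network broadly analogous to
# the ANN approach described above; all values here are invented.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Simulate 500 examinees: 3 latent attributes, 8 items.
attributes = rng.integers(0, 2, size=(500, 3))
q_matrix = rng.integers(0, 2, size=(8, 3))          # which attributes each item requires
mastered = (attributes @ q_matrix.T) >= q_matrix.sum(axis=1)  # has all required attributes
p_correct = np.where(mastered, 0.9, 0.2)            # assumed slip/guess rates
responses = rng.binomial(1, p_correct)

# One hidden layer; one output probability per attribute (multi-label).
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(responses, attributes)
print(net.predict(responses[:3]))   # estimated mastery profiles for the first examinees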
Peer reviewed
Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017
The growing use of scales in survey questionnaires warrants attention to how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the type I error and effect size of the independent-samples t-test on observed total scale scores. A…
Descriptors: Test Items, Test Bias, Item Response Theory, Surveys
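The mechanism under study can be reproduced in a rough simulation: hold the latent trait distribution equal across groups, bias a few items against one group, and count how often a t-test on total scores rejects. All parameter values below are illustrative assumptions, not the study's design:

# Rough sketch: both groups share the same latent distribution, so any
# systematic total-score difference the t-test detects is an artifact of DIF.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n, n_items, n_dif = 200, 20, 4
rejections = 0
for _ in range(1000):
    theta_ref = rng.normal(0, 1, n)
    theta_foc = rng.normal(0, 1, n)          # same latent mean: true effect is zero
    b = rng.normal(0, 1, n_items)
    b_foc = b.copy()
    b_foc[:n_dif] += 0.6                     # uniform DIF: harder items for the focal group
    p_ref = 1 / (1 + np.exp(-(theta_ref[:, None] - b)))
    p_foc = 1 / (1 + np.exp(-(theta_foc[:, None] - b_foc)))
    totals_ref = rng.binomial(1, p_ref).sum(axis=1)
    totals_foc = rng.binomial(1, p_foc).sum(axis=1)
    if ttest_ind(totals_ref, totals_foc).pvalue < 0.05:
        rejections += 1
print(f"Empirical type I error: {rejections / 1000:.3f}")  # inflated well above .05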
Peer reviewed
Oliveri, Maria; McCaffrey, Daniel; Ezzo, Chelsea; Holtzman, Steven – Applied Measurement in Education, 2017
The assessment of noncognitive traits is challenging due to possible response biases, "subjectivity" and "faking." Standardized third-party evaluations where an external evaluator rates an applicant on their strengths and weaknesses on various noncognitive traits are a promising alternative. However, accurate score-based…
Descriptors: Factor Analysis, Decision Making, College Admission, Likert Scales
Peer reviewed
Zapata-Rivera, Juan Diego; Katz, Irvin R. – Assessment in Education: Principles, Policy & Practice, 2014
Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results…
Descriptors: Audience Analysis, Scores, Reports, Test Interpretation
Peer reviewed
He, Qingping; Stockford, Ian; Meadows, Michelle – Oxford Review of Education, 2018
Results from Rasch analysis of GCSE and GCE A level data over a period of four years suggest that the standards of examinations in different subjects are not consistent in terms of the levels of the latent trait specified in the Rasch model required to achieve the same grades. Variability in statistical standards between subjects exists at both…
Descriptors: Foreign Countries, Exit Examinations, Intellectual Disciplines, Item Response Theory
Peer reviewed
Hua, Anh N.; Keenan, Janice M. – Scientific Studies of Reading, 2017
One of the most important findings to emerge from recent reading comprehension research is that there are large differences between tests in what they assess--specifically, the extent to which performance depends on word recognition versus listening comprehension skills. Because this research used ordinary least squares regression, it is not clear…
Descriptors: Reading Comprehension, Reading Tests, Test Interpretation, Regression (Statistics)
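The regression in question typically takes reading comprehension as the outcome with word recognition and listening comprehension as predictors, in the spirit of the "simple view of reading." A minimal OLS sketch; the data and generating weights below are fabricated for illustration:

# Hedged sketch of the OLS decomposition described above.
import numpy as np

rng = np.random.default_rng(2)
n = 300
word_rec = rng.normal(0, 1, n)
listening = rng.normal(0, 1, n)
# Assumed generating weights; real tests differ in how heavily each skill counts.
comprehension = 0.6 * word_rec + 0.4 * listening + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), word_rec, listening])
coefs, *_ = np.linalg.lstsq(X, comprehension, rcond=None)
print(f"intercept={coefs[0]:.2f}, word recognition={coefs[1]:.2f}, "
      f"listening={coefs[2]:.2f}")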
Peer reviewed
Newton, Paul E. – Journal of Educational Measurement, 2013
Kane distinguishes between two kinds of argument: the interpretation/use argument and the validity argument. This commentary considers whether there really are two kinds of argument, two arguments, or just one. It concludes that there is just one argument: the validity argument. (Contains 2 figures and 5 notes.)
Descriptors: Validity, Test Interpretation, Test Use
Talan, Teri N.; Bloom, Paula Jorde – Teachers College Press, 2018
The "Business Administration Scale for Family Child Care" (BAS) is the first valid and reliable tool for measuring and improving the overall quality of business and professional practices in family child care settings. It is applicable for multiple uses, including program self-improvement, technical assistance and monitoring, training,…
Descriptors: Business Administration, Child Care, Rating Scales, Qualifications
Peer reviewed
Popham, W. James – Educational Leadership, 2014
Fifty years ago, Robert Glaser introduced the concept of criterion-referenced measurement in an article in American Psychologist. Its early proponents predicted that this measurement strategy would revolutionize education. But has it lived up to its promise? W. James Popham explores this question by looking at the history of criterion-referenced…
Descriptors: Criterion Referenced Tests, Program Effectiveness, Misconceptions, Test Interpretation
Peer reviewed
PDF on ERIC
Monroe, Scott; Cai, Li – Grantee Submission, 2015
This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to Structural Equation Modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well…
Descriptors: Structural Equation Models, Test Interpretation, Goodness of Fit, Item Response Theory
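The limited-information overall test most often used in this literature is the M_2 statistic of Maydeu-Olivares and Joe, which compares observed and model-implied first- and second-order margins rather than the full contingency table; the abstract does not name the statistic, so the form below is the standard one rather than a quotation from the paper:

$$M_2 = N\,(\mathbf{p}_2 - \hat{\boldsymbol{\pi}}_2)^{\top}\,\hat{\mathbf{C}}_2\,(\mathbf{p}_2 - \hat{\boldsymbol{\pi}}_2)$$

where N is the sample size, \mathbf{p}_2 stacks the observed univariate and bivariate margins, \hat{\boldsymbol{\pi}}_2 the model-implied ones, and \hat{\mathbf{C}}_2 is a weight matrix chosen so that M_2 is asymptotically chi-squared distributed under the fitted model.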