Showing 3,886 to 3,900 of 9,552 results
Peer reviewed
Bax, Stephen – Language Testing, 2013
The research described in this article investigates test takers' cognitive processing while completing onscreen IELTS (International English Language Testing System) reading test items. The research aims, among other things, to contribute to our ability to evaluate the cognitive validity of reading test items (Glaser, 1991; Field, in press). The…
Descriptors: Reading Tests, Eye Movements, Cognitive Processes, Language Tests
He, Wei – ProQuest LLC, 2010
Item pool quality has been regarded as an important factor in achieving high measurement quality in computerized adaptive testing (CAT) (e.g., Flaugher, 2000; Jensema, 1977; McBride & Wise, 1976; Reckase, 1976, 2003; van der Linden, Ariel, & Veldkamp, 2006; Veldkamp & van der Linden, 2000; Xing & Hambleton, 2004). However, studies are…
Descriptors: Test Items, Computer Assisted Testing, Item Analysis, Test Construction
Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam – Educational Testing Service, 2010
This study examines the differences in equating outcomes between two trend score equating designs that result from two different scoring strategies when operational constructed-response (CR) items are double-scored: the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Descriptors: Equated Scores, Scoring, Responses, Test Items
DeCarlo, Lawrence T. – Educational Testing Service, 2010
A basic consideration in large-scale assessments that use constructed response (CR) items, such as essays, is how to allocate the essays to the raters that score them. Designs that are used in practice are incomplete, in that each essay is scored by only a subset of the raters, and also unbalanced, in that the number of essays scored by each rater…
Descriptors: Test Items, Responses, Essay Tests, Scoring
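For illustration only (a hypothetical allocation, not one from the paper), an incomplete, unbalanced design might assign four essays to four raters as follows, with each essay scored by just two raters and rater workloads unequal:

    Essay   R1  R2  R3  R4
    1       x   x   .   .
    2       x   .   x   .
    3       .   x   x   .
    4       x   .   .   x

Here no essay is seen by all raters (incomplete), and R1 scores three essays while R4 scores only one (unbalanced).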
Adams, Ray; Berezner, Alla; Jakubowski, Maciej – OECD Publishing (NJ1), 2010
This paper uses an approximate average percent-correct methodology to compare the ranks that would be obtained for PISA 2006 countries if the rankings had been derived from items judged by each country to be of highest priority for inclusion. The results reported show a remarkable consistency in the country rank orderings across different sets of…
Descriptors: Science Tests, Preferences, Test Items, Scores
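A minimal sketch of the kind of computation involved (country labels and percentages below are made up, not PISA 2006 data): rank countries by their unweighted mean percent-correct over a chosen item subset.

    # Hypothetical data: pct_correct[c] lists, for country c, the percent
    # of students answering each item in the subset correctly.
    pct_correct = {
        "A": [72.0, 65.0, 58.0],
        "B": [70.0, 68.0, 61.0],
        "C": [66.0, 60.0, 55.0],
    }

    # Average percent-correct per country, then rank from highest to lowest.
    means = {c: sum(v) / len(v) for c, v in pct_correct.items()}
    ranking = sorted(means, key=means.get, reverse=True)
    print(ranking)  # ['B', 'A', 'C']

Repeating this for item subsets prioritized by different countries and comparing the resulting orderings is the basic idea behind the consistency check described above.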
Peer reviewed
Buckendahl, Chad W.; Ferdous, Abdullah A.; Gerrow, Jack – Practical Assessment, Research & Evaluation, 2010
Many testing programs face the practical challenge of having limited resources to conduct comprehensive standard setting studies. Some researchers have suggested that replicating a group's recommended cut score on a full-length test may be possible by using a subset of the items. However, these studies were based on simulated data. This study…
Descriptors: Cutting Scores, Test Items, Standard Setting (Scoring), Methods
Peer reviewed
Sinharay, Sandip; Haberman, Shelby J.; Zwick, Rebecca – Measurement: Interdisciplinary Research and Perspectives, 2010
Several researchers (e.g., Klein, Hamilton, McCaffrey, & Stecher, 2000; Koretz & Barron, 1998; Linn, 2000) have asserted that test-based accountability, a crucial component of U.S. education policy, has resulted in score inflation. This inference has relied on comparisons with performance on other tests such as the National Assessment of…
Descriptors: Audits (Verification), Test Items, Scores, Measurement
Peer reviewed
Briggs, Derek C. – Measurement: Interdisciplinary Research and Perspectives, 2010
The use of large-scale assessments for making high stakes inferences about students and the schools in which they are situated is premised on the assumption that tests are sensitive to good instruction. An increase in the quality of classroom instruction should cause, on the average, an increase in test scores. In work with a number of colleagues…
Descriptors: Measurement, High Stakes Tests, Inferences, Scores
Peer reviewed
Raker, Jeffrey R.; Towns, Marcy H. – Chemistry Education Research and Practice, 2010
Investigations of the problem types used in college-level general chemistry examinations have been reported in this Journal and were first reported in the "Journal of Chemical Education" in 1924. This study extends the findings from general chemistry to the problems of four college-level organic chemistry courses. Three problem…
Descriptors: Benchmarking, Organic Chemistry, Science Instruction, College Science
Peer reviewed
Ip, Edward H. – Applied Psychological Measurement, 2010
The testlet response model is designed for handling items that are clustered, such as those embedded within the same reading passage. Although the testlet is a powerful tool for handling item clusters in educational and psychological testing, the interpretations of its item parameters, the conditional correlation between item pairs, and the…
Descriptors: Item Response Theory, Models, Test Items, Correlation
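For reference, a common parameterization of the two-parameter testlet model (following Bradlow, Wainer, & Wang, 1999), given here as the standard form rather than the exact one under study:

    P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j - \gamma_{i\,d(j)})\}}{1 + \exp\{a_j(\theta_i - b_j - \gamma_{i\,d(j)})\}}

where d(j) indexes the testlet containing item j, and the person-by-testlet effect \gamma_{i\,d(j)} induces the conditional correlation between items sharing the same passage.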
Peer reviewed
Kim, Sooyeon; Livingston, Samuel A. – Journal of Educational Measurement, 2010
Score equating based on small samples of examinees is often inaccurate for the examinee populations. We conducted a series of resampling studies to investigate the accuracy of five methods of equating in a common-item design. The methods were chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating,…
Descriptors: Equated Scores, Test Items, Item Sampling, Item Response Theory
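As a reminder of one of the methods compared, chained linear equating composes two linear links through the common-item (anchor) score V: a score x on form X is linked to V in the population taking X (subscript 1), then to Y in the population taking Y (subscript 2):

    \mathrm{lin}_Y(x) = \mu_{Y2} + \frac{\sigma_{Y2}}{\sigma_{V2}} \left[ \mu_{V1} + \frac{\sigma_{V1}}{\sigma_{X1}} (x - \mu_{X1}) - \mu_{V2} \right]

Chained mean equating is the special case in which all slope ratios are set to 1.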
Peer reviewed
Revuelta, Javier – Psychometrika, 2010
A comprehensive analysis of difficulty for multiple-choice items requires information at different levels: the test, the items, and the alternatives. This paper introduces a new parameterization of the nominal categories model (NCM) for analyzing difficulty at these three levels. The new parameterization is referred to as the NE-NCM and is…
Descriptors: Classification, Short Term Memory, Multiple Choice Tests, Test Items
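For context, the model being reparameterized is Bock's (1972) nominal categories model, which gives the probability of selecting alternative k of item j as

    P(X_j = k \mid \theta) = \frac{\exp(a_{jk}\theta + c_{jk})}{\sum_{m=1}^{K_j} \exp(a_{jm}\theta + c_{jm})}

subject to identification constraints such as \sum_k a_{jk} = \sum_k c_{jk} = 0; the NE-NCM is a reparameterization of these slope and intercept terms (its exact form is given in the paper, not reproduced here).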
Peer reviewed
Haberman, Shelby J.; Sinharay, Sandip – Psychometrika, 2010
Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in "Appl. Psychol. Meas." 21:25-36, 1997; C.R. Rao and S. Sinharay (Eds), "Handbook of Statistics, vol. 26," pp. 607-642, North-Holland, Amsterdam, 2007; Beguin &…
Descriptors: Item Response Theory, Psychometrics, Statistical Analysis, Scores
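For reference, a common compensatory MIRT model in this literature (e.g., Reckase, 1997) writes the probability of a correct response to item j as

    P(X_{ij} = 1 \mid \boldsymbol{\theta}_i) = \frac{1}{1 + \exp\{-(\mathbf{a}_j^{\top} \boldsymbol{\theta}_i + d_j)\}}

with subscores tied to subsets of the dimensions of \boldsymbol{\theta}_i; this is the generic form, not a claim about the specific models compared in the paper.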
Peer reviewed
Hooker, Giles; Finkelman, Matthew – Psychometrika, 2010
Hooker, Finkelman, and Schwartzman ("Psychometrika," 2009, in press) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results to…
Descriptors: Models, Scores, Item Response Theory, Psychometrics
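One compact way to state the phenomenon (a paraphrase, not the paper's exact definition): with \hat{\boldsymbol{\theta}}(\mathbf{x}) the maximum likelihood ability estimate for response pattern \mathbf{x} and s(\cdot) a scoring function, a paradoxical result occurs when

    \mathbf{x}' \ge \mathbf{x} \ \text{componentwise} \quad \text{but} \quad s(\hat{\boldsymbol{\theta}}(\mathbf{x}')) < s(\hat{\boldsymbol{\theta}}(\mathbf{x}))

that is, answering strictly more items correctly yields a lower reported score.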
Altun, Halis; Korkmaz, Özgen – Online Submission, 2012
The aim of this study is to adapt the Cooperative Learning Attitude Scale into Turkish and to determine engineering students' attitudes towards cooperative learning. The study is based on the descriptive survey model. The study group consists of 466 engineering students. The validity of the scale is confirmed through exploratory factor analysis…
Descriptors: Foreign Countries, Cooperative Learning, Attitude Measures, Engineering Education