Xue, Kang; Huggins-Manley, Anne Corinne; Leite, Walter – Educational and Psychological Measurement, 2022
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of…
Descriptors: Virtual Classrooms, Artificial Intelligence, Item Response Theory, Item Analysis
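A minimal sketch of the kind of IRT-based ability measurement the abstract describes, assuming a two-parameter logistic (2PL) model with item parameters that have already been calibrated; the parameter values and response pattern below are hypothetical, not taken from the study.

import numpy as np
from scipy.optimize import minimize_scalar

def p_correct(theta, a, b):
    # 2PL probability of a correct response to an item with
    # discrimination a and difficulty b
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_theta(responses, a, b):
    # Maximum-likelihood ability estimate for one examinee,
    # searched over a bounded theta range
    def neg_log_lik(theta):
        p = p_correct(theta, a, b)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # hypothetical discriminations
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])  # hypothetical difficulties
resp = np.array([1, 1, 1, 0, 0])           # one examinee's response pattern
print(round(estimate_theta(resp, a, b), 2))

Biased a and b estimates feed directly into estimate_theta, which is why unpiloted item parameters are the concern the abstract raises.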
Wyse, Adam E.; Babcock, Ben – Educational and Psychological Measurement, 2016
Continuously administered examination programs, particularly credentialing programs that require graduation from educational programs, often experience seasonality where distributions of examinee ability may differ over time. Such seasonality may affect the quality of important statistical processes, such as item response theory (IRT) item…
Descriptors: Test Items, Item Response Theory, Computation, Licensing Examinations (Professions)
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu – Educational and Psychological Measurement, 2015
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effects in mixed-format scales and used bi-factor item response theory (IRT) models to…
Descriptors: Item Response Theory, Test Format, Language Usage, Test Items
Keller, Lisa A.; Keller, Robert R. – Educational and Psychological Measurement, 2011
This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests…
Descriptors: Item Response Theory, Scaling, Sustainability, Classification
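The abstract does not name the scaling techniques compared, but a common representative is mean-sigma linking, sketched below under invented anchor-item difficulties from two administrations.

import numpy as np

def mean_sigma_link(b_new, b_old):
    # Linear transformation placing the new administration's theta
    # scale onto the old one: theta_old = A * theta_new + B
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

b_new = np.array([-0.9, -0.2, 0.4, 1.1])   # hypothetical anchor difficulties, new form
b_old = np.array([-1.0, -0.25, 0.5, 1.3])  # the same anchors on the old scale
A, B = mean_sigma_link(b_new, b_old)
print(f"theta_old = {A:.3f} * theta_new + {B:.3f}")

Chaining such links across six administrations lets small estimation errors accumulate, which is presumably why the number of administrations matters to classification accuracy here.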
Wyse, Adam E. – Educational and Psychological Measurement, 2011
Standard setting is a method used to set cut scores on large-scale assessments. One of the most popular standard setting methods is the Bookmark method. In the Bookmark method, panelists are asked to envision a response probability (RP) criterion and move through a booklet of ordered items on the basis of that criterion. This study investigates whether…
Descriptors: Testing Programs, Standard Setting (Scoring), Cutting Scores, Probability
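A small illustration of the RP criterion in play, assuming a 2PL model (the abstract does not specify one): each item's "RP location" is the ability at which the probability of success equals the RP value, and the Bookmark booklet orders items by that location. The items and the common RP = 0.67 default are hypothetical choices.

import math

def rp_location(a, b, rp=0.67):
    # Solve rp = 1 / (1 + exp(-a * (theta - b))) for theta
    return b + math.log(rp / (1 - rp)) / a

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.3)]  # hypothetical (a, b) pairs
booklet = sorted(items, key=lambda ab: rp_location(*ab))
for a, b in booklet:
    print(f"a={a:.1f}  b={b:.1f}  theta_RP={rp_location(a, b):.2f}")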
Skorupski, William P.; Carvajal, Jorge – Educational and Psychological Measurement, 2010
This study is an evaluation of the psychometric issues associated with estimating objective level scores, often referred to as "subscores." The article begins by introducing the concepts of reliability and validity for subscores from statewide achievement tests. These issues are discussed with reference to popular scaling techniques, classical…
Descriptors: Testing Programs, Test Validity, Achievement Tests, Scores
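One reliability issue the article raises can be shown in a few lines: a short subscore scale is typically much less reliable than the full test. The sketch below uses simulated responses and Cronbach's alpha; it illustrates the general point, not the article's analysis.

import numpy as np

def cronbach_alpha(scores):
    # scores: examinees x items matrix of item scores
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
theta = rng.normal(size=(2000, 1))   # simulated abilities
diff = rng.normal(size=40)           # simulated item difficulties
p = 1.0 / (1.0 + np.exp(-(theta - diff)))
responses = (rng.random((2000, 40)) < p).astype(float)

print(cronbach_alpha(responses))          # full 40-item test
print(cronbach_alpha(responses[:, :8]))   # one 8-item "objective" subscore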
Lee, Guemin; Lewis, Daniel M. – Educational and Psychological Measurement, 2008
The bookmark standard-setting procedure is an item response theory-based method that is widely implemented in state testing programs. This study estimates standard errors for cut scores resulting from bookmark standard settings under a generalizability theory model and investigates the effects of different universes of generalization and error…
Descriptors: Generalizability Theory, Testing Programs, Error of Measurement, Cutting Scores
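A deliberately simplified, one-facet version of the idea (the study's generalizability-theory model involves richer universes of generalization, and these cut scores are invented): treat panelists as the random facet and propagate their variance component into a standard error for the panel-mean cut score.

import numpy as np

panelist_cuts = np.array([151.0, 148.0, 155.0, 150.0, 149.0, 153.0, 147.0, 152.0])
var_panelists = panelist_cuts.var(ddof=1)             # panelist variance component
se_cut = np.sqrt(var_panelists / len(panelist_cuts))  # error of the mean cut score
print(f"cut = {panelist_cuts.mean():.1f}, SE = {se_cut:.2f}")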
Wang, Shudong; Jiao, Hong – Educational and Psychological Measurement, 2009
Vertical scales have long been used in practice to measure students' achievement progress across several grade levels, yet constructing them is considered a very challenging psychometric procedure. Recently, such practices have drawn many criticisms. The major criticisms focus on the dimensionality and construct equivalence of the latent trait or…
Descriptors: Reading Comprehension, Elementary Secondary Education, Measures (Individuals), Psychometrics
Breithaupt, Krista; Hare, Donovan R. – Educational and Psychological Measurement, 2007
Many challenges exist for high-stakes testing programs offering continuous computerized administration. The automated assembly of test questions to exactly meet content and other requirements, provide uniformity, and control item exposure can be modeled and solved by mixed-integer programming (MIP) methods. A case study of the computerized…
Descriptors: Testing Programs, Psychometrics, Certification, Accounting
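A hedged sketch of what such a MIP model can look like, using the open-source PuLP/CBC solver rather than whatever the study used; the item pool, content areas, and constraint values are all invented for illustration.

from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

pool = [  # (item id, content area, information at the cut score)
    (1, "audit", 0.42), (2, "audit", 0.35), (3, "tax", 0.50),
    (4, "tax", 0.28), (5, "law", 0.46), (6, "law", 0.31),
]
prob = LpProblem("test_assembly", LpMaximize)
x = {i: LpVariable(f"x_{i}", cat=LpBinary) for i, _, _ in pool}

prob += lpSum(info * x[i] for i, _, info in pool)      # maximize information
prob += lpSum(x.values()) == 4                         # fixed test length
for area in {"audit", "tax", "law"}:
    prob += lpSum(x[i] for i, c, _ in pool if c == area) >= 1  # content coverage

prob.solve()
print([i for i in x if value(x[i]) == 1])

Operational models add many more constraint families (enemy items, exposure controls, statistical targets at several theta points), but they keep this same binary-selection structure.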
Riedel, James A.; Dodson, Janet D. – Educational and Psychological Measurement, 1977
GURU is a computer program developed to analyze data generated by open-ended question techniques such as ECHO or other semistructured data collection techniques in which data are categorized. The program provides extensive descriptive statistics and considerable flexibility in comparing data. (Author/JKS)
Descriptors: Computer Programs, Data Analysis, Essay Tests, Test Interpretation
Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1988
Three methods of estimating point-biserial correlation coefficient standard errors were compared: (1) assuming normality; (2) not assuming normality; and (3) bootstrapping. Although errors estimated assuming normality were biased, such estimates were less variable and easier to compute, suggesting that this might be the method of choice in some…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Analysis, Statistical Analysis
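The estimators being compared are easy to line up on simulated data. The sketch below contrasts the normal-theory approximation with a simple nonparametric bootstrap for a point-biserial item-criterion correlation; the data-generating choices are illustrative, not the article's design.

import numpy as np

def point_biserial(x, y):
    # Pearson correlation of a binary item score with a continuous score,
    # which is exactly the point-biserial coefficient
    return np.corrcoef(x, y)[0, 1]

rng = np.random.default_rng(42)
n = 200
total = rng.normal(50, 10, n)                 # continuous criterion score
p = 1.0 / (1.0 + np.exp(-(total - 50) / 10))
item = (rng.random(n) < p).astype(float)      # binary item score

r = point_biserial(item, total)
se_normal = (1 - r**2) / np.sqrt(n - 1)       # normal-theory approximation

boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)               # resample cases with replacement
    boots.append(point_biserial(item[idx], total[idx]))
se_boot = np.std(boots, ddof=1)

print(f"r = {r:.3f}  SE(normal) = {se_normal:.4f}  SE(bootstrap) = {se_boot:.4f}")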
Cudeck, Robert A.; And Others – Educational and Psychological Measurement, 1977
TAILOR, a FORTRAN computer program for tailored testing, is described. The tailored test is based on a joint ordering of persons and items that requires no pretesting; a brief discussion of the computer program is included. (Author/JKS)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Test Construction
McCormick, Douglas J.; Cliff, Norman – Educational and Psychological Measurement, 1977
An interactive computer program for tailored testing, called TAILOR, is presented. The program runs on the APL system. A cumulative file for each examinee is established and tests are then tailored to each examinee; extensive pretesting is not necessary. (JKS)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Test Construction
Fan, Xitao – Educational and Psychological Measurement, 1998
This study empirically examined the behavior of item and person statistics derived from item response theory and classical test theory, using a large-scale statewide assessment. Findings show that the person and item statistics from the two measurement frameworks are quite comparable. (SLD)
Descriptors: Item Response Theory, State Programs, Statistical Analysis, Test Items
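For the classical-test-theory side of such a comparison, the usual statistics take only a few lines; the simulated data below stand in for the statewide assessment, and the IRT calibration itself is omitted.

import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=(1000, 1))   # simulated examinee abilities
b = np.linspace(-1.5, 1.5, 10)       # simulated item difficulties
resp = (rng.random((1000, 10)) < 1.0 / (1.0 + np.exp(-(theta - b)))).astype(float)

p_values = resp.mean(axis=0)         # CTT difficulty (proportion correct)
total = resp.sum(axis=1)
for j in range(resp.shape[1]):
    # corrected item-total correlation: the CTT analogue of IRT discrimination
    r_it = np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]
    print(f"item {j+1:2d}: p = {p_values[j]:.2f}, r_it = {r_it:.2f}")

Under the study's finding, these p-values and item-total correlations should rank items much as the corresponding IRT b and a estimates do.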
Bligh, Thomas J.; Noe, Michael J. – Educational and Psychological Measurement, 1977
A computer program for scoring written simulation tests provides individual scores and basic item analysis data. The program is written in Fortran IV and can accommodate up to thirty-five hundred options and up to ten thousand examinees. (Author/JKS)
Descriptors: Computer Oriented Programs, Item Analysis, Medical Education, Problem Solving