Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 26 |
Since 2006 (last 20 years) | 47 |
Descriptor
Item Response Theory | 49 |
Statistical Analysis | 49 |
Test Reliability | 49 |
Test Validity | 24 |
Test Items | 21 |
Foreign Countries | 18 |
Psychometrics | 11 |
Correlation | 10 |
Multiple Choice Tests | 10 |
Elementary School Students | 9 |
Scores | 9 |
Author
Alonzo, Julie | 4 |
Irvin, P. Shawn | 4 |
Lai, Cheng-Fei | 4 |
Park, Bitnara Jasmine | 4 |
Tindal, Gerald | 4 |
Biancarosa, Gina | 2 |
Carlson, Sarah E. | 2 |
Davison, Mark L. | 2 |
Liu, Bowen | 2 |
Seipel, Ben | 2 |
Alhaythami, Hassan | 1 |
Publication Type
Journal Articles | 42 |
Reports - Research | 39 |
Reports - Evaluative | 7 |
Numerical/Quantitative Data | 4 |
Reports - Descriptive | 2 |
Tests/Questionnaires | 2 |
Books | 1 |
Collected Works - General | 1 |
Location
Germany | 4 |
Turkey | 3 |
California | 2 |
Colorado | 2 |
Arizona | 1 |
Australia | 1 |
Brazil | 1 |
Europe | 1 |
Illinois | 1 |
Indonesia | 1 |
Iran | 1 |
Assessments and Surveys
Trends in International… | 3 |
ACT Assessment | 1 |
California Achievement Tests | 1 |
Defining Issues Test | 1 |
Iowa Tests of Basic Skills | 1 |
SAT (College Admission Test) | 1 |
Students Evaluation of… | 1 |
Bashkov, Bozhidar M.; Clauser, Jerome C. – Practical Assessment, Research & Evaluation, 2019
Successful testing programs rely on high-quality test items to produce reliable scores and defensible exams. However, determining what statistical screening criteria are most appropriate to support these goals can be daunting. This study describes and demonstrates cost-benefit analysis as an empirical approach to determining appropriate screening…
Descriptors: Test Items, Test Reliability, Evaluation Criteria, Accuracy
Andersson, Björn; Xin, Tao – Educational and Psychological Measurement, 2018
In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…
Descriptors: Item Response Theory, Test Reliability, Test Items, Scores
Smith, Tamarah; Smith, Samantha – International Journal of Teaching and Learning in Higher Education, 2018
The Research Methods Skills Assessment (RMSA) was created to measure psychology majors' statistics knowledge and skills. The American Psychological Association's Guidelines for the Undergraduate Major in Psychology (APA, 2007, 2013) served as a framework for development. Results from a Rasch analysis with data from n = 330 undergraduates showed…
Descriptors: Psychology, Statistics, Undergraduate Students, Item Response Theory
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing
Yao, Shih-Ying; Muñez, David; Bull, Rebecca; Lee, Kerry; Khng, Kiat Hui; Poon, Kenneth – Journal of Psychoeducational Assessment, 2017
The Test of Early Mathematics Ability-Third Edition (TEMA-3) is a commonly used measure of early mathematics knowledge for children aged 3 years to 8 years 11 months. In spite of its wide use, research on the psychometric properties of TEMA-3 remains limited. This study applied the Rasch model to investigate the psychometric properties of TEMA-3…
Descriptors: Foreign Countries, Mathematics Tests, Item Response Theory, Psychometrics
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
Turkan, Azmi; Cetin, Bayram – Journal of Education and Practice, 2017
Validity and reliability are among the most crucial characteristics of a test. One of the steps to make sure that a test is valid and reliable is to examine the bias in test items. The purpose of this study was to examine bias in the 2012 Placement Test items with respect to gender, using the Rasch model, in Turkey. The sample of this study was…
Descriptors: Item Response Theory, Gender Differences, Test Bias, Test Items
Walstad, William B.; Rebeck, Ken – Journal of Economic Education, 2017
The "Test of Financial Literacy" (TFL) was created to measure the financial knowledge of high school students. Its content is based on the standards and benchmarks stated in the "National Standards for Financial Literacy" (Council for Economic Education 2013). The test development process involved extensive item writing and…
Descriptors: Tests, Money Management, Literacy, High School Students
Davison, Mark L.; Biancarosa, Gina; Carlson, Sarah E.; Seipel, Ben; Liu, Bowen – Assessment for Effective Intervention, 2018
The computer-administered Multiple-Choice Online Causal Comprehension Assessment (MOCCA) for Grades 3 to 5 has an innovative, 40-item multiple-choice structure in which each distractor corresponds to a comprehension process upon which poor comprehenders have been shown to rely. This structure requires revised thinking about measurement issues…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Pilot Projects, Measurement
Davison, Mark L.; Biancarosa, Gina; Carlson, Sarah E.; Seipel, Ben; Liu, Bowen – Grantee Submission, 2018
The computer-administered Multiple-Choice Online Causal Comprehension Assessment (MOCCA) for Grades 3 to 5 has an innovative, 40-item multiple-choice structure in which each distractor corresponds to a comprehension process upon which poor comprehenders have been shown to rely. This structure requires revised thinking about measurement issues…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Pilot Projects, Measurement
Hays, Danica G.; Wood, Chris – Measurement and Evaluation in Counseling and Development, 2017
We present considerations for validity when a population outside of a normed sample is assessed and those data are interpreted. Using a career group counseling example exploring life satisfaction changes as evidenced by the Quality of Life Inventory (Frisch, 1994), we showcase qualitative and quantitative approaches to explore how normative data…
Descriptors: Data Interpretation, Scores, Quality of Life, Life Satisfaction
Chao, Jessica L.; McDermott, Paul A.; Watkins, Marley W.; Drogalis, Anna Rhoad; Worrell, Frank C.; Hall, Tracey E. – International Journal of School & Educational Psychology, 2018
This study reports on the national standardization and validation of the Learning Behaviors Scale (LBS) for use in Trinidad and Tobago. The LBS is a teacher rating scale centering on observable behaviors relevant to identifying childhood approaches to classroom learning. Teachers observed a stratified sample of 900 students across the islands'…
Descriptors: Foreign Countries, Program Validation, Behavior Rating Scales, National Standards
Fiedler, Daniela; Tröbst, Steffen; Harms, Ute – CBE - Life Sciences Education, 2017
Students of all ages face severe conceptual difficulties regarding key aspects of evolution, the central, unifying, and overarching theme in biology. Aspects strongly related to abstract "threshold" concepts like randomness and probability appear to pose particular difficulties. A further problem is the lack of an appropriate instrument…
Descriptors: College Students, Concept Formation, Probability, Evolution
Krell, Moritz – Cogent Education, 2017
This study evaluates a 12-item instrument for subjective measurement of mental load (ML) and mental effort (ME) by analysing different sources of validity evidence. The findings of an expert judgement (N = 8) provide "evidence based on test content" that the formulation of the items corresponds to the meaning of ML and ME. An empirical…
Descriptors: Cognitive Processes, Test Validity, Secondary School Students, Multiple Choice Tests