Andrews, Benjamin James – ProQuest LLC, 2011
The equity properties can be used to assess the quality of an equating. The degree to which expected scores conditional on ability are similar between test forms is referred to as first-order equity. Second-order equity is the degree to which conditional standard errors of measurement are similar between test forms after equating. The purpose of…
Descriptors: Test Format, Advanced Placement, Simulation, True Scores
Chamberlain, Suzanne – Research Papers in Education, 2013
This paper presents the findings of a study designed to explore qualification users' perceptions and experiences of reliability in the context of national assessment outcomes in England. The study consisted of 17 focus groups conducted across six sectors of qualification users: students, teachers, trainee teachers, job-seekers, employers and…
Descriptors: Qualifications, Test Reliability, Foreign Countries, Focus Groups
Friedman-Krauss, Allison H.; Connors, Maia C.; Morris, Pamela A. – Society for Research on Educational Effectiveness, 2013
As a result of the 1998 reauthorization of Head Start, the Department of Health and Human Services conducted a national evaluation of the Head Start program. The goal of Head Start is to improve the school readiness skills of low-income children in the United States. There is a substantial body of experimental and correlational research that has…
Descriptors: Early Intervention, Preschool Education, School Readiness, Low Income Groups
Kierulff, Herbert – American Journal of Business Education, 2012
Over the past 60 years the internal rate of return (IRR) has become a major tool in investment evaluation. Many executives prefer it to net present value (NPV), presumably because they can more easily comprehend a percentage measure. This article demonstrates that, except in the rare case of an investment that is followed by a single cash return,…
Descriptors: Outcomes of Education, Measurement Techniques, Outcome Measures, Definitions
Carter, Benjamin Hammond – ProQuest LLC, 2012
The factor structure of posttraumatic stress disorder (PTSD) remains the subject of intense investigation. The DSM three-factor conceptualization of PTSD has not been empirically supported; rather, two four-factor models of PTSD (King, Leskin, King, & Weathers, 1998; Simms, Watson, & Doebbeling, 2002) have garnered the majority of support…
Descriptors: Factor Structure, Posttraumatic Stress Disorder, Trauma, Symptoms (Individual Disorders)
Yao, Lihua – Psychometrika, 2012
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…
Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing
In'nami, Yo; Koizumi, Rie – Language Testing, 2012
This study examined the factor structure of the listening and reading sections of the revised Test of English for International Communication (TOEIC®) test. The data from the TOEIC IP (institutional program) test taken by 569 English learners were randomly split into two samples (n = 285 vs. 284). Four models (higher-order, correlated,…
Descriptors: Communication (Thought Transfer), Second Language Learning, Factor Structure, Measurement
Han, Kyung T. – Practical Assessment, Research & Evaluation, 2012
For several decades, the "three-parameter logistic model" (3PLM) has been the dominant choice for practitioners in the field of educational measurement for modeling examinees' response data from multiple-choice (MC) items. Past studies, however, have pointed out that the c-parameter of 3PLM should not be interpreted as a guessing…
Descriptors: Statistical Analysis, Models, Multiple Choice Tests, Guessing (Tests)
Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2009
We derive an estimator of the standardized value which, under the standard assumptions of normality and homoscedasticity, is more efficient than the established (asymptotically efficient) estimator and discuss its gains for small samples. (Contains 1 table and 3 figures.)
Descriptors: Efficiency, Computation, Statistics, Sample Size
Sijtsma, Klaas – Psychometrika, 2009
This discussion paper argues that both the use of Cronbach's alpha as a reliability estimate and as a measure of internal consistency suffer from major problems. First, alpha always has a value, which cannot be equal to the test score's reliability given the inter-item covariance matrix and the usual assumptions about measurement error. Second, in…
Descriptors: Measurement, Error of Measurement, Scores, Computation
New York State Education Department, 2015
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 Common Core English Language Arts (ELA) and Mathematics 2015 Operational Tests. This report includes information about test content and test development, item (i.e.,…
Descriptors: Testing Programs, English, Language Arts, Mathematics Tests
Carnoy, Martin – National Education Policy Center, 2015
Stanford education professor Martin Carnoy examines four main critiques of how international test results are used in policymaking. Of particular interest are critiques of the policy analyses published by the Program for International Student Assessment (PISA). Using average PISA scores as a comparative measure of student achievement is misleading…
Descriptors: Criticism, Reputation, Test Validity, Error of Measurement
Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam – Educational Testing Service, 2010
This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Descriptors: Equated Scores, Scoring, Responses, Test Items
Lengh, Carolyn J. – ProQuest LLC, 2010
This study compares the dependability of four classroom assessment scoring methods. Generalizability theory (G) and alternative decision (D) are used to measure the results of students' classroom assessment scores and compare the results of the four scoring methods on variability of rater by person variance and the level of G and D coefficients…
Descriptors: Generalizability Theory, Scoring, Social Studies, Tests
Yang, Manshu; Chow, Sy-Miin – Psychometrika, 2010
Facial electromyography (EMG) is a useful physiological measure for detecting subtle affective changes in real time. A time series of EMG data contains bursts of electrical activity that increase in magnitude when the pertinent facial muscles are activated. Whereas previous methods for detecting EMG activation are often based on deterministic or…
Descriptors: Test Bias, Error of Measurement, Human Body, Diagnostic Tests