Publication Date
In 2025: 8
Since 2024: 17
Since 2021 (last 5 years): 57
Since 2016 (last 10 years): 2138
Since 2006 (last 20 years): 5439
Descriptor
Statistical Analysis: 6044
Scores: 5574
Comparative Analysis: 2047
Foreign Countries: 1890
Correlation: 1464
Gender Differences: 1323
Elementary School Students: 1109
Pretests Posttests: 1009
Academic Achievement: 973
Teaching Methods: 867
Questionnaires: 807
Audience
Researchers: 37
Practitioners: 17
Policymakers: 13
Teachers: 10
Administrators: 8
Counselors: 1
Media Staff: 1
Students: 1
Location
Turkey: 352
Texas: 163
California: 135
Iran: 121
Australia: 91
Florida: 90
China: 80
North Carolina: 73
Taiwan: 73
Tennessee: 73
Georgia: 68
What Works Clearinghouse Rating
Meets WWC Standards without Reservations: 11
Meets WWC Standards with or without Reservations: 24
Does not meet standards: 32
Haeju Lee; Kyung Yong Kim – Journal of Educational Measurement, 2025
When no prior information about differential item functioning (DIF) exists for the items in a test, either the rank-based or the iterative purification procedure might be preferred. The rank-based purification selects anchor items based on a preliminary DIF test. For a preliminary DIF test, likelihood ratio test (LRT)-based approaches (e.g.,…
Descriptors: Test Items, Equated Scores, Test Bias, Accuracy
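The purification idea above can be illustrated outside the authors' LRT framework. The sketch below is a rough Python illustration, not the procedure studied in the paper: it uses a logistic-regression DIF check and iteratively drops flagged items from the anchor set until the set stabilizes. The simulated responses, the 0.05 threshold, and the dif_pvalue helper are all assumptions made for the example.

```python
import numpy as np
import statsmodels.api as sm

def dif_pvalue(responses, group, item, anchor_items):
    """Logistic-regression DIF check for one item: does group membership
    predict the item response after conditioning on the anchor score?"""
    anchor_score = responses[:, anchor_items].sum(axis=1)
    X = sm.add_constant(np.column_stack([anchor_score, group]))
    fit = sm.Logit(responses[:, item], X).fit(disp=0)
    return fit.pvalues[2]          # p-value for the group effect

def iterative_purification(responses, group, alpha=0.05, max_iter=10):
    """Start with all items as anchors, drop flagged items, repeat until stable."""
    n_items = responses.shape[1]
    anchor = list(range(n_items))
    for _ in range(max_iter):
        flagged = [i for i in range(n_items)
                   if dif_pvalue(responses, group, i,
                                 [j for j in anchor if j != i]) < alpha]
        new_anchor = [i for i in range(n_items) if i not in flagged]
        if new_anchor == anchor:
            break
        anchor = new_anchor
    return anchor, flagged

# Toy data: 500 examinees, 10 Rasch-like items, artificial DIF on item 0.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 500)
ability = rng.normal(size=500)
difficulty = np.linspace(-1, 1, 10)
logits = ability[:, None] - difficulty[None, :]
logits[:, 0] += 0.8 * group                      # DIF on item 0
responses = (rng.random((500, 10)) < 1 / (1 + np.exp(-logits))).astype(int)

anchors, flagged = iterative_purification(responses, group)
print("purified anchor set:", anchors, "flagged:", flagged)
```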
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
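As a reference point for the baseline being extended, this is a minimal implementation of ordinary linear equating, which maps form-X scores onto the form-Y scale by matching means and standard deviations; the score vectors are invented for illustration.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Linear equating: e_Y(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)."""
    mu_x, mu_y = np.mean(scores_x), np.mean(scores_y)
    sd_x, sd_y = np.std(scores_x, ddof=1), np.std(scores_y, ddof=1)
    return mu_y + (sd_y / sd_x) * (np.asarray(x) - mu_x)

# Hypothetical observed score distributions on two forms.
scores_x = np.array([12, 15, 18, 20, 22, 25, 27, 30])
scores_y = np.array([10, 14, 16, 19, 21, 24, 28, 29])
print(linear_equate(20, scores_x, scores_y))
```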
Jianbin Fu; TsungHan Ho; Xuan Tan – Practical Assessment, Research & Evaluation, 2025
Item parameter estimation using an item response theory (IRT) model with fixed ability estimates is useful in equating with small samples on anchor items. The current study explores the impact of three ability estimation methods (weighted likelihood estimation [WLE], maximum a posteriori [MAP], and posterior ability distribution estimation [PST])…
Descriptors: Item Response Theory, Test Items, Computation, Equated Scores
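For context on the ability estimators being compared, here is a minimal MAP ability estimate under a 2PL model; the item parameters, the standard-normal prior, and the search bounds are illustrative assumptions rather than settings from the study.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def map_ability(responses, a, b, prior_sd=1.0):
    """MAP estimate of theta under a 2PL model with a Normal(0, prior_sd^2) prior."""
    responses, a, b = map(np.asarray, (responses, a, b))

    def neg_log_posterior(theta):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))     # 2PL response probabilities
        loglik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
        logprior = -0.5 * (theta / prior_sd) ** 2
        return -(loglik + logprior)

    return minimize_scalar(neg_log_posterior, bounds=(-4, 4), method="bounded").x

# Illustrative item parameters and a single response pattern.
a = [1.2, 0.8, 1.5, 1.0]      # discriminations
b = [-0.5, 0.0, 0.5, 1.0]     # difficulties
print(map_ability([1, 1, 0, 1], a, b))
```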
Karlson, Kristian Bernt; Popham, Frank; Holm, Anders – Sociological Methods & Research, 2023
This article presents two ways of quantifying confounding using logistic response models for binary outcomes. Drawing on the distinction between marginal and conditional odds ratios in statistics, we define two corresponding measures of confounding (marginal and conditional) that can be recovered from a simple standardization approach. We…
Descriptors: Statistical Analysis, Probability, Standards, Mediation Theory
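The marginal/conditional distinction in the abstract reflects the non-collapsibility of the odds ratio: even when treatment is randomized (so there is no confounding), the odds ratio collapsed over a covariate is attenuated relative to the stratum-specific one. A small simulation with made-up parameter values shows this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

z = rng.integers(0, 2, n)                 # covariate, independent of treatment
x = rng.integers(0, 2, n)                 # randomized treatment -> no confounding
logit = -1.0 + 1.0 * x + 2.0 * z          # conditional log-odds ratio for x is 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def odds_ratio(y, x):
    p1, p0 = y[x == 1].mean(), y[x == 0].mean()
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

print("conditional OR (z=0):", odds_ratio(y[z == 0], x[z == 0]))   # ~ exp(1) = 2.72
print("conditional OR (z=1):", odds_ratio(y[z == 1], x[z == 1]))   # ~ exp(1) = 2.72
print("marginal OR:         ", odds_ratio(y, x))                   # smaller, despite randomization
```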
San Martín, Ernesto; González, Jorge – Journal of Educational and Behavioral Statistics, 2022
The nonequivalent groups with anchor test (NEAT) design is widely used in test equating. Under this design, two groups of examinees are administered different test forms, with each test form containing a subset of common items. Because test takers from different groups are assigned only one test form, missing score data emerge by design, rendering…
Descriptors: Tests, Scores, Statistical Analysis, Models
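Under a NEAT design, the by-design missingness is usually bridged through the common anchor. A minimal chained linear equating sketch (toy data, and not the identifiability analysis the authors pursue) looks like this:

```python
import numpy as np

def linear_link(scores_from, scores_to):
    """Return a function mapping the 'from' scale onto the 'to' scale
    by matching means and standard deviations."""
    mu_f, mu_t = np.mean(scores_from), np.mean(scores_to)
    sd_f, sd_t = np.std(scores_from, ddof=1), np.std(scores_to, ddof=1)
    return lambda s: mu_t + (sd_t / sd_f) * (np.asarray(s) - mu_f)

# Group 1 takes form X plus anchor V; group 2 takes form Y plus anchor V.
x_g1 = np.array([20, 25, 30, 35, 40]); v_g1 = np.array([8, 10, 12, 14, 16])
y_g2 = np.array([18, 24, 29, 33, 41]); v_g2 = np.array([7, 10, 11, 15, 17])

x_to_v = linear_link(x_g1, v_g1)     # X -> V, estimated on group 1
v_to_y = linear_link(v_g2, y_g2)     # V -> Y, estimated on group 2
print(v_to_y(x_to_v(30)))            # chained X -> Y conversion
```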
A. R. Georgeson – Structural Equation Modeling: A Multidisciplinary Journal, 2025
There is increasing interest in using factor scores in structural equation models, and there have been numerous methodological papers on the topic. Nevertheless, sum scores, which are computed by adding up item responses, continue to be ubiquitous in practice. It is therefore important to compare simulation results involving factor scores to…
Descriptors: Structural Equation Models, Scores, Factor Analysis, Statistical Bias
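To make the comparison concrete, the snippet below computes both kinds of scores on simulated one-factor data: a plain sum of item responses and factor scores from scikit-learn's FactorAnalysis. The sample size, loadings, and error scale are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, k = 1000, 6
eta = rng.normal(size=n)                                # latent factor
loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.7, 0.6])
items = eta[:, None] * loadings + rng.normal(scale=0.6, size=(n, k))

sum_scores = items.sum(axis=1)                          # the ubiquitous practice
factor_scores = FactorAnalysis(n_components=1).fit_transform(items).ravel()
if np.corrcoef(factor_scores, sum_scores)[0, 1] < 0:    # factor sign is arbitrary
    factor_scores = -factor_scores

print(np.corrcoef(sum_scores, factor_scores)[0, 1])     # high, but not 1.0
print(np.corrcoef(sum_scores, eta)[0, 1], np.corrcoef(factor_scores, eta)[0, 1])
```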
Wang, Yan; Kim, Eunsook; Yi, Zhiyao – Educational and Psychological Measurement, 2022
Latent profile analysis (LPA) identifies heterogeneous subgroups based on continuous indicators that represent different dimensions. It is a common practice to measure each dimension using items, create composite or factor scores for each dimension, and use these scores as indicators of profiles in LPA. In this case, measurement models for…
Descriptors: Robustness (Statistics), Profiles, Statistical Analysis, Classification
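Since LPA with continuous indicators is formally a finite Gaussian mixture, the score-then-profile practice described above can be sketched with scikit-learn's GaussianMixture as a stand-in for dedicated LPA software; the three-profile structure and the composite scores below are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Three latent profiles on two dimensions; each dimension enters as a composite
# score (e.g., a mean of several items), mimicking the score-then-profile practice.
means = np.array([[0.0, 0.0], [1.5, -0.5], [-1.0, 1.0]])
labels = rng.integers(0, 3, 600)
composites = means[labels] + rng.normal(scale=0.5, size=(600, 2))

# Diagonal covariances mirror the usual conditional-independence assumption in LPA.
lpa = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
profiles = lpa.fit_predict(composites)
print(np.bincount(profiles), lpa.means_.round(2))
```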
Oliver Lüdtke; Alexander Robitzsch – Journal of Experimental Education, 2025
There is a longstanding debate on whether the analysis of covariance (ANCOVA) or the change score approach is more appropriate when analyzing non-experimental longitudinal data. In this article, we use a structural modeling perspective to clarify that the ANCOVA approach is based on the assumption that all relevant covariates are measured (i.e.,…
Descriptors: Statistical Analysis, Longitudinal Studies, Error of Measurement, Hierarchical Linear Modeling
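Written out in generic notation (not the authors'), the two models differ only in whether the pretest slope is estimated or fixed at one:

```latex
\text{ANCOVA:}\quad Y_{i2} = \beta_0 + \beta_1 Y_{i1} + \tau\, T_i + \varepsilon_i
\qquad\qquad
\text{Change score:}\quad Y_{i2} - Y_{i1} = \gamma_0 + \tau\, T_i + u_i
```

Because the change-score model is ANCOVA with \(\beta_1\) constrained to 1, the two approaches can return different treatment-effect estimates when groups already differ at pretest (Lord's paradox).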
Paul T. von Hippel – Education Next, 2024
In a 1984 essay, Benjamin Bloom, an educational psychologist at the University of Chicago, asserted that tutoring offered "the best learning conditions we can devise" and that tutors could raise student achievement by two full standard deviations--or, in statistical parlance, two "sigmas." The influence of Bloom's two-sigma…
Descriptors: Tutoring, Academic Achievement, Educational Experiments, Tests
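For a sense of scale, a two-standard-deviation gain on a normally distributed outcome moves a student from the 50th to roughly the 98th percentile:

```latex
\Phi(2) \approx 0.9772 \quad\Rightarrow\quad \text{about the 98th percentile.}
```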
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well-understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
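The summed-score expression at issue is the familiar form of coefficient alpha for k items, with item variances \(\sigma_i^2\) and summed-score variance \(\sigma_X^2\):

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
```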
Karun Adusumilli; Francesco Agostinelli; Emilio Borghesan – National Bureau of Economic Research, 2024
This paper examines the scalability of the results from the Tennessee Student-Teacher Achievement Ratio (STAR) Project, a prominent educational experiment. We explore how the misalignment between the experimental design and the econometric model affects researchers' ability to learn about the intervention's scalability. We document heterogeneity…
Descriptors: Class Size, Research Design, Educational Research, Program Effectiveness
Copur-Gencturk, Yasemin; Choi, Hye-Jeong; Cohen, Alan – Journal of Mathematics Teacher Education, 2023
Examining teachers' knowledge on a large scale involves addressing substantial measurement and logistical issues; thus, existing teacher knowledge assessments have mainly consisted of selected-response items because of their ease of scoring. Although open-ended responses could capture a more complex understanding of and provide further insights…
Descriptors: Mathematics Teachers, Pedagogical Content Knowledge, Statistical Analysis, Models
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
We derive formulas for the differential item functioning (DIF) measures that two routinely used DIF statistics are designed to estimate. The DIF measures that match on observed scores are compared to DIF measures based on an unobserved ability (theta or true score) for items that are described by either the one-parameter logistic (1PL) or…
Descriptors: Scores, Test Bias, Statistical Analysis, Item Response Theory
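Observed-score DIF statistics of this kind commonly include the Mantel-Haenszel procedure (named here as a standard reference point, not necessarily one of the two statistics analyzed in this report). With \(A_k, B_k\) the reference group's correct/incorrect counts and \(C_k, D_k\) the focal group's in score stratum \(k\) of size \(N_k\), the common odds ratio and its ETS delta-scale transform are:

```latex
\hat{\alpha}_{\mathrm{MH}} \;=\; \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k},
\qquad
\text{MH D-DIF} \;=\; -2.35\,\ln \hat{\alpha}_{\mathrm{MH}}
```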
Collier, Zachary K.; Leite, Walter L. – Journal of Experimental Education, 2022
Artificial neural networks (NN) can help researchers estimate propensity scores for quasi-experimental estimation of treatment effects because they can automatically detect complex interactions involving many covariates. However, NN is difficult to implement due to the complexity of choosing an algorithm for various treatment levels and monitoring…
Descriptors: Artificial Intelligence, Mentors, Beginning Teachers, Teacher Persistence
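A stripped-down version of the general approach (not the authors' implementation or tuning strategy) is to fit a small neural network that predicts treatment from covariates, read off predicted probabilities as propensity scores, and form inverse-probability weights; the data-generating values and network settings below are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 4))                                    # covariates
p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] * X[:, 2])))
treat = (rng.random(n) < p_treat).astype(int)
y = 2.0 * treat + X[:, 0] + rng.normal(size=n)                 # true effect = 2

# Propensity scores from a small neural network.
ps = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X, treat).predict_proba(X)[:, 1]

# Normalized inverse-probability-weighted estimate of the average treatment effect.
w = treat / ps + (1 - treat) / (1 - ps)
ate = np.average(y[treat == 1], weights=w[treat == 1]) - \
      np.average(y[treat == 0], weights=w[treat == 0])
print(round(ate, 2))
```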
Zhang, Zhonghua – Applied Measurement in Education, 2020
The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…
Descriptors: Error of Measurement, Computation, Equated Scores, True Scores
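The delta method mentioned in the abstract propagates the asymptotic covariance \(\Sigma_{\hat{\theta}}\) of the estimated parameters through the (generally nonlinear) function \(g\) defining each equating coefficient:

```latex
\operatorname{Var}\!\big[g(\hat{\theta})\big]
\;\approx\;
\nabla g(\theta)^{\top}\, \Sigma_{\hat{\theta}}\, \nabla g(\theta)
```

The square roots of the resulting variances give the standard errors of the estimated equating coefficients.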