He, Qingping; Meadows, Michelle; Black, Beth – Research Papers in Education, 2022
A potential negative consequence of high-stakes testing is inappropriate test behaviour involving individuals and/or institutions. Inappropriate test behaviour and test collusion can result in aberrant response patterns and anomalous test scores and invalidate the intended interpretation and use of test results. A variety of statistical techniques…
Descriptors: Statistical Analysis, High Stakes Tests, Scores, Response Style (Tests)
Mark Wilson – Journal of Educational and Behavioral Statistics, 2024
This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational…
Descriptors: Educational Assessment, Measurement, Standardized Tests, Test Items
Marc Brysbaert – Cognitive Research: Principles and Implications, 2024
Experimental psychology is witnessing an increase in research on individual differences, which requires the development of new tasks that can reliably assess variations among participants. To do this, cognitive researchers need statistical methods that many researchers have not learned during their training. The lack of expertise can pose…
Descriptors: Experimental Psychology, Individual Differences, Statistical Analysis, Task Analysis
Zheng, Xiaying; Yang, Ji Seung – Measurement: Interdisciplinary Research and Perspectives, 2021
The purpose of this paper is to briefly introduce two most common applications of multiple group item response theory (IRT) models, namely detecting differential item functioning (DIF) analysis and nonequivalent group score linking with a simultaneous calibration. We illustrate how to conduct those analyses using the "Stata" item…
Descriptors: Item Response Theory, Test Bias, Computer Software, Statistical Analysis
San Martín, Ernesto; González, Jorge – Journal of Educational and Behavioral Statistics, 2022
The nonequivalent groups with anchor test (NEAT) design is widely used in test equating. Under this design, two groups of examinees are administered different test forms with each test form containing a subset of common items. Because test takers from different groups are assigned only one test form, missing score data emerge by design rendering…
Descriptors: Tests, Scores, Statistical Analysis, Models
Sengül Avsar, Asiye – Measurement: Interdisciplinary Research and Perspectives, 2020
In order to reach valid and reliable test scores, various test theories have been developed, and one of them is nonparametric item response theory (NIRT). Mokken Models are the most widely known NIRT models which are useful for small samples and short tests. Mokken Package is useful for Mokken Scale Analysis. An important issue about validity is…
Descriptors: Response Style (Tests), Nonparametric Statistics, Item Response Theory, Test Validity
Pentimonti, J.; Petscher, Y.; Stanley, C. – National Center on Improving Literacy, 2019
Sample representativeness is an important piece to consider when evaluating the quality of a screening assessment. If you are trying to determine whether or not the screening tool accurately measures children's skills, you want to ensure that the sample that is used to validate the tool is representative of your population of interest.
Descriptors: Sampling, Screening Tests, Measurement, Test Validity
Zumbo, Bruno D.; Kroc, Edward – Educational and Psychological Measurement, 2019
Chalmers recently published a critique of the use of ordinal alpha proposed in Zumbo et al. as a measure of test reliability in certain research settings. In this response, we take up the task of refuting Chalmers' critique. We identify three broad misconceptions that characterize Chalmers' criticisms: (1) confusing assumptions with…
Descriptors: Test Reliability, Statistical Analysis, Misconceptions, Mathematical Models
Liu, Ren – Educational and Psychological Measurement, 2018
Attribute structure is an explicit way of presenting the relationship between attributes in diagnostic measurement. The specification of attribute structures directly affects the classification accuracy resulted from psychometric modeling. This study provides a conceptual framework for understanding misspecifications of attribute structures. Under…
Descriptors: Diagnostic Tests, Classification, Test Construction, Relationship
Harring, Jeffrey R.; Johnson, Tessa L. – Educational Measurement: Issues and Practice, 2020
In this digital ITEMS module, Dr. Jeffrey Harring and Ms. Tessa Johnson introduce the linear mixed effects (LME) model as a flexible general framework for simultaneously modeling continuous repeated measures data with a scientifically defensible function that adequately summarizes both individual change as well as the average response. The module…
Descriptors: Educational Assessment, Data Analysis, Longitudinal Studies, Case Studies
Mahmoud M. S. Abdallah – Online Submission, 2025
This guide offers a comprehensive handbook to scientific research methodology and experimental design, specifically for novice MA and PhD researchers in Education and Language Learning (TESOL/TEFL). It establishes scientific research as a systematic, objective inquiry focused on identifying cause-and-effect relationships through empirical data.…
Descriptors: Scientific Research, Research Methodology, Research Design, Second Language Learning
Oranje, Andreas; Kolstad, Andrew – Journal of Educational and Behavioral Statistics, 2019
The design and psychometric methodology of the National Assessment of Educational Progress (NAEP) is constantly evolving to meet the changing interests and demands stemming from a rapidly shifting educational landscape. NAEP has been built on strong research foundations that include conducting extensive evaluations and comparisons before new…
Descriptors: National Competency Tests, Psychometrics, Statistical Analysis, Computation
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement
Javidanmehr, Zahra; Anani Sarab, Mohammad Reza – International Journal of Language Testing, 2017
Cognitive Diagnostic Assessment (CDA) is a type of educational assessment that is designed to measure specific knowledge structures and processing skills in students so as to provide information about their cognitive strengths and weaknesses (Leighton & Gierl, 2007). CDA has been instrumental in turning the attention of practitioners to more…
Descriptors: Cognitive Tests, Diagnostic Tests, Educational Assessment, Second Language Learning
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing