Moses, Tim – Journal of Educational Measurement, 2022
One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different…
Descriptors: Measures (Individuals), Educational Assessment, Test Construction, Comparative Analysis
Practical Considerations in Choosing an Anchor Test Form for Equating under the Random Groups Design
Cui, Zhongmin; He, Yong – Measurement: Interdisciplinary Research and Perspectives, 2023
Careful consideration is necessary when choosing an anchor test form from a list of old test forms for equating under the random groups design. The choice of the anchor form potentially affects the accuracy of equated scores on new test forms. Few guidelines, however, can be found in the literature on choosing the anchor form.…
Descriptors: Test Format, Equated Scores, Best Practices, Test Construction
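Since the random groups design equates forms by administering them to randomly equivalent examinee groups, a rough sketch of the resulting equipercentile equating may help fix ideas. Everything below is illustrative: the score distributions are simulated and the 40-item test length is invented, not taken from the study.

```python
# A minimal sketch of equipercentile equating under the random groups
# design: the new form and a chosen old (anchor) form are taken by
# randomly equivalent groups, and score points with equal percentile
# ranks are mapped onto each other.
import numpy as np

def percentile_ranks(scores, max_score):
    """Percentile rank at each integer score point, using midpoints."""
    counts = np.bincount(scores, minlength=max_score + 1)
    cum = np.cumsum(counts)
    below = cum - counts                      # examinees strictly below x
    return 100 * (below + counts / 2) / len(scores)

def equipercentile(new_scores, old_scores, max_score):
    """Map each new-form score point 0..max_score to the old-form score
    with the same percentile rank (linear interpolation in between)."""
    pr_new = percentile_ranks(new_scores, max_score)
    pr_old = percentile_ranks(old_scores, max_score)
    return np.interp(pr_new, pr_old, np.arange(max_score + 1))

rng = np.random.default_rng(0)
new = rng.binomial(40, 0.55, size=2000)       # random group on new form
old = rng.binomial(40, 0.60, size=2000)       # random group on old form
print(equipercentile(new, old, 40))           # equated score per score point
```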
Jiajing Huang – ProQuest LLC, 2022
The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…
Descriptors: Item Response Theory, Test Format, Test Items, Test Construction
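For intuition about how the shared common items link forms under a NEAT design, here is a minimal mean/sigma linking sketch. It is a standard IRT linking step, not necessarily the dissertation's method, and all difficulty values are invented.

```python
# Mean/sigma linking for a NEAT design: the common (anchor) items are
# calibrated separately in each group, and a linear transformation
# theta_old = A*theta_new + B places the new form on the old scale.
import numpy as np

b_anchor_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])  # anchor b's, old calibration
b_anchor_new = np.array([-1.0, -0.2, 0.3, 1.0, 1.7])  # same items, new calibration

A = b_anchor_old.std(ddof=1) / b_anchor_new.std(ddof=1)
B = b_anchor_old.mean() - A * b_anchor_new.mean()

# Any new-form parameter estimate can now be rescaled:
b_new_form = np.array([-0.5, 0.0, 0.9])
print(A * b_new_form + B)   # difficulties on the old form's scale
```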
Dong, Yixiao; Clements, Douglas H.; Day-Hess, Crystal A.; Sarama, Julie; Dumas, Denis – Journal of Psychoeducational Assessment, 2021
Psychometric work with young children faces the particular challenge that children's attention spans are relatively short; assessments must therefore be shorter while retaining comprehensive coverage. This article reports on three empirical studies that encompass the development and validation of the research-based early mathematics…
Descriptors: Young Children, Numeracy, Test Construction, Test Validity
NWEA, 2022
This technical report documents the processes and procedures employed by NWEA® to build and support the English MAP® Reading Fluency™ assessments administered during the 2020-2021 school year. It is written for measurement professionals and administrators to help evaluate the quality of MAP Reading Fluency. The seven sections of this report: (1)…
Descriptors: Achievement Tests, Reading Tests, Reading Achievement, Reading Fluency
Koch, Marco; Spinath, Frank M.; Greiff, Samuel; Becker, Nicolas – Journal of Intelligence, 2022
Figural matrices tasks are one of the most prominent item formats used in intelligence tests, and their relevance for the assessment of cognitive abilities is unquestionable. However, despite endeavors of the open science movement to make scientific research accessible on all levels, there is a lack of royalty-free figural matrices tests. The Open…
Descriptors: Intelligence, Intelligence Tests, Computer Assisted Testing, Test Items
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2018
The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested "miditests," which are anchor tests that are content-representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties.…
Descriptors: Test Content, Difficulty Level, Test Items, Test Construction
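A toy sketch of what assembling a miditest might look like, ignoring the content-representativeness constraint for brevity; the item pool difficulties are simulated, not drawn from the article.

```python
# Assemble a "miditest"-style anchor in the sense of Sinharay and
# Holland: mean difficulty matches the total test, but the spread of
# difficulties is small.
import numpy as np

rng = np.random.default_rng(1)
b_total = rng.normal(0.0, 1.0, size=60)      # difficulties of the full pool

def miditest(b, k):
    """Choose the k items closest to the pool's mean difficulty."""
    order = np.argsort(np.abs(b - b.mean()))
    return np.sort(order[:k])

idx = miditest(b_total, 12)
print(b_total.mean(), b_total[idx].mean())   # means should be close
print(b_total.std(), b_total[idx].std())     # anchor spread is much smaller
```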
Manna, Venessa F.; Gu, Lixiong – ETS Research Report Series, 2019
When using the Rasch model, equating with a nonequivalent groups anchor test design is commonly achieved by adjustment of new form item difficulty using an additive equating constant. Using simulated 5-year data, this report compares 4 approaches to calculating the equating constants and the subsequent impact on equating results. The 4 approaches…
Descriptors: Item Response Theory, Test Items, Test Construction, Sample Size
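The additive constant itself is simple to illustrate. Below is a minimal sketch of the mean-difference estimator, only one of the several approaches the report compares; all difficulty values are invented.

```python
# Under the Rasch model, anchor items should differ between the two
# calibrations only by a shift; estimate that shift as a mean difference
# and apply it to the new form's unique items.
import numpy as np

b_anchor_base = np.array([-1.1, -0.3, 0.2, 0.9])   # base-form calibration
b_anchor_new  = np.array([-0.8, -0.1, 0.5, 1.1])   # new-form calibration

c = b_anchor_base.mean() - b_anchor_new.mean()     # additive equating constant

b_new_items = np.array([-0.6, 0.0, 0.7, 1.4])      # unique new-form items
print(b_new_items + c)                             # shifted onto the base scale
```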
Xiao, Yang; Koenig, Kathleen; Han, Jing; Liu, Jing; Liu, Qiaoyi; Bao, Lei – Physical Review Physics Education Research, 2019
Standardized concept inventories (CIs) have been widely used in science, technology, engineering, and mathematics education for assessment of student learning. In practice, there have been concerns regarding the length of the test and possible test-retest memory effect. To address these issues, a recent study developed a method to split a CI into…
Descriptors: Scientific Concepts, Science Tests, Energy, Magnets
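The snippet above truncates before describing the splitting method, so the following is a generic illustration only: one simple way to split an inventory into two shorter, difficulty-balanced forms is to sort items by facility and deal them out alternately. The facility values are invented.

```python
# Split 30 items into two half-tests with nearly equal average difficulty
# by sorting on proportion-correct and alternating the assignment.
import numpy as np

rng = np.random.default_rng(2)
p_values = rng.uniform(0.2, 0.9, size=30)     # proportion-correct per item

order = np.argsort(p_values)
form_a, form_b = order[0::2], order[1::2]     # alternate assignment

print(p_values[form_a].mean(), p_values[form_b].mean())  # nearly equal
```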
Preston, Kathleen Suzanne Johnson; Gottfried, Allen W.; Park, Jonathan J.; Manapat, Patrick Don; Gottfried, Adele Eskeles; Oliver, Pamella H. – Educational and Psychological Measurement, 2018
Measurement invariance is a prerequisite when comparing different groups of individuals or when studying a group of individuals across time. This assures that the same construct is assessed without measurement artifacts. This investigation applied a novel approach of simultaneous parameter linking to cross-sectional and longitudinal measures of…
Descriptors: Longitudinal Studies, Family Relationship, Measurement, Measures (Individuals)
Supriyati, Yetti; Iriyadi, Deni; Falani, Ilham – Journal of Technology and Science Education, 2021
This study aims to develop a score-equating application for computer-based school exams using parallel test kits with 25% anchor items. The items are arranged according to the HOTS (higher-order thinking skills) category and follow a scientific approach suited to the characteristics of physics lessons. Therefore, the questions were made using stimulus,…
Descriptors: Physics, Science Instruction, Teaching Methods, Equated Scores
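The abstract does not say which equating method the application implements; as one common choice for designs with an internal anchor, here is a minimal chained linear equating sketch. The score distributions are simulated and purely illustrative.

```python
# Chained linear equating through an internal anchor V: link X to V in
# the group that took form X, then V to Y in the group that took form Y.
import numpy as np

rng = np.random.default_rng(3)
# Group 1 takes form X, group 2 takes form Y; both see the anchor V
# (roughly 25% of the items, per the abstract).
x1, v1 = rng.normal(30, 6, 1000), rng.normal(8, 2, 1000)
y2, v2 = rng.normal(32, 5, 1000), rng.normal(9, 2, 1000)

def linear_link(mu_from, sd_from, mu_to, sd_to, s):
    return mu_to + (sd_to / sd_from) * (s - mu_from)

def chain_x_to_y(x):
    v = linear_link(x1.mean(), x1.std(), v1.mean(), v1.std(), x)     # X -> V
    return linear_link(v2.mean(), v2.std(), y2.mean(), y2.std(), v)  # V -> Y

print(chain_x_to_y(np.array([20.0, 30.0, 40.0])))
```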
Boyer, Michelle; Dadey, Nathan; Keng, Leslie – National Center for the Improvement of Educational Assessment, 2020
This school year, every state education agency (SEA) faces unprecedented COVID-19-related challenges in implementing the 2021 statewide summative assessments. Two overarching challenges concern how tests will be administered and how scores will be interpreted and used, along with many intervening and related challenges. Test…
Descriptors: State Departments of Education, Summative Evaluation, Educational Planning, Educational Strategies
Fitzpatrick, Joseph; Skorupski, William P. – Journal of Educational Measurement, 2016
The equating performance of two internal anchor test structures, miditests and minitests, is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…
Descriptors: Difficulty Level, Test Items, Comparative Analysis, Test Construction
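A toy sketch of how the two anchor structures might be generated in a simulation like this one (not the authors' actual design): a minitest mirrors the total test's difficulty mean and spread, while a miditest keeps the mean but shrinks the spread.

```python
# Generate difficulty parameters for a minitest (matched mean and spread)
# and a miditest (matched mean, reduced spread) of k anchor items each.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, k = 0.0, 1.0, 15                  # total-test difficulty distribution

b_minitest = rng.normal(mu, sigma, k)        # same mean, same spread
b_miditest = rng.normal(mu, 0.4 * sigma, k)  # same mean, reduced spread

print(b_minitest.std(), b_miditest.std())
```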
Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua – ETS Research Report Series, 2015
In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
Descriptors: Test Construction, Equated Scores, Test Items, Sampling
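A minimal sketch of the jackknife idea, using a Rasch-style additive equating constant as the statistic (the report's own equating procedure may differ); all difficulty values are invented.

```python
# Drop one anchor item at a time, recompute the additive equating
# constant, and look at how much it moves relative to the full-anchor value.
import numpy as np

b_old = np.array([-1.3, -0.5, 0.0, 0.4, 1.1, 1.8])   # anchor, old calibration
b_new = np.array([-1.0, -0.6, 0.3, 0.5, 1.0, 2.4])   # anchor, new calibration

full = (b_old - b_new).mean()                        # constant from all anchors
loo = np.array([np.delete(b_old - b_new, i).mean()   # leave-one-out constants
                for i in range(len(b_old))])

print(full, loo)
print(np.abs(loo - full).max())  # a large jump flags an influential anchor item
```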
Bramley, Tom – Cambridge Assessment, 2018
The aim of the research reported here was to gauge the accuracy of grade boundaries (cut-scores) obtained by applying the 'similar items method' described in Bramley & Wilson (2016). In this method, experts identify items on the current version of a test that are sufficiently similar to items on previous versions for them to be…
Descriptors: Accuracy, Cutting Scores, Test Items, Item Analysis
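The abstract truncates before explaining how the matched items are used; one plausible sketch treats them as pseudo-common items and shifts the old grade boundary by the average facility difference. All numbers are invented, and this is not necessarily Bramley & Wilson's procedure.

```python
# Shift a raw-score grade boundary by the average facility (proportion-
# correct) difference observed on the expert-matched "similar" items.
import numpy as np

p_prev = np.array([0.72, 0.55, 0.40, 0.63])   # facility on the previous test
p_curr = np.array([0.68, 0.50, 0.43, 0.60])   # facility of similar counterparts

n_items, old_boundary = 50, 32
# If the current test runs harder by this much per item on average,
# move the raw-score boundary down by the expected score difference:
shift = n_items * (p_prev.mean() - p_curr.mean())
print(round(old_boundary - shift))            # candidate boundary, current test
```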