Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 2
Since 2006 (last 20 years): 18
Descriptor
Test Items: 22
Item Response Theory: 10
Difficulty Level: 6
Equated Scores: 6
Scores: 6
Statistical Analysis: 6
Scoring: 5
Computer Assisted Testing: 4
Error of Measurement: 4
Language Tests: 4
Models: 4
Source
Educational Testing Service: 22
Author
Sinharay, Sandip: 3
Davey, Tim: 2
Dorans, Neil J.: 2
Haberman, Shelby J.: 2
Holland, Paul W.: 2
Livingston, Samuel A.: 2
Tan, Xuan: 2
Baron, Patricia: 1
Cheng, Peter C. H.: 1
Curley, Edward: 1
DeCarlo, Lawrence T.: 1
Publication Type
Reports - Research: 11
Reports - Evaluative: 6
Information Analyses: 2
Numerical/Quantitative Data: 2
Reports - Descriptive: 2
Guides - Classroom - Learner: 1
Guides - General: 1
Opinion Papers: 1
Tests/Questionnaires: 1
Education Level
Higher Education: 3
Postsecondary Education: 3
Adult Education: 1
Elementary Education: 1
Elementary Secondary Education: 1
High Schools: 1
Junior High Schools: 1
Middle Schools: 1
Secondary Education: 1
Audience
Practitioners: 1
Location
California: 2
Canada: 1
Connecticut: 1
Georgia: 1
Indiana: 1
Iowa: 1
Michigan: 1
Wisconsin: 1
Assessments and Surveys
SAT (College Admission Test): 2
Test of English as a Foreign…: 2

Livingston, Samuel A. – Educational Testing Service, 2020
This booklet is a conceptual introduction to item response theory (IRT), which many large-scale testing programs use for constructing and scoring their tests. Although IRT is essentially mathematical, the approach here is nonmathematical, in order to serve as an introduction for people who want to understand why IRT is used and what…
Descriptors: Item Response Theory, Scoring, Test Items, Scaling
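
The booklet stays nonmathematical by design; for readers who want to see the machinery anyway, a minimal sketch of the kind of model IRT rests on follows, here the two-parameter logistic (2PL) item response function. The item parameters are invented for illustration.

import math

def irt_2pl(theta, a, b):
    # 2PL model: P(correct) = 1 / (1 + exp(-a * (theta - b))),
    # where a is item discrimination and b is item difficulty.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with discrimination 1.2 and difficulty 0.5.
for theta in (-2, -1, 0, 1, 2):
    print(f"ability {theta:+d}: P(correct) = {irt_2pl(theta, 1.2, 0.5):.3f}")
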
Weeks, Jonathan; Baron, Patricia – Educational Testing Service, 2021
The current project, Exploring Math Education Relations by Analyzing Large Data Sets (EMERALDS) II, is an attempt to identify specific Common Core State Standards procedural, conceptual, and problem-solving competencies in earlier grades that best predict success in algebraic areas in later grades. The data for this study include two cohorts of…
Descriptors: Mathematics Education, Common Core State Standards, Problem Solving, Mathematics Tests
Livingston, Samuel A. – Educational Testing Service, 2014
This booklet grew out of a half-day class on equating that author Samuel Livingston teaches for new statistical staff at Educational Testing Service (ETS). The class is a nonmathematical introduction to the topic, emphasizing conceptual understanding and practical applications. The class consists of illustrated lectures, interspersed with…
Descriptors: Equated Scores, Scoring, Self Evaluation (Individuals), Scores
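
As a concrete counterpart to the booklet's conceptual treatment, the simplest case it covers, linear (mean-sigma) equating, fits in a few lines: scores on form X map to the form Y scale by matching means and standard deviations. The single-group score data below are invented.

import statistics

def linear_equate(x_scores, y_scores):
    # Map a form-X raw score to the form-Y scale by matching
    # means and standard deviations (mean-sigma equating).
    mx, sx = statistics.mean(x_scores), statistics.pstdev(x_scores)
    my, sy = statistics.mean(y_scores), statistics.pstdev(y_scores)
    return lambda x: my + (sy / sx) * (x - mx)

# Invented single-group data: the same examinees took both forms.
form_x = [12, 15, 18, 20, 22, 25, 28]
form_y = [14, 16, 20, 21, 24, 26, 30]
equate = linear_equate(form_x, form_y)
print(f"a form-X score of 20 equates to {equate(20):.1f} on form Y")
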
Grant, Mary C. – Educational Testing Service, 2011
The "single group with nearly equivalent tests" (SiGNET) design proposed here was developed to address the problem of equating scores on multiple-choice test forms with very small single-administration samples. In this design, the majority of items in each new test form consist of items from the previous form, and the new items that were…
Descriptors: Multiple Choice Tests, Equated Scores, Test Items
Xu, Xueli; Jia, Yue – Educational Testing Service, 2011
Estimation of item response model parameters and ability distribution parameters has been, and will remain, an important topic in the educational testing field. Much research has been dedicated to addressing this task. Some studies have focused on item parameter estimation when the latent ability was assumed to follow a normal distribution,…
Descriptors: Test Items, Statistical Analysis, Computation, Item Response Theory
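
To make the estimation problem concrete: under marginal maximum likelihood, the likelihood of a response pattern integrates the item response model over the assumed latent ability distribution. The sketch below uses a 2PL model, a standard normal ability distribution, and a crude rectangular quadrature; all item parameters are invented.

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def marginal_likelihood(responses, items, nodes=41):
    # Marginal probability of a response pattern under the 2PL model,
    # integrating over a standard normal ability distribution with a
    # simple rectangular quadrature on [-4, 4].
    total, step = 0.0, 8.0 / (nodes - 1)
    for k in range(nodes):
        theta = -4.0 + k * step
        weight = math.exp(-theta * theta / 2) / math.sqrt(2 * math.pi) * step
        like = 1.0
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            like *= p if u == 1 else (1 - p)
        total += weight * like
    return total

items = [(1.0, -0.5), (1.3, 0.0), (0.8, 0.7)]  # hypothetical (a, b) pairs
print(f"P(pattern 1,1,0) = {marginal_likelihood([1, 1, 0], items):.4f}")
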
Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan – Educational Testing Service, 2010
The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…
Descriptors: Test Bias, Test Items, Item Response Theory
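
The question the paper studies, whether the studied item belongs in the matching criterion, can be made concrete with a Mantel-Haenszel DIF sketch that computes the common odds ratio both ways. The response data and group labels (R for reference, F for focal) are invented.

from collections import defaultdict

def mh_odds_ratio(data, studied, include_studied):
    # Mantel-Haenszel common odds ratio for the studied item,
    # stratifying examinees on total score with or without it.
    # `data` holds (group, responses) pairs, group 'R' or 'F'.
    strata = defaultdict(lambda: {"R": [0, 0], "F": [0, 0]})  # [right, wrong]
    for group, resp in data:
        score = sum(resp) - (0 if include_studied else resp[studied])
        strata[score][group][0 if resp[studied] else 1] += 1
    num = den = 0.0
    for cell in strata.values():
        a, b = cell["R"]  # reference group right / wrong on studied item
        c, d = cell["F"]  # focal group right / wrong on studied item
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")

# Invented responses to three items; item 0 is the studied item.
data = [("R", [1, 1, 0]), ("R", [0, 1, 1]), ("R", [1, 1, 1]),
        ("R", [0, 0, 1]), ("R", [1, 0, 0]), ("R", [0, 1, 0]),
        ("F", [1, 1, 0]), ("F", [0, 1, 1]), ("F", [1, 0, 1]),
        ("F", [0, 0, 1]), ("F", [1, 0, 0]), ("F", [0, 0, 0])]
for inc in (True, False):
    ratio = mh_odds_ratio(data, studied=0, include_studied=inc)
    print(f"studied item in criterion: {inc} -> MH odds ratio {ratio:.2f}")
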
Haberman, Shelby J.; Sinharay, Sandip; Lee, Yi-Hsuan – Educational Testing Service, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement (Carroll, 1993). Scale anchoring (Beaton & Allen, 1992), a technique that describes what students at different points on a score scale know and can do,…
Descriptors: Statistical Analysis, Scores, Regression (Statistics), Item Response Theory
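
A simplified version of the scale-anchoring idea: under an IRT model, an item "anchors" at the lowest score point where examinees at that point answer it correctly with at least some criterion probability (0.65 below, in the spirit of response-probability rules; the exact criteria in Beaton & Allen differ). Item parameters are invented.

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def anchor_level(a, b, levels=(-2, -1, 0, 1, 2), criterion=0.65):
    # Lowest scale point at which examinees answer the item
    # correctly with probability >= criterion.
    for theta in levels:
        if p_2pl(theta, a, b) >= criterion:
            return theta
    return None

# Hypothetical items (a, b); each anchors where it first becomes "easy".
for a, b in [(1.2, -1.0), (0.9, 0.3), (1.5, 1.2)]:
    print(f"item (a={a}, b={b}) anchors at theta = {anchor_level(a, b)}")
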
Ling, Guangming; Rijmen, Frank – Educational Testing Service, 2011
The factorial structure of the Time Management (TM) scale of the Student 360: Insight Program (S360) was evaluated based on a national sample. A general procedure with a variety of methods was introduced and implemented, including the computation of descriptive statistics, exploratory factor analysis (EFA), and confirmatory factor analysis (CFA).…
Descriptors: Time Management, Measures (Individuals), Statistical Analysis, Factor Analysis
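
As a minimal illustration of the exploratory step in such an analysis, the eigenvalues of the item correlation matrix give a rough first answer to how many factors a scale supports (the Kaiser greater-than-one rule). The correlation matrix below is invented.

import numpy as np

# Invented correlation matrix for five time-management items; the
# block structure hints at two clusters of items.
R = np.array([[1.00, 0.55, 0.48, 0.20, 0.18],
              [0.55, 1.00, 0.52, 0.22, 0.16],
              [0.48, 0.52, 1.00, 0.19, 0.21],
              [0.20, 0.22, 0.19, 1.00, 0.50],
              [0.18, 0.16, 0.21, 0.50, 1.00]])

# Count eigenvalues exceeding 1.0 as a crude factor-count heuristic.
eigvals = np.linalg.eigvalsh(R)[::-1]
print("eigenvalues:", np.round(eigvals, 2))
print("factors suggested:", int((eigvals > 1.0).sum()))
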
Educational Testing Service, 2011
Choosing whether to test via computer is the most difficult and consequential decision the designers of a testing program can make. The decision is difficult because of the wide range of choices available. Designers can choose where and how often the test is made available, how the test items look and function, how those items are combined into…
Descriptors: Test Items, Testing Programs, Testing, Computer Assisted Testing
Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam – Educational Testing Service, 2010
This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Descriptors: Equated Scores, Scoring, Responses, Test Items
DeCarlo, Lawrence T. – Educational Testing Service, 2010
A basic consideration in large-scale assessments that use constructed response (CR) items, such as essays, is how to allocate the essays to the raters that score them. Designs that are used in practice are incomplete, in that each essay is scored by only a subset of the raters, and also unbalanced, in that the number of essays scored by each rater…
Descriptors: Test Items, Responses, Essay Tests, Scoring
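
A toy version of the allocation problem: the sketch below cycles essays through rater pairs, producing a design that is incomplete (no rater scores every essay) and, when the essay count is not a multiple of the number of pairs, unbalanced. Rater names and the essay count are hypothetical.

from itertools import combinations, cycle

def allocate(n_essays, raters, per_essay=2):
    # Assign each essay to `per_essay` raters by cycling through
    # rater pairs: an incomplete design that still links raters
    # through shared essays.
    pairs = cycle(combinations(raters, per_essay))
    return {essay: next(pairs) for essay in range(n_essays)}

# Hypothetical pool of four raters scoring eight essays.
design = allocate(8, ["r1", "r2", "r3", "r4"])
for essay, pair in design.items():
    print(f"essay {essay}: scored by {pair[0]} and {pair[1]}")
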
Dorans, Neil J. – Educational Testing Service, 2010
Santelices and Wilson (2010) claimed to have addressed technical criticisms of Freedle (2003) presented in Dorans (2004a) and elsewhere. Santelices and Wilson's abstract claimed that their study confirmed that SAT® verbal items do function differently for African American and White subgroups. In this commentary, I demonstrate that the…
Descriptors: College Entrance Examinations, Verbal Tests, Test Bias, Test Items
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
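
The flavor of such a bound in the simplest case: if linking rests on the difference of two form means, each mean carries sampling noise of about sd/sqrt(n), so no procedure can estimate the linking constant more precisely than the sketch below suggests. The numbers are invented.

import math

def se_linear_link(sd, n_x, n_y):
    # Rough lower bound on the standard error of a mean-mean linking
    # constant: each form mean contributes sd**2 / n of variance.
    return math.sqrt(sd**2 / n_x + sd**2 / n_y)

# Invented figures: score SD of 10 and 500 examinees per form.
print(f"SE >= {se_linear_link(10, 500, 500):.2f} score points")
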
Kim, Sooyeon; Walker, Michael E. – Educational Testing Service, 2011
This study examines the use of subpopulation invariance indices to evaluate the appropriateness of using a multiple-choice (MC) item anchor in mixed-format tests, which include both MC and constructed-response (CR) items. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using an MC-only anchor set for 4…
Descriptors: Test Format, Multiple Choice Tests, Test Items, Gender Differences
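
One way to compute an invariance index of the kind the study uses: derive the linking function within each subgroup and for the total group, then take a weighted root mean square difference over score points (in the spirit of Dorans & Holland's RMSD; the operational index may differ in detail). All scores below are invented.

import statistics

def linear_link(x, y):
    # Linear (mean-sigma) linking function from the X scale to Y.
    mx, sx = statistics.mean(x), statistics.pstdev(x)
    my, sy = statistics.mean(y), statistics.pstdev(y)
    return lambda s: my + (sy / sx) * (s - mx)

def rmsd(groups, score_points):
    # Weighted root mean square difference between each subgroup's
    # linking function and the total-group linking function.
    total_x = [s for g in groups for s in g["x"]]
    total_y = [s for g in groups for s in g["y"]]
    overall = linear_link(total_x, total_y)
    n = sum(len(g["x"]) for g in groups)
    links = [(len(g["x"]) / n, linear_link(g["x"], g["y"])) for g in groups]
    msd = sum(w * (link(pt) - overall(pt)) ** 2
              for pt in score_points for w, link in links) / len(score_points)
    return msd ** 0.5

# Invented anchor (x) and total (y) scores for two subgroups.
groups = [{"x": [10, 12, 14, 16, 18], "y": [20, 24, 27, 31, 35]},
          {"x": [11, 13, 15, 17, 19], "y": [22, 25, 29, 33, 36]}]
print(f"RMSD over the anchor range: {rmsd(groups, range(10, 20)):.3f}")
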
Stone, Elizabeth; Davey, Tim – Educational Testing Service, 2011
There has been an increased interest in developing computer-adaptive testing (CAT) and multistage assessments for K-12 accountability assessments. The move to adaptive testing has been met with some resistance by those in the field of special education who express concern about routing of students with divergent profiles (e.g., some students with…
Descriptors: Disabilities, Adaptive Testing, Accountability, Computer Assisted Testing
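
The routing step at issue reduces, in the simplest CAT, to maximum-information item selection: administer the unused item whose Fisher information is largest at the current ability estimate. The item pool below is invented, and a real CAT would update the ability estimate after each response.

import math

def info_2pl(theta, a, b):
    # Fisher information of a 2PL item at ability theta.
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, pool, used):
    # Pick the unused item that is most informative at the
    # current ability estimate (maximum-information selection).
    return max((i for i in range(len(pool)) if i not in used),
               key=lambda i: info_2pl(theta, *pool[i]))

# Hypothetical item pool of (discrimination, difficulty) pairs.
pool = [(0.8, -1.5), (1.2, -0.5), (1.5, 0.0), (1.0, 0.8), (1.4, 1.5)]
theta_hat, used = 0.0, set()
for step in range(3):
    item = next_item(theta_hat, pool, used)
    used.add(item)
    print(f"step {step + 1}: administer item {item} "
          f"(a={pool[item][0]}, b={pool[item][1]})")
    # A real CAT would re-estimate theta_hat from the response here.
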