Klugman, Emma M.; Ho, Andrew D. – Educational Measurement: Issues and Practice, 2020
State testing programs regularly release previously administered test items to the public. We provide an open-source recipe for state, district, and school assessment coordinators to combine these items flexibly to produce scores linked to established state score scales. These would enable estimation of student score distributions and achievement…
Descriptors: Testing Programs, State Programs, Test Items, Scores
Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2015
An equating procedure for a testing program with evolving distribution of examinee profiles is developed. No anchor is available because the original scoring scheme was based on expert judgment of the item difficulties. Pairs of examinees from two administrations are formed by matching on coarsened propensity scores derived from a set of…
Descriptors: Equated Scores, Testing Programs, College Entrance Examinations, Scoring
Haberman, Shelby J. – ETS Research Report Series, 2020
Best linear prediction (BLP) and penalized best linear prediction (PBLP) are techniques for combining sources of information to produce task scores, section scores, and composite test scores. The report examines issues to consider in operational implementation of BLP and PBLP in testing programs administered by ETS [Educational Testing Service].
Descriptors: Prediction, Scores, Tests, Testing Programs
Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F. – Applied Measurement in Education, 2016
The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Descriptors: Item Response Theory, Equated Scores, Test Format, Testing Programs
Livingston, Samuel A. – ETS Research Report Series, 2014
In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…
Descriptors: Demography, Equated Scores, Cluster Grouping, Ability Grouping
Cronin, John; Jensen, Nate – Phi Delta Kappan, 2014
When New York state released the first results of the exams under the Common Core State Standards, many wrongly believed that the results showed dramatic declines in student achievement. A closer look at the results showed that student achievement may have increased. Another lesson from the exams is that states need to closely coordinate new data…
Descriptors: Academic Achievement, State Standards, Core Curriculum, Achievement Gains
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
LaFlair, Geoffrey T.; Isbell, Daniel; May, L. D. Nicolas; Gutierrez Arvizu, Maria Nelly; Jamieson, Joan – Language Testing, 2017
Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms. The choice of equating method is informed by…
Descriptors: Language Tests, Equated Scores, Testing Programs, Comparative Analysis
Davis-Becker, Susan L.; Buckendahl, Chad W. – International Journal of Testing, 2013
A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…
Descriptors: Standard Setting (Scoring), Evidence, Validity, Cutting Scores
von Davier, Alina A. – ETS Research Report Series, 2012
Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…
Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling
Debeer, Dries; Buchholz, Janine; Hartig, Johannes; Janssen, Rianne – Journal of Educational and Behavioral Statistics, 2014
In this article, the change in examinee effort during an assessment, which we will refer to as persistence, is modeled as an effect of item position. A multilevel extension is proposed to analyze hierarchically structured data and decompose the individual differences in persistence. Data from the 2009 Program of International Student Achievement…
Descriptors: Reading Tests, International Programs, Testing Programs, Individual Differences
Creagh, Sue – English Teaching: Practice and Critique, 2014
The Australian field of English as a Second Language (ESL) teaching is globally respected for its research and practice achievements over a period of some 30 years. However, this essential field of pedagogy is being diluted in the current Australian reform agenda which is firmly founded on a traditional vision of English as first language, and…
Descriptors: Foreign Countries, Standardized Tests, English (Second Language), Second Language Learning
Tingting, Xu; Hua, Ma; Xiujuan, Wang; Jing, Wang – Higher Education Studies, 2015
The traditional JAVA course examination is just a list of questions, which does not reveal students' programming skills. Based on the eight abilities in the curriculum objectives, we designed an employment-oriented assessment standard for the JAVA programming course and applied it in practical teaching to check the teaching…
Descriptors: Programming Languages, Programming, Behavioral Objectives, Labor Needs
Hardy, Ian – Journal of Education Policy, 2014
This paper explores how the strong policy push to improve students' results on national literacy and numeracy tests -- the National Assessment Program, Literacy and Numeracy (NAPLAN) -- in the Australian state of Queensland influenced schooling practices, including teachers' learning. The paper argues the focus upon improved test scores on NAPLAN…
Descriptors: Literacy, Numeracy, Foreign Countries, Standardized Tests
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation