Tatiana Chaiban; Zeinab Nahle; Ghaith Assi; Michelle Cherfane – Discover Education, 2024
Background: Since it was first launched, ChatGPT, a Large Language Model (LLM), has been widely used across different disciplines, particularly the medical field. Objective: The main aim of this review is to thoroughly assess the performance of the distinct version of ChatGPT in subspecialty written medical proficiency exams and the factors that…
Descriptors: Medical Education, Accuracy, Artificial Intelligence, Computer Software
Jessica B. Koslouski; Sandra M. Chafouleas; Amy Briesch; Jacqueline M. Caemmerer; Brittany Melo – School Mental Health, 2024
We are developing the Equitable Screening to Support Youth (ESSY) Whole Child Screener to address concerns prevalent in existing school-based screenings that impede goals to advance educational equity using universal screeners. Traditional assessment development does not include end users in the early development phases, instead relying on a…
Descriptors: Screening Tests, Psychometrics, Validity, Child Development
Jessica B. Koslouski; Sandra M. Chafouleas; Amy Briesch; Jacqueline M. Caemmerer; Brittany Melo – Grantee Submission, 2024
We are developing the Equitable Screening to Support Youth (ESSY) Whole Child Screener to address concerns prevalent in existing school-based screenings that impede goals to advance educational equity using universal screeners. Traditional assessment development does not include end users in the early development phases, instead relying on a…
Descriptors: Screening Tests, Usability, Decision Making, Validity
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2018
The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested "miditests," which are anchor tests that are content-representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties.…
Descriptors: Test Content, Difficulty Level, Test Items, Test Construction
College Board, 2023
Over the past several years, content experts, psychometricians, and researchers have been hard at work developing, refining, and studying the digital SAT. The work is grounded in foundational best practices and advances in measurement and assessment design, with fairness for students informing all of the work done. This paper shares learnings from…
Descriptors: College Entrance Examinations, Psychometrics, Computer Assisted Testing, Best Practices
Lehane, Paula; Scully, Darina; O'Leary, Michael – Irish Educational Studies, 2022
In line with the widespread proliferation of digital technology in everyday life, many countries are now beginning to use computer-based exams (CBEs) in their post-primary education systems. To ensure that these CBEs are delivered in a manner that preserves their fairness, validity, utility and credibility, several factors pertaining to their…
Descriptors: Computer Assisted Testing, Secondary School Students, Culture Fair Tests, Test Validity
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS) to provide guidance to states that are interested in including New Meridian content and would like to either keep reporting scores on the New Meridian Scale or use the New Meridian performance levels; that is, the state…
Descriptors: Testing, Standards, Comparative Analysis, Test Content
McClellan, Catherine; Snyder, Rebecca; Woods-Murphy, Maryann; Basset, Katherine – National Network of State Teachers of the Year, 2018
Great teachers recognize great assessments. As policy and education leaders work to make sure state tests are measuring the problem-solving, writing, and critical-thinking skills students need for success, they should convene and rely on teachers to review test quality and help answer the question: Do the questions on our state test reflect…
Descriptors: Student Evaluation, Educational Quality, Standardized Tests, Test Items
Ayar, Zülal – Novitas-ROYAL (Research on Youth and Language), 2021
As the most prestigious and popular standardized achievement test to certify examinees' proficiency of the English language at the national level, Foreign Language Examination (YDS) has been mostly taken by academic staff, undergraduate and graduate students, state employees, and military personnel for years in Turkey. The current study set out to…
Descriptors: Second Language Learning, Second Language Instruction, Language Tests, Language Proficiency
Romine, William L.; Schaffer, Dane L.; Barrow, Lloyd – International Journal of Science Education, 2015
We describe the development and validation of a three-tiered diagnostic test of the water cycle (DTWC) and use it to evaluate the impact of prior learning experiences on undergraduates' misconceptions. While most approaches to instrument validation take a positivist perspective using singular criteria such as reliability and fit with a measurement…
Descriptors: Undergraduate Students, Diagnostic Tests, Water, Item Response Theory
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Traynor, Anne – Educational Assessment, 2017
Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or "alignment,"…
Descriptors: Test Items, Test Construction, Alignment (Education), Curriculum
Jia, Yujie – ProQuest LLC, 2013
This study employed Bachman and Palmer's (2010) Assessment Use Argument framework to investigate to what extent the use of a second language oral test as an exit test in a Hong Kong university can be justified. It also aimed to help test developers of this oral test identify the most critical areas in the current test design that might need…
Descriptors: Test Use, Language Tests, Oral Language, Second Language Learning
Herman, Joan L.; Osmundson, Ellen; Dietel, Ronald – Assessment and Accountability Comprehensive Center, 2010
This report describes the purposes of benchmark assessments and provides recommendations for selecting and using benchmark assessments--addressing validity, alignment, reliability, fairness and bias and accessibility, instructional sensitivity, utility, and reporting issues. We also present recommendations on building capacity to support schools'…
Descriptors: Multiple Choice Tests, Test Items, Benchmarking, Educational Assessment

DeConinck, James B.; And Others – Educational and Psychological Measurement, 1996
Using a multidimensional measure of pay satisfaction, the Pay Satisfaction Questionnaire (PSQ), this study assessed the discriminant validity between scores on a measure of distributive justice and the PSQ with 474 employees. Confirmatory factor analysis results indicate that items from both scales loaded on the hypothesized dimensions. (SLD)
Descriptors: Construct Validity, Employees, Salaries, Satisfaction