Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 6 |
| Since 2017 (last 10 years) | 33 |
| Since 2007 (last 20 years) | 119 |
Descriptor
| Statistical Analysis | 311 |
| Testing | 311 |
| Test Validity | 61 |
| Test Reliability | 60 |
| Language Tests | 58 |
| Comparative Analysis | 57 |
| Scores | 53 |
| Foreign Countries | 51 |
| Test Construction | 47 |
| Test Interpretation | 41 |
| Academic Achievement | 36 |
Audience
| Practitioners | 6 |
| Teachers | 4 |
| Researchers | 1 |
| Students | 1 |
Location
| California | 6 |
| Japan | 6 |
| Germany | 5 |
| Turkey | 5 |
| Canada | 4 |
| Spain | 4 |
| Texas | 4 |
| United Kingdom | 4 |
| United Kingdom (England) | 4 |
| Israel | 3 |
| Netherlands | 3 |
Laws, Policies, & Programs
| Individuals with Disabilities… | 3 |
| No Child Left Behind Act 2001 | 3 |
| Americans with Disabilities… | 1 |
| Elementary and Secondary… | 1 |
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, Poststratification kernel equating, and Circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, etc.) of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
The Use of Theory of Linear Mixed-Effects Models to Detect Fraudulent Erasures at an Aggregate Level
Peng, Luyao; Sinharay, Sandip – Educational and Psychological Measurement, 2022
Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of…
Descriptors: Cheating, Identification, Statistical Analysis, Testing
Öz, Hüseyin; Özturan, Tuba – Journal of Language and Linguistic Studies, 2018
This article reports the findings of a study that sought to investigate whether computer-based vs. paper-based test-delivery mode has an impact on the reliability and validity of an achievement test for a pedagogical content knowledge course in an English teacher education program. A total of 97 university students enrolled in the English as a…
Descriptors: Computer Assisted Testing, Testing, Test Format, Teaching Methods
Bayazidi, Aso; Saeb, Fateme – Advances in Language and Literary Studies, 2017
This study examined the equivalence and reliability of the two versions of the Vocabulary Levels Test in an Iranian context. This study was motivated by the fact that the Vocabulary Levels test is increasingly being used in Iran for both research and pedagogical purposes without having been checked for validity and reliability in this context. The…
Descriptors: Foreign Countries, Vocabulary, English (Second Language), College Second Language Programs
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement
Kong, Xiaojing; Davis, Laurie Laughlin; McBride, Yuanyuan; Morrison, Kristin – Applied Measurement in Education, 2018
Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an…
Descriptors: Educational Technology, Technology Uses in Education, Computers, Handheld Devices
Bendulo, Hermabeth O.; Tibus, Erlinda D.; Bande, Rhodora A.; Oyzon, Voltaire Q.; Milla, Norberto E.; Macalinao, Myrna L. – International Journal of Evaluation and Research in Education, 2017
Testing or evaluation in an educational context is primarily used to measure or evaluate and authenticate the academic readiness, learning advancement, acquisition of skills, or instructional needs of learners. This study tried to determine whether the varied combinations of arrangements of options and letter cases in a Multiple-Choice Test (MCT)…
Descriptors: Test Format, Multiple Choice Tests, Test Construction, Eye Movements
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing
Ozsoy, Seyma Nur; Kilmen, Sevilay – International Journal of Assessment Tools in Education, 2023
In this study, Kernel test equating methods were compared under NEAT and NEC designs. In NEAT design, Kernel post-stratification and chain equating methods taking into account optimal and large bandwidths were compared. In the NEC design, gender and/or computer/tablet use was considered as a covariate, and Kernel test equating methods were…
Descriptors: Equated Scores, Testing, Test Items, Statistical Analysis
Oliveri, María Elena; von Davier, Alina A. – International Journal of Testing, 2016
In this study, we propose that the unique needs and characteristics of linguistic minorities should be considered throughout the test development process. Unlike most measurement invariance investigations in the assessment of linguistic minorities, which typically are conducted after test administration, we propose strategies that focus on the…
Descriptors: Psychometrics, Linguistics, Test Construction, Testing
Mouritsen, Matthew L.; Davis, Jefferson T.; Jones, Steven C. – Journal of Learning in Higher Education, 2016
Instructors are often concerned when giving multiple-day tests because students taking the test later in the exam period may have an advantage over students taking the test early in the exam period due to information leakage. However, exam scores seemed to decline as students took the same test later in a multi-day exam period (Mouritsen and…
Descriptors: Statistical Analysis, Scores, Tests, Testing
Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021
The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…
Descriptors: Bayesian Statistics, Computation, Learning, Testing
Tempel, Tobias; Neumann, Roland – Journal of Experimental Education, 2016
We investigated processes underlying performance decrements of highly test-anxious persons. Three experiments contrasted conditions that differed in the degree of activation of concepts related to failure. Participants memorized a list of words either containing words related to failure or containing no words related to failure in Experiment 1. In…
Descriptors: Test Anxiety, Cognitive Tests, Test Wiseness, Foreign Countries
Pan, Steven C.; Gopal, Arpita; Rickard, Timothy C. – Journal of Educational Psychology, 2016
Does correctly answering a test question about a multiterm fact enhance memory for the entire fact? We explored that issue in 4 experiments. Subjects first studied Advanced Placement History or Biology facts. Half of those facts were then restudied, whereas the remainder were tested using "5 W" (i.e., "who, what, when, where",…
Descriptors: Undergraduate Students, Testing, Test Items, Memory

