Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system: multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
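As a point of reference for the entry above, the sketch below shows what an embedding-based AES pipeline can look like: essays are encoded with a multilingual sentence embedding model and a regression scorer is fit on the embeddings. The LaBSE checkpoint name, the Ridge regressor, and the data names are assumptions for illustration, not the system the authors describe.

```python
# Minimal sketch of an embedding-based AES pipeline (assumed setup, not the
# authors' system): encode essays with a multilingual sentence encoder and
# fit a simple regression scorer on the embeddings.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

def train_and_score(train_essays, train_scores, test_essays):
    """Embed essays with LaBSE, fit a scorer, and predict scores for new essays."""
    encoder = SentenceTransformer("sentence-transformers/LaBSE")  # language-agnostic encoder
    x_train = encoder.encode(train_essays)   # (n_train, 768) sentence embeddings
    x_test = encoder.encode(test_essays)
    scorer = Ridge(alpha=1.0).fit(x_train, train_scores)
    return scorer.predict(x_test)

# Hypothetical usage: train on scored German essays, score held-out Czech essays.
# predicted = train_and_score(german_essays, german_scores, czech_essays)
```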
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
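For readers unfamiliar with equating under the nonequivalent groups anchor test (NEAT) design referenced above, the sketch below implements one textbook linear procedure, the Tucker method. It is offered only as an illustration of the design, not as the specific variation evaluated in the study, and the variable names are assumptions.

```python
# Illustrative Tucker linear equating under a NEAT design (a textbook method,
# not necessarily the procedure used in the study above).
import numpy as np

def tucker_linear_equate(x, v1, y, v2, w1=0.5):
    """Map Form X scores onto the Form Y scale.
    x, v1: total and anchor scores for the group taking Form X.
    y, v2: total and anchor scores for the group taking Form Y.
    w1: synthetic-population weight for group 1 (w2 = 1 - w1)."""
    w2 = 1.0 - w1
    g1 = np.cov(x, v1)[0, 1] / np.var(v1, ddof=1)   # slope of X on the anchor
    g2 = np.cov(y, v2)[0, 1] / np.var(v2, ddof=1)   # slope of Y on the anchor
    dmu = v1.mean() - v2.mean()
    dvar = np.var(v1, ddof=1) - np.var(v2, ddof=1)
    mu_x = x.mean() - w2 * g1 * dmu
    mu_y = y.mean() + w1 * g2 * dmu
    var_x = np.var(x, ddof=1) - w2 * g1**2 * dvar + w1 * w2 * g1**2 * dmu**2
    var_y = np.var(y, ddof=1) + w1 * g2**2 * dvar + w1 * w2 * g2**2 * dmu**2
    return lambda score: mu_y + np.sqrt(var_y / var_x) * (score - mu_x)
```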
Trentham, Landa L. – Journal of Educational Measurement, 1975 (peer reviewed)
Descriptors: Comparative Testing, Educational Testing, Elementary Education, Grade 6
Sinharay, Sandip; Holland, Paul W. – Journal of Educational Measurement, 2007
It is a widely held belief that anchor tests should be miniature versions (i.e., "minitests"), with respect to content and statistical characteristics, of the tests being equated. This article examines the foundations for this belief regarding statistical characteristics. It examines the requirement of statistical representativeness of…
Descriptors: Test Items, Comparative Testing
Meijer, Rob R. – Journal of Educational Measurement, 2004
Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
Descriptors: Probability, Adaptive Testing, Item Response Theory, Scores
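To make the hypergeometric idea above concrete, the sketch below computes the tail probability of an unexpectedly low testlet sum score given the examinee's total score, treating the correct responses as if they were spread over the test at random. The numbers and the 0.05 threshold are illustrative, not taken from the article.

```python
# Sketch of flagging an unexpectedly low testlet sum score with a
# hypergeometric tail probability (illustrative values, not from the article).
from scipy.stats import hypergeom

def testlet_flag(n_total, k_correct_total, n_testlet, x_testlet, alpha=0.05):
    """P(at most x_testlet correct on an n_testlet-item testlet), assuming the
    examinee's k_correct_total correct responses fall on the n_total items at random."""
    p = hypergeom.cdf(x_testlet, n_total, k_correct_total, n_testlet)
    return p, p < alpha

# Example: 45 of 60 items correct overall, but only 2 of the 10 testlet items correct.
p_value, flagged = testlet_flag(60, 45, 10, 2)   # p_value falls far below 0.05, so flagged
```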
Wainer, Howard; And Others – Journal of Educational Measurement, 1992 (peer reviewed)
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Chen, Shu-Ying; Ankenman, Robert D. – Journal of Educational Measurement, 2004
The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the…
Descriptors: Test Length, Adaptive Testing, Computer Assisted Testing, Test Selection
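As an illustration of rule (1) in the entry above, the sketch below selects the next adaptive-test item by maximum Fisher information under a 2PL model. The item pool is synthetic, and the posterior-weighted and Kullback-Leibler rules would simply replace the selection criterion.

```python
# Maximum Fisher-information item selection for a synthetic 2PL pool
# (illustrates rule (1); rules (2)-(4) swap in a different criterion).
import numpy as np

def fisher_info(theta, a, b):
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item(theta_hat, a, b, administered):
    """Return the not-yet-administered item with maximum information at theta_hat."""
    info = fisher_info(theta_hat, a, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, 200)        # discriminations for a 200-item pool
b = rng.normal(0.0, 1.0, 200)         # difficulties
next_item = select_item(0.3, a, b, administered={17, 42})
```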
Wise, Steven L.; And Others – Journal of Educational Measurement, 1992 (peer reviewed)
Performance of 156 undergraduate and 48 graduate students on a self-adapted test (SFAT)--students choose the difficulty level of their test items--was compared with performance on a computer-adapted test (CAT). Those taking the SFAT obtained higher ability scores and reported lower posttest state anxiety than did CAT takers. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Difficulty Level
Gitomer, Drew H.; Yamamoto, Kentaro – Journal of Educational Measurement, 1991 (peer reviewed)
A model integrating latent trait and latent class theories in characterizing individual performance on the basis of qualitative understanding is presented. This HYBRID model is illustrated through experiments with 119 Air Force technicians taking a paper-and-pencil test and 136 Air Force technicians taking a computerized test. (SLD)
Descriptors: Comparative Testing, Computer Assisted Testing, Educational Assessment, Item Response Theory
Bridgeman, Brent; Rock, Donald A. – Journal of Educational Measurement, 1993 (peer reviewed)
Exploratory and confirmatory factor analyses were used to explore relationships among existing item types and three new computer-administered item types for the analytical scale of the Graduate Record Examination General Test. Results with 349 students indicate the constructs the item types are measuring. (SLD)
Descriptors: College Entrance Examinations, College Students, Comparative Testing, Computer Assisted Testing
Monahan, Patrick O.; Lee, Won-Chan; Ankenmann, Robert D. – Journal of Educational Measurement, 2007
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta…
Descriptors: True Scores, Psychometrics, Monte Carlo Methods, Correlation
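The sketch below illustrates the general shape of such a simulation, assuming the four-parameter beta refers to a true-score distribution on a bounded interval and that item characteristic curves are tabulated rather than given a parametric form; the specific parameters and curves are invented for illustration.

```python
# Illustrative Monte Carlo generation of dichotomous item scores: true scores
# from a four-parameter beta (a beta rescaled to [lower, upper]) and items
# scored with tabulated, nonparametric characteristic curves.
import numpy as np

rng = np.random.default_rng(1)

def sample_true_scores(n, a, b, lower, upper):
    """Proportion-correct true scores: Beta(a, b) rescaled to [lower, upper]."""
    return lower + (upper - lower) * rng.beta(a, b, size=n)

def simulate_item_scores(tau, grid, icc_tables):
    """Bernoulli item scores; each ICC is tabulated on grid and evaluated
    at each true score by linear interpolation."""
    probs = np.column_stack([np.interp(tau, grid, table) for table in icc_tables])
    return (rng.random(probs.shape) < probs).astype(int)

tau = sample_true_scores(1000, a=4.0, b=2.5, lower=0.2, upper=0.95)
grid = np.linspace(0.0, 1.0, 21)
iccs = [grid**0.5, grid, grid**2]                 # three arbitrary monotone curves
scores = simulate_item_scores(tau, grid, iccs)    # shape (1000, 3)
```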
Wainer, Howard; And Others – Journal of Educational Measurement, 1991 (peer reviewed)
Hierarchical (adaptive) and linear methods of testlet construction were compared. The performance of 2,080 ninth and tenth graders on a 4-item testlet was used to predict performance on the entire test. The adaptive test was slightly superior as a predictor, but the cost of obtaining that superiority was considerable. (SLD)
Descriptors: Adaptive Testing, Algebra, Comparative Testing, High School Students
Stricker, Lawrence J. – Journal of Educational Measurement, 1991 (peer reviewed)
To study whether different forms of the Scholastic Aptitude Test (SAT) used since the mid-1970s varied in their correlations with academic performance criteria, 1975 and 1985 forms were administered to 1,554 and 1,753 high school juniors, respectively. The 1975 form did not have greater validity than the 1985 form. (SLD)
Descriptors: Class Rank, College Entrance Examinations, Comparative Testing, Correlation
Bridgeman, Brent – Journal of Educational Measurement, 1992 (peer reviewed)
Examinees in a regular administration of the quantitative portion of the Graduate Record Examination responded to particular items in a machine-scannable multiple-choice format. Volunteers (n=364) used a computer to answer open-ended counterparts of these items. Scores for both formats demonstrated similar correlational patterns. (SLD)
Descriptors: Answer Sheets, College Entrance Examinations, College Students, Comparative Testing
Martinez, Michael E. – Journal of Educational Measurement, 1991 (peer reviewed)
Figural response items (FRIs) in science were administered to 347 fourth graders, 365 eighth graders, and 322 twelfth graders. Item and test statistics from parallel FRIs and multiple-choice questions illustrate FRIs' more difficult and more discriminating nature. Relevance of guessing to FRIs and diagnostic value of the item type are highlighted.…
Descriptors: Comparative Testing, Constructed Response, Elementary School Students, Elementary Secondary Education
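For context on the item statistics mentioned above, the sketch below computes the two classical quantities usually behind "more difficult" and "more discriminating": proportion correct and the corrected item-total (rest-score) correlation. The response matrix is synthetic, not the study's data.

```python
# Classical item difficulty (proportion correct) and discrimination
# (item vs. rest-score correlation) on a synthetic 0/1 response matrix.
import numpy as np

def item_statistics(responses):
    """responses: (n_examinees, n_items) matrix of 0/1 scores."""
    difficulty = responses.mean(axis=0)                       # proportion correct per item
    total = responses.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    return difficulty, discrimination

rng = np.random.default_rng(2)
demo = (rng.random((500, 20)) < 0.6).astype(int)   # fake data, so discriminations sit near zero
p_values, rest_correlations = item_statistics(demo)
```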
