Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 3
  Since 2006 (last 20 years): 5

Descriptor
  Comparative Testing: 28
  Test Format: 28
  Test Reliability: 28
  Higher Education: 12
  Multiple Choice Tests: 12
  Test Items: 12
  Test Construction: 11
  Test Validity: 9
  Computer Assisted Testing: 6
  Objective Tests: 6
  Difficulty Level: 5
Publication Type
  Reports - Research: 26
  Journal Articles: 16
  Speeches/Meeting Papers: 11
  Reports - Evaluative: 2
  Opinion Papers: 1
  Tests/Questionnaires: 1

Education Level
  Secondary Education: 2
  Elementary Secondary Education: 1
  High Schools: 1
  Higher Education: 1
  Postsecondary Education: 1

Audience
  Researchers: 5

Location
  United Kingdom: 2
  China: 1
  Ireland: 1
  Maryland: 1
  Netherlands: 1
Assessments and Surveys
  Embedded Figures Test: 1
  Wechsler Intelligence Scale…: 1
Wim J. van der Linden; Luping Niu; Seung W. Choi – Journal of Educational and Behavioral Statistics, 2024
A test battery with two different levels of adaptation is presented: a within-subtest level for the selection of the items in the subtests and a between-subtest level to move from one subtest to the next. The battery runs on a two-level model consisting of a regular response model for each of the subtests extended with a second level for the joint…
Descriptors: Adaptive Testing, Test Construction, Test Format, Test Reliability
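For readers unfamiliar with adaptive item selection, the following minimal sketch illustrates the within-subtest adaptation step described in the abstract above, assuming a two-parameter logistic (2PL) response model; the model choice, function names, and item parameters are illustrative assumptions, not details drawn from the article.

import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of 2PL items at ability theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Within-subtest adaptation: choose the unused item with maximum
    information at the current ability estimate."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf  # mask items already given
    return int(np.argmax(info))

# Hypothetical 5-item subtest (parameters made up for illustration).
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # discrimination
b = np.array([-1.0, 0.0, 0.5, 1.0, -0.5])  # difficulty
print(select_next_item(theta_hat=0.3, a=a, b=b, administered={2}))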
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
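As background for the scoring question, the sketch below contrasts two scoring rules often discussed for MTF items, an all-or-nothing (dichotomous) rule and a proportional partial-credit rule. These are illustrative examples only, not necessarily the algorithms examined in this study.

def score_dichotomous(responses, key):
    """All-or-nothing: the MTF item scores 1 only if every true/false
    judgment matches the key."""
    return 1.0 if all(r == k for r, k in zip(responses, key)) else 0.0

def score_partial_credit(responses, key):
    """Partial credit: fraction of true/false judgments answered correctly."""
    return sum(r == k for r, k in zip(responses, key)) / len(key)

key = [True, False, True, True]         # hypothetical 4-statement MTF item
responses = [True, False, False, True]  # examinee marks each statement T/F
print(score_dichotomous(responses, key))    # 0.0
print(score_partial_credit(responses, key)) # 0.75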
Bijsterbosch, Erik – Geographical Education, 2018
Geography teachers' school-based (internal) examinations in pre-vocational geography education in the Netherlands appear to be in line with the findings in the literature, namely that teachers' assessment practices tend to focus on the recall of knowledge. These practices are strongly influenced by national (external) examinations. This paper…
Descriptors: Foreign Countries, Instructional Effectiveness, National Competency Tests, Geography Instruction
Morrison, Keith – Educational Research and Evaluation, 2013
This paper reviews the literature on comparing online and paper course evaluations in higher education and provides a case study of a very large randomised trial on the topic. It presents a mixed but generally optimistic picture of online course evaluations with respect to response rates, what they indicate, and how to increase them. The paper…
Descriptors: Literature Reviews, Course Evaluation, Case Studies, Higher Education
Lissitz, Robert W.; Hou, Xiaodong; Slater, Sharon Cadman – Journal of Applied Testing Technology, 2012
This article investigates several questions regarding the impact of different item formats on measurement characteristics. Constructed response (CR) items and multiple choice (MC) items obviously differ in their formats and in the resources needed to score them. As such, they have been the subject of considerable discussion regarding the impact of…
Descriptors: Computer Assisted Testing, Scoring, Evaluation Problems, Psychometrics

Kapes, Jerome T.; Vansickle, Timothy R. – Measurement and Evaluation in Counseling and Development, 1992
Examined equivalence of mode of administration of the Career Decision-Making System, comparing the paper-and-pencil and computer-based versions. Findings from 61 undergraduate students indicated that the computer-based version was significantly more reliable than the paper-and-pencil version and was generally equivalent in other respects.…
Descriptors: Comparative Testing, Computer Assisted Testing, Higher Education, Test Format
Ebel, Robert L. – 1981
An alternate-choice test item is a simple declarative sentence, one portion of which is given with two different wordings. For example, "Foundations like Ford and Carnegie tend to be (1) eager (2) hesitant to support innovative solutions to educational problems." The examinee's task is to choose the alternative that makes the sentence…
Descriptors: Comparative Testing, Difficulty Level, Guessing (Tests), Multiple Choice Tests
Hinton-Bayre, Anton; Geffen, Gina – Psychological Assessment, 2005
The present study examined the comparability of 4 alternate forms of the Digit Symbol Substitution test and the Symbol Digit Modalities (written) test, including the original versions. Male contact-sport athletes (N=112) were assessed on 1 of the 4 forms of each test. Reasonable alternate form comparability was demonstrated through establishing…
Descriptors: Intervals, Test Format, Orthographic Symbols, Drills (Practice)

Green, Kathy – Journal of Experimental Education, 1979
Reliabilities and concurrent validities of teacher-made multiple-choice and true-false tests were compared. No significant differences were found even when multiple-choice reliability was adjusted to equate testing time. (Author/MH)
Descriptors: Comparative Testing, Higher Education, Multiple Choice Tests, Test Format
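A standard way to adjust reliability for test length or testing time is the Spearman-Brown prophecy formula; the sketch below assumes that formula and uses hypothetical numbers, not values reported in the study.

def spearman_brown(reliability, length_factor):
    """Projected reliability if test length (or testing time) is multiplied
    by length_factor, e.g. 1.5 for 50 percent more items."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)

# Hypothetical: project a reliability of 0.70 onto a test 1.5 times as long.
print(round(spearman_brown(0.70, 1.5), 3))  # 0.778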

Federico, Pat-Anthony – Behavior Research Methods, Instruments, and Computers, 1991
Using a within-subjects design, computer-based and paper-based tests of aircraft silhouette recognition were administered to 83 male naval pilots and flight officers to determine the relative reliabilities and validities of 2 measurement modes. Relative reliabilities and validities of the two modes were contingent on the multivariate measurement…
Descriptors: Aircraft Pilots, Comparative Testing, Computer Assisted Testing, Males

Kobak, Kenneth A.; And Others – Psychological Assessment, 1993
A newly developed computer-administered form of the Hamilton Anxiety Scale and the clinician-administered form of the instrument were administered to 214 psychiatric outpatients and 78 community adults. Results support the reliability and validity of the computer-administered version as an alternative to the clinician-administered version. (SLD)
Descriptors: Adults, Anxiety, Clinical Diagnosis, Comparative Testing

Allison, Donald E. – Alberta Journal of Educational Research, 1984
Reports that no significant difference in reliability appeared between a heterogeneous and a homogeneous form of the same general science matching-item test administered to 316 sixth-grade students but that scores on the heterogeneous form of the test were higher, independent of the examinee's sex or intelligence. (SB)
Descriptors: Comparative Analysis, Comparative Testing, Elementary Education, Grade 6
Rodriguez-Aragon, Graciela; And Others – 1993
The predictive power of the Split-Half version of the Wechsler Intelligence Scale for Children--Revised (WISC-R) Object Assembly (OA) subtest was compared to that of the full administration of the OA subtest. A cohort of 218 male and 49 female adolescent offenders detained in a Texas juvenile detention facility between 1990 and 1992 was used. The…
Descriptors: Adolescents, Cohort Analysis, Comparative Testing, Correlation

Barnes, Janet L.; Landy, Frank J. – Applied Psychological Measurement, 1979
Although behaviorally anchored rating scales have both intuitive and empirical appeal, they have not always yielded superior results in contrast with graphic rating scales. Results indicate that the choice of an anchoring procedure will depend on the nature of the actual rating process. (Author/JKS)
Descriptors: Behavior Rating Scales, Comparative Testing, Higher Education, Rating Scales

Schriesheim, Chester A.; And Others – Educational and Psychological Measurement, 1991
Effects of item wording on questionnaire reliability and validity were studied, using 280 undergraduate business students who completed a questionnaire comprising 4 item types: (1) regular; (2) polar opposite; (3) negated polar opposite; and (4) negated regular. Implications of results favoring regular and negated regular items are discussed. (SLD)
Descriptors: Business Education, Comparative Testing, Higher Education, Negative Forms (Language)