Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 9
Descriptor
Test Format: 41
Test Interpretation: 41
Test Reliability: 19
Higher Education: 14
Test Construction: 12
Scoring: 11
Test Items: 10
Test Validity: 9
College Freshmen: 8
Student Evaluation: 7
Test Norms: 7
Author
White, Edward M.: 6
Anderson, Colette: 1
Ansorge, Charles J.: 1
Bell, Courtney: 1
Ben Seipel: 1
Benavidez, Charlotte: 1
Benjamin, Roger: 1
Benson, Jeri: 1
Bensoussan, Marsha: 1
Best, Rachel: 1
Bethscheider, Janine K.: 1
Publication Type
Reports - Research: 41
Journal Articles: 15
Speeches/Meeting Papers: 9
Reports - Descriptive: 6
Guides - Non-Classroom: 3
Numerical/Quantitative Data: 2
Tests/Questionnaires: 2
Information Analyses: 1
Education Level
Higher Education: 3
Postsecondary Education: 2
Audience
Administrators: 3
Practitioners: 3
Researchers: 3
Teachers: 3
Location
California: 6
Hong Kong: 1
Italy: 1
Japan: 1
New York (New York): 1
United Kingdom (Great Britain): 1
United States: 1
Joseph, Dane Christian – Journal of Effective Teaching in Higher Education, 2019
Multiple-choice testing is a staple within the U.S. higher education system. From classroom assessments to standardized entrance exams such as the GRE, GMAT, or LSAT, test developers use a variety of validated and heuristic-driven item-writing guidelines. One such guideline that has received recent attention is to randomize the position of…
Descriptors: Test Construction, Multiple Choice Tests, Guessing (Tests), Test Wiseness
Ben Seipel; Sarah E. Carlson; Virginia Clinton-Lisell; Mark L. Davison; Patrick C. Kennedy – Grantee Submission, 2022
Originally designed for students in Grades 3 through 5, MOCCA (formerly the Multiple-choice Online Causal Comprehension Assessment) identifies students who struggle with comprehension and helps uncover why they struggle. There are many reasons why students might not comprehend what they read. They may struggle with decoding, or reading words…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Diagnostic Tests, Reading Tests
Sinharay, Sandip – Grantee Submission, 2018
Tatsuoka (1984) suggested several extended caution indices and their standardized versions that have been used as person-fit statistics by researchers such as Drasgow, Levine, and McLaughlin (1987), Glas and Meijer (2003), and Molenaar and Hoijtink (1990). However, these indices are only defined for tests with dichotomous items. This paper extends…
Descriptors: Test Format, Goodness of Fit, Item Response Theory, Error Patterns
Moses, Tim – ETS Research Report Series, 2013
The purpose of this report is to review ETS psychometric contributions that focus on test scores. Two major sections review contributions on assessing the measurement characteristics of test scores and on using test scores as predictors in correlational and regression relationships. A further section reviews additional…
Descriptors: Psychometrics, Scores, Correlation, Regression (Statistics)
Jordan, Sally – Computers & Education, 2012
Students were observed directly, in a usability laboratory, and indirectly, by means of an extensive evaluation of responses, as they attempted interactive computer-marked assessment questions that required free-text responses of up to 20 words and as they amended their responses after receiving feedback. This provided more general insight into…
Descriptors: Learner Engagement, Feedback (Response), Evaluation, Test Interpretation
Lee, Eunjung; Lee, Won-Chan; Brennan, Robert L. – College Board, 2012
In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can be used interchangeably. Test equating becomes even more challenging in mixed-format tests, such as Advanced Placement Program® (AP®) Exams, that contain both multiple-choice and constructed…
Descriptors: Test Construction, Test Interpretation, Test Norms, Test Reliability
O'Reilly, Tenaha; Sabatini, John – ETS Research Report Series, 2013
This paper represents the third installment of the Reading for Understanding (RfU) assessment framework. This paper builds upon the two prior installments (Sabatini & O'Reilly, 2013; Sabatini, O'Reilly, & Deane, 2013) by discussing the role of performance moderators in the test design and how scenario-based assessment can be used as a tool…
Descriptors: Reading Comprehension, Reading Tests, Test Construction, Student Characteristics
Wolf, Raffaela; Zahner, Doris; Kostoris, Fiorella; Benjamin, Roger – Council for Aid to Education, 2014
The measurement of higher-order competencies within a tertiary education system across countries presents methodological challenges due to differences in educational systems, socio-economic factors, and perceptions as to which constructs should be assessed (Blömeke, Zlatkin-Troitschanskaia, Kuhn, & Fege, 2013). According to Hart Research…
Descriptors: Case Studies, International Assessment, Performance Based Assessment, Critical Thinking
Ozuru, Yasuhiro; Best, Rachel; Bell, Courtney; Witherspoon, Amy; McNamara, Danielle S. – Cognition and Instruction, 2007
This study examines how passage availability and reading comprehension question format (open-ended vs. multiple-choice) influence question answering. In two experiments, college undergraduates read an expository passage and answered open-ended and multiple-choice versions of text-based, local, and global bridging inference questions. Half the…
Descriptors: Reading Comprehension, Expository Writing, Test Format, Questioning Techniques

Kempa, R. F.; L'Odiaga, J. – Educational Research, 1984
Examines the extent to which grades derived from a conventional norm-referenced examination can be interpreted in terms of criterion-referenced performance assessments of different abilities and skills. Results suggest that performance is more affected by test format and subject matter than by the intellectual abilities tested by them. (JOW)
Descriptors: Criterion Referenced Tests, Norm Referenced Tests, Test Construction, Test Format

McCusker, Paul J. – Psychological Assessment, 1994
Three short forms of the Wechsler Adult Intelligence Scale-Revised (WAIS-R), developed in 1991, were cross-validated on 207 male and 133 female adolescent psychiatric inpatients and outpatients. Results show psychometric properties for the short forms that are comparable to those of the WAIS-R standardization sample. (SLD)
Descriptors: Adolescents, Clinical Diagnosis, Comparative Analysis, Intelligence Tests
Kingston, Neal M.; McKinley, Robert L. – 1988
Confirmatory multidimensional item response theory (CMIRT) was used to assess the structure of the Graduate Record Examination General Test, whose factorial structure has been studied extensively, using a sample of 1,001 psychology majors who took the test in 1984 or 1985. Results supported previous findings that, for this population, there…
Descriptors: College Students, Factor Analysis, Higher Education, Item Analysis

Morgan, Anne; Wainer, Howard – Journal of Educational Statistics, 1980
Two estimation procedures for the Rasch Model of test analysis are reviewed in detail, particularly with respect to new developments that make the more statistically rigorous conditional maximum likelihood estimation practical for use with longish tests. (Author/JKS)
Descriptors: Error of Measurement, Latent Trait Theory, Maximum Likelihood Statistics, Psychometrics
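For readers unfamiliar with the terms in the entry above, a minimal sketch in standard notation (not drawn from the reviewed article): the Rasch model gives the probability of a correct response as a function of person ability and item difficulty, and conditioning on the raw score removes the ability parameter, which is what conditional maximum likelihood (CML) estimation exploits.

% Rasch model: probability that person v answers item i correctly,
% given ability theta_v and item difficulty b_i.
\[
  P(X_{vi}=1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}
\]
% Conditional on the raw score r_v = \sum_i x_{vi}, the ability parameter cancels,
% so the item difficulties b_i can be estimated without estimating each theta_v:
\[
  P(\mathbf{x}_v \mid r_v, \mathbf{b}) =
  \frac{\exp\!\bigl(-\sum_i x_{vi} b_i\bigr)}{\gamma_{r_v}(\mathbf{b})},
  \qquad
  \gamma_r(\mathbf{b}) = \sum_{\mathbf{x}\,:\,\sum_i x_i = r} \exp\!\bigl(-\sum_i x_i b_i\bigr).
\]
% CML maximizes the product of these conditional probabilities over persons; the
% elementary symmetric functions gamma_r are the quantities whose computation
% historically limited CML to shorter tests.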

Demsky, Yvonne I.; Gass, Carlton S.; Golden, Charles J. – Assessment, 1998
Standardization data based on responses of 616 Puerto Ricans to the Spanish version of the Wechsler Adult Intelligence Scale (D. Wechsler, 1981) reveal reliability data and base rates to assist in evaluating the clinical significance of differences between Performance Intelligence Quotient (PIQ) and Verbal Intelligence Quotient (VIQ).…
Descriptors: Adults, Clinical Diagnosis, Intelligence Tests, Performance Factors
McCall, Chester H., Jr.; Gardner, Suzanne – 1984
The Research Services of the National Education Association (NEA) conducted a nationwide teacher opinion poll (TOP) based on a stratified, disproportionate, two-stage cluster sample of classroom teachers. The study was conducted to test the hypothesis that the order of presentation of items would make no difference in the conclusions…
Descriptors: Attitude Measures, Elementary Secondary Education, National Surveys, Statistical Analysis