Publication Date
In 2025 | 0
Since 2024 | 1
Since 2021 (last 5 years) | 5
Since 2016 (last 10 years) | 7
Since 2006 (last 20 years) | 20
Descriptor
Test Format | 124
Test Interpretation | 124
Test Reliability | 61
Test Validity | 53
Test Construction | 43
Testing | 38
Standardized Tests | 35
Test Content | 29
Test Reviews | 27
Elementary Secondary Education | 25
Test Items | 25
Author
White, Edward M. | 6
Benson, Jeri | 2
Lee, Won-Chan | 2
Smith, Douglas K. | 2
Amit Sevak | 1
Anderson, Colette | 1
Anderson, Scarvia B. | 1
Ansorge, Charles J. | 1
Areekkuzhiyil, Santhosh | 1
Ashton, Tamarah M. | 1
Bachor, Dan G. | 1
Education Level
Higher Education | 5
Postsecondary Education | 4
Elementary Secondary Education | 3
Elementary Education | 1
Grade 10 | 1
Grade 4 | 1
Grade 8 | 1
Location
California | 6
United States | 4
Canada | 3
Delaware | 3
Hong Kong | 1
Italy | 1
Japan | 1
New York (New York) | 1
Ohio | 1
Rhode Island | 1
United Kingdom | 1
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 2 |
Individuals with Disabilities… | 1 |
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
Chen, Dandan – Online Submission, 2023
Technology-driven shifts have created opportunities to improve efficiency and quality of assessments. Meanwhile, they may have exacerbated underlying socioeconomic issues in relation to educational equity. The increased implementation of technology-based assessments during the COVID-19 pandemic compounds the concern about the digital divide, as…
Descriptors: Technology Uses in Education, Computer Assisted Testing, Alternative Assessment, Test Format
Areekkuzhiyil, Santhosh – Online Submission, 2021
Assessment is an integral part of any teaching-learning process, and it has a large number of functions to perform, whether formative or summative. This paper analyses the issues involved and the areas of concern in classroom assessment practice and discusses the recent reforms that have taken place. [This paper was published in Edutracks v20 n8…
Descriptors: Student Evaluation, Formative Evaluation, Summative Evaluation, Test Validity
Patrick Kyllonen; Amit Sevak; Teresa Ober; Ikkyu Choi; Jesse Sparks; Daniel Fishtein – ETS Research Report Series, 2024
Assessment refers to a broad array of approaches for measuring or evaluating a person's (or group of persons') skills, behaviors, dispositions, or other attributes. Assessments range from standardized tests used in admissions, employee selection, licensure examinations, and domestic and international large-scale assessments of cognitive and…
Descriptors: Assessment Literacy, Testing, Test Bias, Test Construction
Joseph, Dane Christian – Journal of Effective Teaching in Higher Education, 2019
Multiple-choice testing is a staple within the U.S. higher education system. From classroom assessments to standardized entrance exams such as the GRE, GMAT, or LSAT, test developers utilize a variety of validated and heuristic-driven item-writing guidelines. One such guideline that has been given recent attention is to randomize the position of…
Descriptors: Test Construction, Multiple Choice Tests, Guessing (Tests), Test Wiseness
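As a concrete illustration of the item-writing guideline this entry mentions, the sketch below randomizes the position of the correct option for a single multiple-choice item. It is a minimal, hypothetical example (the function name and inputs are invented for illustration), not the procedure studied in the article.

```python
import random

def shuffle_options(options, correct_index, seed=None):
    """Return the answer options in random order plus the new index of the key.

    `options` and `correct_index` are hypothetical inputs: the list of answer
    choices and the position of the correct choice within that list.
    """
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)                        # random permutation of positions
    shuffled = [options[i] for i in order]
    new_key = order.index(correct_index)      # where the correct option landed
    return shuffled, new_key

# Example: the key no longer sits in a predictable position across items.
opts, key = shuffle_options(["3", "4", "5", "22"], correct_index=1, seed=7)
print(opts, "key at index", key)
```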
Ben Seipel; Sarah E. Carlson; Virginia Clinton-Lisell; Mark L. Davison; Patrick C. Kennedy – Grantee Submission, 2022
Originally designed for students in Grades 3 through 5, MOCCA (formerly the Multiple-choice Online Causal Comprehension Assessment), identifies students who struggle with comprehension, and helps uncover why they struggle. There are many reasons why students might not comprehend what they read. They may struggle with decoding, or reading words…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Diagnostic Tests, Reading Tests
Sinharay, Sandip – Grantee Submission, 2018
Tatsuoka (1984) suggested several extended caution indices and their standardized versions that have been used as person-fit statistics by researchers such as Drasgow, Levine, and McLaughlin (1987), Glas and Meijer (2003), and Molenaar and Hoijtink (1990). However, these indices are only defined for tests with dichotomous items. This paper extends…
Descriptors: Test Format, Goodness of Fit, Item Response Theory, Error Patterns
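For readers unfamiliar with person-fit statistics, the sketch below computes one widely used index for dichotomous items, the standardized log-likelihood statistic l_z, from model-implied response probabilities. It is a generic, assumed illustration of what a person-fit statistic does; it is not Tatsuoka's extended caution indices nor the polytomous extension this paper develops, and the item probabilities and response pattern are invented.

```python
import numpy as np

def lz_person_fit(responses, p):
    """Standardized log-likelihood person-fit statistic (l_z) for one examinee.

    `responses` is a 0/1 vector of scored item responses and `p` the vector of
    model-implied probabilities of a correct response (both hypothetical here).
    Large negative values flag response patterns that fit the model poorly.
    """
    responses = np.asarray(responses, dtype=float)
    p = np.asarray(p, dtype=float)
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - mean) / np.sqrt(var)

# Toy example: an examinee who misses the easy items but answers the hard ones.
p = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
aberrant = np.array([0, 0, 0, 1, 1, 1])
print(lz_person_fit(aberrant, p))   # strongly negative -> misfitting pattern
```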
Dickens, Rachel H.; Meisinger, Elizabeth B.; Tarar, Jessica M. – Canadian Journal of School Psychology, 2015
The Comprehensive Test of Phonological Processing-Second Edition (CTOPP-2; Wagner, Torgesen, Rashotte, & Pearson, 2013) is a norm-referenced test that measures phonological processing skills related to reading for individuals aged 4 to 24. According to its authors, the CTOPP-2 may be used to identify individuals who are markedly below their…
Descriptors: Norm Referenced Tests, Phonology, Test Format, Testing
Moses, Tim – ETS Research Report Series, 2013
The purpose of this report is to review ETS psychometric contributions that focus on test scores. Two major sections review contributions based on assessing test scores' measurement characteristics and other contributions about using test scores as predictors in correlational and regression relationships. An additional section reviews additional…
Descriptors: Psychometrics, Scores, Correlation, Regression (Statistics)
Jordan, Sally – Computers & Education, 2012
Students were observed directly, in a usability laboratory, and indirectly, by means of an extensive evaluation of responses, as they attempted interactive computer-marked assessment questions that required free-text responses of up to 20 words and as they amended their responses after receiving feedback. This provided more general insight into…
Descriptors: Learner Engagement, Feedback (Response), Evaluation, Test Interpretation
Lee, Eunjung; Lee, Won-Chan; Brennan, Robert L. – College Board, 2012
In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can be used interchangeably. Test equating becomes even more challenging in mixed-format tests, such as Advanced Placement Program® (AP®) Exams, that contain both multiple-choice and constructed…
Descriptors: Test Construction, Test Interpretation, Test Norms, Test Reliability
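To make the idea of equating concrete, the sketch below applies simple linear equating under a random-groups design: Form X scores are placed on the Form Y scale by matching means and standard deviations. This is a textbook method shown only for illustration, with invented score distributions; the report above evaluates more elaborate equating designs for mixed-format exams.

```python
import numpy as np

def linear_equate(scores_x, scores_y):
    """Return a function mapping Form X scores to Form Y equivalents.

    Linear equating matches the mean and standard deviation of the two
    (assumed randomly equivalent) groups; illustrative only.
    """
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x, ddof=1)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y, ddof=1)
    return lambda x: mu_y + (sd_y / sd_x) * (np.asarray(x) - mu_x)

# Hypothetical number-correct scores from two administrations.
rng = np.random.default_rng(0)
form_x = rng.normal(30, 6, size=2000)   # slightly harder form
form_y = rng.normal(33, 5, size=2000)
to_y_scale = linear_equate(form_x, form_y)
print(to_y_scale(30))   # a Form X score of 30 maps near the Form Y mean
```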
O'Reilly, Tenaha; Sabatini, John – ETS Research Report Series, 2013
This paper represents the third installment of the Reading for Understanding (RfU) assessment framework. This paper builds upon the two prior installments (Sabatini & O'Reilly, 2013; Sabatini, O'Reilly, & Deane, 2013) by discussing the role of performance moderators in the test design and how scenario-based assessment can be used as a tool…
Descriptors: Reading Comprehension, Reading Tests, Test Construction, Student Characteristics
Wolf, Raffaela; Zahner, Doris; Kostoris, Fiorella; Benjamin, Roger – Council for Aid to Education, 2014
The measurement of higher-order competencies within a tertiary education system across countries presents methodological challenges due to differences in educational systems, socio-economic factors, and perceptions as to which constructs should be assessed (Blömeke, Zlatkin-Troitschanskaia, Kuhn, & Fege, 2013). According to Hart Research…
Descriptors: Case Studies, International Assessment, Performance Based Assessment, Critical Thinking
Kolen, Michael J.; Lee, Won-Chan – Educational Measurement: Issues and Practice, 2011
This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
Descriptors: Test Use, Test Format, Error of Measurement, Raw Scores
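One classical way to obtain a conditional standard error of measurement for a number-correct score is Lord's binomial-error approximation, sketched below with illustrative numbers. It is offered only to make the term concrete and is not necessarily the procedure used in this article.

```python
import math

def csem_binomial(raw_score, n_items):
    """Lord's binomial-error conditional SEM for a number-correct score.

    CSEM(x) = sqrt(x * (n - x) / (n - 1)); a classical approximation shown
    here only to illustrate how measurement error varies by score level.
    """
    x, n = raw_score, n_items
    return math.sqrt(x * (n - x) / (n - 1))

# Error is largest for mid-range scores and shrinks toward the extremes.
for x in (5, 20, 35, 40):
    print(x, round(csem_binomial(x, 40), 2))
```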

Silverstein, A. B. – Journal of Clinical Psychology, 1985
The findings of research on short forms of the Wechsler Adult Intelligence Scales-Revised are used to illustrate points about three criteria for evaluating the usefulness of a short form. Results indicate there is little justification for regarding the three criteria as criteria. (Author/BL)
Descriptors: Correlation, Evaluation Criteria, Test Format, Test Interpretation