Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 9 |
Descriptor
Comparative Testing | 24 |
Test Reliability | 24 |
Test Validity | 10 |
Foreign Countries | 6 |
Test Construction | 6 |
Academic Standards | 4 |
Scores | 4 |
Standardized Tests | 4 |
Student Evaluation | 4 |
College Students | 3 |
Correlation | 3 |
Author
Trevisan, Michael S. | 2 |
Alwis, W. A. M. | 1 |
Avery, Richard O. | 1 |
Awomolo, Ademola | 1 |
Babcock, Judith L. | 1 |
Begeny, John C. | 1 |
Bhola, Dennison S. | 1 |
Boyle, Gregory J. | 1 |
Brice, Julie | 1 |
Canney, George F. | 1 |
Codding, Robin S. | 1 |
Publication Type
Reports - Evaluative | 24 |
Journal Articles | 15 |
Speeches/Meeting Papers | 4 |
Information Analyses | 3 |
Education Level
Higher Education | 5 |
Elementary Secondary Education | 2 |
Postsecondary Education | 2 |
Early Childhood Education | 1 |
Elementary Education | 1 |
Grade 2 | 1 |
Audience
Researchers | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Race to the Top | 1 |
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating not only students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Turgut, Guliz – Clearing House: A Journal of Educational Strategies, Issues and Ideas, 2013
The ranking of the United States in major international tests such as the Progress in International Reading Literacy Study (PIRLS), Trends in International Mathematics and Science Study (TIMSS), and Program for International Student Assessment (PISA) is used as the driving force and rationale for the current educational reforms in the United…
Descriptors: Educational Change, Success, Educational Strategies, Educational Indicators
Lew, Magdeleine D. N.; Alwis, W. A. M.; Schmidt, Henk G. – Assessment & Evaluation in Higher Education, 2010
The purpose of the two studies presented here was to evaluate the accuracy of students' self-assessment ability, to examine whether this ability improves over time and to investigate whether self-assessment is more accurate if students believe that it contributes to improving learning. To that end, the accuracy of the self-assessments of 3588…
Descriptors: Self Evaluation (Individuals), Beliefs, Learning Processes, Correlation
Ricketts, Chris; Brice, Julie; Coombes, Lee – Advances in Health Sciences Education, 2010
The purpose of multiple choice tests of medical knowledge is to estimate as accurately as possible a candidate's level of knowledge. However, concern is sometimes expressed that multiple choice tests may also discriminate in undesirable and irrelevant ways, such as between minority ethnic groups or by sex of candidates. There is little literature…
Descriptors: Medical Students, Testing Accommodations, Ethnic Groups, Learning Disabilities
Korat, Ofra – Early Child Development and Care, 2009
The relationship between mothers' and educators' evaluations of 75 children's emergent literacy levels and the children's actual levels was investigated. Two groups of mothers participated: mothers with low education and mothers with high education. The children's emergent literacy was measured. The mothers evaluated their own children and 40 teachers…
Descriptors: Mothers, Emergent Literacy, Interrater Reliability, Mother Attitudes
Setzer, J. Carl; He, Yi – GED Testing Service, 2009
Reliability Analysis for the Internationally Administered 2002 Series GED (General Educational Development) Tests. Reliability refers to the consistency, or stability, of test scores when the measurement procedure is administered repeatedly to groups of examinees (American Educational Research Association [AERA], American Psychological…
Descriptors: Educational Research, Error of Measurement, Scores, Test Reliability

Panton, James H. – Journal of Clinical Psychology, 1980
Inmates score significantly lower on the second edition (BETA II) than on the first edition (BETA I), regardless of the order of administration. BETA I score distributions were unaffected by the order of administration. BETA II score distributions depended on whether BETA II was administered first or second. (Author)
Descriptors: Comparative Testing, Institutionalized Persons, Intelligence Tests, Prisoners
Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007
This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…
Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory

Evans, L. – British Journal of Educational Psychology, 1980
When administered to 125 deaf youngsters, ages 5-12, the WISC performance test had good reliability and predictive validity, but administration to some young or physically handicapped children proved difficult. The Colored Progressive Matrices proved satisfactory with older subjects, but its suitability for younger deaf children was not confirmed.…
Descriptors: Age Differences, Comparative Testing, Deafness, Elementary Secondary Education
Boyle, Gregory J. – Psychological Test Bulletin, 1990
Research relating to the factor structure of the Sixteen Personality Factor Questionnaire (16PF) and the Clinical Analysis Questionnaire is reviewed. Different opinions about the factors measured by the 16PF are discussed. Focusing on the second-order factor level could eliminate problems with the instruments' reliability. (SLD)
Descriptors: Comparative Testing, Factor Structure, Literature Reviews, Personality Measures
Hogan, Thomas P.; Mishler, Carol – 1982
This literature review summarizes what is currently known about the agreement among six measures of writing skills. Three of these methods involve the application of human judgment in scoring or rating a piece of writing: holistic, analytical, and primary trait scoring. Two methods involve anatomical or taxonomic analysis of a piece of writing:…
Descriptors: Comparative Testing, Criterion Referenced Tests, Measurement Techniques, Scoring
Squires, David; Trevisan, Michael S.; Canney, George F. – Studies in Educational Evaluation, 2006
The Idaho Comprehensive Literacy Assessment (ICLA) is a faculty-developed, state-wide, high-stakes assessment of pre-service teachers' knowledge and application of research based literacy practices. The literacy faculty control all aspects of the test, including construction, refinement, administration, scoring and reporting. The test development…
Descriptors: Test Construction, Comparative Testing, Investigations, Test Reliability
Eckert, Tanya L.; Dunn, Erin K.; Codding, Robin S.; Begeny, John C.; Kleinmann, Ava E. – Psychology in the Schools, 2006
Teacher judgments have been identified as a primary source of information regarding student academic achievement. Research examining the accuracy of teachers' judgments in assessing students' academic abilities has shown relatively high accuracy. However, previous studies have relied primarily on norm-referenced measures to obtain estimates of…
Descriptors: Mathematics Skills, Academic Ability, Performance Based Assessment, Curriculum Based Assessment
DeMars, Christine E. – Online Submission, 2005
Several methods for estimating item response theory scores for multiple subtests were compared. These methods included two multidimensional item response theory models: a bi-factor model where each subtest was a composite score based on the primary trait measured by the set of tests and a secondary trait measured by the individual subtest, and a…
Descriptors: Item Response Theory, Multidimensional Scaling, Correlation, Scoring Rubrics

Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991
A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)
Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education