Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
John Jerrim; Luis Alejandro Lopez-Agudo; Oscar David Marcenaro-Gutierrez – British Journal of Educational Studies, 2024
International large-scale assessments have gained much attention since the beginning of the twenty-first century, influencing education legislation in many countries. This includes Spain, where they have been used by successive governments to justify education policy change. Unfortunately, there was a problem with the PISA 2018 reading scores for…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students
Anders Holm; Anders Hjorth-Trolle; Robert Andersen – Sociological Methods & Research, 2025
Lagged dependent variables (LDVs) are often used as predictors in ordinary least squares (OLS) models in the social sciences. Although several estimators are commonly employed, little is known about their relative merits in the presence of classical measurement error and different longitudinal processes. We assess the performance of four commonly…
Descriptors: Elementary Education, Scores, Error of Measurement, Predictor Variables
Lehmann, Vicky; Hillen, Marij A.; Verdam, Mathilde G. E.; Pieterse, Arwen H.; Labrie, Nanon H. M.; Fruijtier, Agnetha D.; Oreel, Tom H.; Smets, Ellen M. A.; Visser, Leonie N. C. – International Journal of Social Research Methodology, 2023
The Video Engagement Scale (VES) is a quality indicator to assess engagement in experimental video-vignette studies, but its measurement properties warrant improvement. Data from previous studies were combined (N = 2676) and split into three subsamples for a stepped analytical approach. We tested construct validity, criterion validity,…
Descriptors: Likert Scales, Video Technology, Vignettes, Construct Validity
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Nicewander, W. Alan – Educational and Psychological Measurement, 2019
This inquiry is focused on three indicators of the precision of measurement--conditional on fixed values of θ, the latent variable of item response theory (IRT). The indicators that are compared are (1) The traditional, conditional standard errors, σ(eX|θ) = CSEM; (2) the IRT-based conditional standard errors, σ[subscript irt](eX|θ)=C[subscript…
Descriptors: Measurement, Accuracy, Scores, Error of Measurement
Schmitz, Eva A.; Salemink, Elske; Wiers, Reinout W.; Jansen, Brenda R. J. – Journal of Psychoeducational Assessment, 2022
The Abbreviated Math Anxiety Scale (AMAS) is commonly used to compare groups on math anxiety. Group comparisons should however be preceded by a demonstration of metric and scalar measurement invariance, which is currently only available for undergraduate students in the USA. This study tested for metric and scalar measurement invariance of AMAS…
Descriptors: Foreign Countries, Secondary School Students, College Students, Mathematics Anxiety
Pei-Hsuan Chiu – ProQuest LLC, 2018
Evidence of student growth is a primary outcome of interest for educational accountability systems. When three or more years of student test data are available, questions around how students grow and what their predicted growth is can be answered. Given that test scores contain measurement error, this error should be considered in growth and…
Descriptors: Bayesian Statistics, Scores, Error of Measurement, Growth Models
Schnoor, Birger; Hartig, Johannes; Klinger, Thorsten; Naumann, Alexander; Usanova, Irina – Language Testing, 2023
Research on assessing English as a foreign language (EFL) development has been growing recently. However, empirical evidence from longitudinal analyses based on substantial samples is still needed. In such settings, tests for measuring language development must meet high standards of test quality such as validity, reliability, and objectivity, as…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Longitudinal Studies
van der Lans, Rikkert M.; Maulana, Ridwan; Helms-Lorenz, Michelle; Fernández-García, Carmen-María; Chun, Seyeoung; de Jager, Thelma; Irnidayanti, Yulia; Inda-Caro, Mercedes; Lee, Okhwa; Coetzee, Thys; Fadhilah, Nurul; Jeon, Meae; Moorer, Peter – SAGE Open, 2021
This study examines measurement invariance of student perceptions of teaching quality collected in five countries: Indonesia (n students = 6,331), the Netherlands (n students = 6,738), South Africa (n students = 3,422), South Korea (n students = 6,997) and Spain (n students = 4,676). The administered questionnaire was the My Teacher Questionnaire…
Descriptors: Foreign Countries, Student Attitudes, Student Evaluation of Teacher Performance, Teacher Effectiveness
Moore, Joann L.; Li, Tianli; Lu, Yang – ACT, Inc., 2020
The Every Student Succeeds Act requires that English Learners (ELs) are included in annual state testing (grades 3-8 and once in high school) and included in each state's accountability system disaggregated by subgroup to ensure that they receive the support they need to learn English, participate fully in their education experience, and graduate…
Descriptors: College Entrance Examinations, Scores, English Language Learners, Accountability
Irby, Sarah M.; Floyd, Randy G. – Psychology in the Schools, 2017
This study examined the exchangeability of total scores (i.e., intelligence quotients [IQs]) from three brief intelligence tests. Tests were administered to 36 children with intellectual giftedness, scored live by one set of primary examiners and later scored by a secondary examiner. For each student, six IQs were calculated, and all 216 values…
Descriptors: Intelligence Tests, Gifted, Error of Measurement, Scores
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
Westrick, Paul A. – Educational Assessment, 2017
Undergraduate grade point average (GPA) is a commonly employed measure in educational research, serving as a criterion or as a predictor depending on the research question. Over the decades, researchers have used a variety of reliability coefficients to estimate the reliability of undergraduate GPA, which suggests that there has been no consensus…
Descriptors: Undergraduate Students, Test Reliability, College Entrance Examinations, Longitudinal Studies
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores