Sampling and Measurement Validation of Tests
Research Designs and Foundation
One of the most important properties of any instrument used to collect research data is validity (Ary et al., 2017). Validity is the extent to which an instrument measures what it is supposed to measure (Ary et al., 2017). The Georgia End-of-Course Tests (EOCT) were previously used by the state of Georgia to ensure and foster effective learning practices; they have since been withdrawn and replaced by the Georgia Milestones Assessment System (Georgia Department of Education, 2017). Understanding the validity, reliability, development, and methods of an instrument is essential to drawing conclusions and inferences from the data the instrument provides (Ary et al., 2017).
The EOCT were initially designed by the Georgia Department of Education to comply with state law, which required the Department to provide final evaluations in core courses for learners “in grades 9 through 12” (Barge, 2013). The EOCT were administered to students in the state of Georgia who were enrolled in one of these core courses (Georgia Department of Education, 2017). On completing the course, the student would take the corresponding EOCT, and the score achieved would “account for either 15% or 20% of … the … overall grade in EOCT-associated courses” (Barge, 2013). The weight depended on when the student first enrolled in ninth grade: for “… students enrolled in grade nine for the first time before July 1, 2011, the EOCT counted as 15% of the final grade” (Georgia Department of Education, 2017). “For students enrolled in grade nine for the first time on July 1, 2011 or after, the EOCT counted as 20% of the final grade” (Georgia Department of Education, 2017). The EOCT could “… be administered via paper-and-pencil … or in an on-line format” (Georgia Department of Education, 2017). Students were given a total of 2 hours to complete the test, which could be taken in a single day or spread across a couple of days (Georgia Department of Education, 2017). The EOCT provided students, caregivers, teachers, administrators, and schools with information about how successfully students were learning, and teachers were teaching, the curriculum in various subject areas (Barge, 2013). The validity and reliability of the EOCT assured these same groups that the EOCT provided effective measurements that corresponded to student proficiency (Barge, 2013).
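The weighting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the EOCT score is assumed to be already expressed on the same 0-100 scale as the coursework grade, and the remaining share of the final grade is assumed to come entirely from coursework. The cutoff date and the two weights are the only values taken from the sources cited above.

```python
from datetime import date

# Cutoff and weights come from the text above; all names here are hypothetical.
CUTOFF = date(2011, 7, 1)


def eoct_weight(first_grade9_enrollment: date) -> float:
    """Return the share of the final course grade carried by the EOCT."""
    return 0.15 if first_grade9_enrollment < CUTOFF else 0.20


def final_grade(coursework: float, eoct_score: float,
                first_grade9_enrollment: date) -> float:
    """Assumed blend: EOCT at its weight, coursework at the remainder."""
    w = eoct_weight(first_grade9_enrollment)
    return (1 - w) * coursework + w * eoct_score


# Same hypothetical student grades under the two enrollment rules.
print(final_grade(90.0, 70.0, date(2010, 8, 15)))  # EOCT weighted at 15%
print(final_grade(90.0, 70.0, date(2011, 8, 15)))  # EOCT weighted at 20%
```

With identical coursework and test scores, the later-enrolled student's final grade moves further toward the EOCT score, which is the practical effect of the change from 15% to 20%.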
The Georgia Department of Education (2017) reports that the EOCT has been withdrawn and superseded by a new instrument. The EOCT was previously used to assess learning achievement annually (Barge, 2013). The results were used to ensure that students achieved a level of proficiency within each section of the test as well as overall mastery of the course of study (Barge, 2013). The EOCT was administered as a multiple-choice examination (Georgia Department of Education, 2017). The results of these tests were crucial for students to earn a diploma (Barge, 2013).
Per Ary et al. (2017), when inspecting an instrument for validity, it is important to consider three categories of evidence that may overlap but are each significant to the idea of validity. The three categories are “evidence based on test content”, “evidence based on relations to a criterion” and “construct-related evidence of validity” (Ary et al., 2017). It is important first to describe the development of the tests and then to show how these tests demonstrate validity and reliability.
Barge (2013) reports that an assessment gains validity through its development, which makes the way the EOCT was devised significant. Development of the EOCT began with the “state’s mandated curriculum” and with input from instructors working within Georgia (Barge, 2013). With both of these pieces, the developers were able to establish the purpose of the EOCT (Barge, 2013). Various educational representatives were gathered to analyze Georgia’s course of study and to define which aspects of the curriculum should be included and which approach would be most suitable for testing them (Barge, 2013). These representatives created a framework of what was to be included on the test and how each core course needed to be evaluated (Barge, 2013).
The framework then led to more specific particulars (Barge, 2013). It consisted of “a test blueprint and test specifications”, both of which guided the further particulars (Barge, 2013). These particulars were broken down into “content domain specifications” and “test item specifications” (Barge, 2013). “Content domain specifications” describe “how” the pieces of the “curriculum” will be arranged, while “item specifications” provide further detail about what will be included in the assessment (Barge, 2013). For example, a “content domain specification” may call for the inclusion of the “area” of “biology”, while the “item specification” relating to this “domain” may state which aspect of “biology” to cover (e.g., enzymes or cell structures) and how the question is to be presented (Barge, 2013).
When the “test blueprints” and “test specifications” were complete, the two were combined to form the “EOCT content descriptions” (Barge, 2013). The “EOCT content descriptions” were made public and allowed anyone to access the EOCT’s “content… method of assessment…structure in which the results will be reported…also shows the relative proportion of items by domain that are included on each content area test” (Barge, 2013). “These documents and the inclusion of Georgia educators serve as one piece of evidence of the EOCT’s validity as a measure of the state’s curriculum” (Barge, 2013). Once everything was accepted, development of the test items began. The items were written by qualified practitioners and then evaluated by “committees” of representatives for appropriate ties to the “curriculum”, impartiality, and any other potential problems; the committees could accept, modify, or reject each item (Barge, 2013).
The “field tests” are composed of the accepted test items and provide further evidence of validity by checking that the items are able “…to function appropriately and …not present as confusing for students” (Barge, 2013). Following the “field tests”, the items are evaluated again by another group of representatives, who consider how successfully the students who took the “field tests” answered each item (Barge, 2013). This group also reviews any flaws in design and gauges impartiality by examining how fairly each question performed across categories of students. The group has the power to modify, keep, or reject items. Once an item has been approved, it may be placed on “an operational test form”. The next step is to create the evaluation that Georgia students will actually take. In creating it, it is imperative to review “both content and statistical data”, and each test “must assess the same range of content as well as carry the same statistical attributes”. Barge (2013) reports that it is crucial that the EOCT are “equated” to ensure that every student receives an evaluation of equal complexity and that “students are always held to the same standard.” This “equating” allows “one to interpret differences in test performance as the result of changes in student achievement as opposed to fluctuations in the properties of the test form”, which relates directly back to validity.
Near the end of test development, the “performance level standards” are determined, that is, how many test items a student must answer correctly to meet “expectations”. The last stage is “to produce scores and distribute results”. “Scores are typically reported as scale scores and performance levels.” The EOCT reports scale scores because they “can be consistently and meaningfully interpreted by students, parents and educators.”
“[F]ederal law called for rigorous examinations based on rigorous academic content to be in place for reading and mathematics in grades 3-8 and in high school…” Barge (2013) notes that the EOCT supports validity through “how the test is used”: the EOCT served as “the high school indicator for Georgia.”
Close attention to the “development” of the EOCT can provide evidence of the EOCT’s validity. Besides the aforementioned test development process, Barge (2013) reports that further investigations comparing the EOCT with other comparable tests are a testament to the EOCT’s “external validity”. It was important to understand how the test was devised because that information can be tied back to the three categories of evidence for validity (Ary et al., 2017). Barge (2013) noted that each test item was created and subsequently critiqued by various teachers to ensure that it aligned with the course of study.
The instrument currently used by the Georgia Department of Education is the Georgia Milestones Assessment System. It measures how well Georgia students have acquired and can apply the learning set out in course-of-study expectations, and it provides information about where students may need supplemental instruction (Georgia Department of Education, 2017). The examination is given to students in Georgia in “grades three through eight and in selected high school courses” (Georgia Department of Education, 2017). The examination is cumulative and addresses “content standards” of “English language arts, mathematics, science and social studies. Students in grades 3 through 8 take an end of grade assessment in English Language Arts and mathematics while students in grade 5 and 8 are also assessed in science and social studies. High school students take an end of course assessment for each of the ten courses designated [by the] State Board of Education” (online). The Georgia Milestones Assessment System examination is delivered by the school system via an “online administration” with “paper-pencil as back up”. It assesses “english language arts” via the courses “ninth grade literature and composition” and “American literature and composition”; “mathematics” via “algebra I or coordinate algebra” and “geometry or analytic geometry”; “science” via “biology” and “physical science”; and “social studies” via “united states history” and “economics/business/free enterprise” (online). The Georgia Milestones Assessment System score contributes “20% to the student’s final course grade.” The results provide information to various people “that can use the results as a barometer of the quality of educational opportunity provided throughout the state of Georgia. As such, Georgia Milestones Assessment System serves as a key component of the state’s accountability system the College and Career Ready Performance Index…” (online).
The Georgia Milestones Assessment System, like the EOCT, establishes its validity through its development, its design, and its explicit statement of the test’s intent (Georgia Department of Education, 2017). The purpose of the Georgia Milestones Assessment System is to provide an instrument that can adequately describe how well students are learning the components of the curriculum and where students and teachers may be struggling (Georgia Department of Education, 2017). Its validity is based on “how well the assessment instrument matches the intended content standard and how the score reports inform the various stakeholders - students, parents and educators - about the student’s performance” (Georgia Department of Education, 2017). Like the EOCT, the Georgia Milestones Assessment System is based upon the “state’s mandated content standards”. Its development follows the same guidelines to ensure validity, including the use of educational representatives and of “committees” that develop the key documents, the “test specifications” and “content domain specifications”, along with the frameworks that organize the test into groupings (Georgia Department of Education, 2017). Development then moves to the creation of “test item specifications”, led by the Department of Education with guidance from educational representatives within the state. The “content domain specifications” are made into “publicly posted documents… so that all stakeholders are informed of the test’s content and method of assessment”, and “these documents and the inclusion of Georgia educators serve as one piece of evidence of the Georgia Milestones validity as a measure of the state’s content standards.” Test items are then created by educational professionals, and different “committees” review all of the items in an effort to make each one fair, impartial, and relevant to the “curriculum”.
These “committees” have the power to include, modify, or dismiss each item. Once that is complete, the included items are placed into “field tests” to make sure they are appropriate for students’ needs; the “field test” items are embedded in regular “operational tests” given to “a representative group of motivated students under standard conditions”. Once the items have been field tested, teachers review them again, together with information about whether students answered them successfully and how that success related to particular groups of students. After examining this information, the reviewers again have the opportunity to include, alter, or dismiss each item. The included items are then used on upcoming tests. Once the items are agreed upon, the teachers construct the test form that students will use when taking the exams. The Georgia Department of Education ensures that each form incorporates “the same range of content as well as carry the same statistical attributes”. Like the EOCT, the Georgia Milestones Assessment System tests are “equated” to ensure an equal level of complexity across examinations. Next, guidelines are created by which teachers decide how much the test taker must achieve to succeed on the test. Once those are set, the outcomes are reported. Georgia Milestones Assessment System results “are reported as scale scores and performance levels” so that the “results can be consistently and meaningfully interpreted by students, parents and educators”. The detail of the development process is important to discuss because it is the basis on which the Georgia Department of Education can report validity evidence for the examinations.
As mentioned previously, there are three categories of evidence for validity. Ary et al. (2017) report that “evidence based on test content” shows that “the test to be used represents a balanced and adequate sampling of all relevant knowledge, skills and dimensions making up the content domain”. Both the EOCT and the Georgia Milestones Assessment System were developed using the knowledge and expertise of professionals and educational representatives, who tied the course of study to the specified “content area” in a systematic way (Barge, 2013). The development process for both instruments strengthened the “evidence based on content” by systematically using experts to review the curriculum, drawing on outside professionals within the state, testing the items to ensure that they covered the Georgia curriculum appropriately, preparing frameworks, and following through with “field tests” and subsequent revision, scrutiny, and examination (Ary et al., 2017; Barge, 2013; Georgia Department of Education, 2017). Further, Barge (2013) reports strong “evidence based on content” for the EOCT, stating that “the department has collected evidence through separate independent alignment studies to ensure the test measures the state’s curriculum.”
Barge (2013) reports that one of the main foundations of a test’s validity is whether the assessment provides a clearly understandable statement of its intent. Both the Georgia Milestones Assessment System and the EOCT state their intentions comprehensively and clearly, and both are well-defined examinations within a student’s core curriculum, which helps staff, students, and caregivers estimate whether a student has the necessary aptitudes to graduate or may need supplemental instruction (Georgia Department of Education, 2017). Both assessments report high alignment to the Georgia curriculum (Georgia Department of Education, 2017).
Additionally, while it is not directly stated in either of the briefs, both the Georgia Milestones Assessment System and the EOCT most likely demonstrated face validity: by making the documents “Georgia Milestones Test Blueprints and Content Weight” and the “EOCT Content Descriptions” public, the Georgia Department of Education enhanced the level of face validity.
For the Georgia Milestones Assessment System, “evidence based on content” can also be demonstrated by the number of test items on each test (i.e., a “range of number of operational items” of “44-55”, depending on “course/grade”) (Georgia Department of Education, 2017). Ary et al. (2017) report that one way of increasing “evidence based on content” is to include a large number of questions in each subject.
The next category of validity evidence is “evidence based on relations to a criterion”, which “refers to the extent to which test scores are systematically related to one or more outcome criteria” (Ary et al., 2017). The “criterion” is important because ultimately the investigator “uses the test scores to infer performance on the criterion” (Ary et al., 2017).
Barge (2013) defines criterion-related validity and reports the importance of gaining this type of validity; however, he does not provide information relating the EOCT to other assessments. Further, the Georgia Department of Education (2017) did not provide any specifications for concurrent or predictive validity.
The third type of evidence for validity that Ary et al. (2017) describe is “construct related validity”. Construct validity is defined as “the degree to which the test score is a measure of the psychological characteristic (i.e., construct) of interest” (Barge, 2013). Barge (2013) reports that the EOCT demonstrates strong construct validity by providing two different “internal structure” “metrics” (Ary et al., 2017). The first is the “point-biserial correlation”, which “is the correlation between an item and the total test score” (Barge, 2013). A high value signifies that “students who performed well on the test overall answered the item correctly and students who performed poorly on the test overall answered the item incorrectly” (Barge, 2013). Further, Barge (2013) states that “high point-biserial correlations indicate the items on the test require knowledge of the construct in order to be answered correctly.” The second “metric” Barge (2013) reports is the “Rasch fit statistics”, which “…are observed closely during the test construction process to ensure evidence on construct validity”. Barge (2013) also states that the “test unidimensionality is the single dominating factor extracted from the exploratory factor analysis. There was a dominant factor observed in each of the scree plots for the tests, suggesting the IRT unidimensionality assumption held reasonably well.”
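To make the point-biserial correlation concrete, the short Python sketch below computes it as the ordinary Pearson correlation between a right/wrong (1/0) item and the total test score, which matches the definition quoted from Barge (2013). The data are invented for illustration and are not EOCT results, the function name is hypothetical, and the sketch does not attempt to reproduce whatever software or corrections the test developers actually used.

```python
from math import sqrt


def point_biserial(item: list[int], total: list[float]) -> float:
    """Pearson correlation between a 0/1 item score and the total score.

    Assumes the item has both correct and incorrect responses and the
    totals are not all equal (otherwise the denominator is zero).
    """
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(total) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, total))
    var_i = sum((i - mean_i) ** 2 for i in item)
    var_t = sum((t - mean_t) ** 2 for t in total)
    return cov / sqrt(var_i * var_t)


# Invented responses for eight students: 1 = answered the item correctly.
item_correct = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [52, 48, 45, 40, 39, 30, 28, 25]

r_pb = point_biserial(item_correct, total_scores)
print(round(r_pb, 2))
```

In this invented sample, the students with higher totals mostly answered the item correctly, so the coefficient is strongly positive; an item that high scorers tended to miss would produce a value near zero or below.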
For the Georgia Milestones Assessment System, evidence for construct-related validity is demonstrated in the devising, modification, and revision of the test. The Georgia Milestones Assessment System went through rigorous scrutiny and various stages in which its internal structure was assessed through “field tests” and “equating” of tests (Georgia Department of Education, 2017).
“Reliability is the consistency (and hence, precision) of the results obtained from a measurement” (Barge, 2013). The EOCT provides two reliability “indices”: the “Cronbach’s alpha reliability coefficient” and the “standard error of measurement” (Barge, 2013). Barge’s (2013) data show reliability coefficients that “ranged” from “0.74” to “0.94” across the “administrations provided”.
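The two indices named above can be illustrated with a minimal Python sketch. It uses the standard textbook formulas, Cronbach's alpha as k/(k-1) times one minus the ratio of summed item variances to total-score variance, and the standard error of measurement as the score standard deviation times the square root of one minus the reliability. The response matrix is invented and is not EOCT data, the function names are hypothetical, and nothing here is claimed to be the exact procedure behind the coefficients Barge (2013) reports.

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[s][i] is the score of student s on item i."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# Invented right/wrong responses: five students (rows) by four items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
totals = [sum(row) for row in responses]
sem = standard_error_of_measurement(stdev(totals), alpha)
print(round(alpha, 2), round(sem, 2))
```

The two indices move in opposite directions: for a fixed spread of scores, a higher reliability coefficient yields a smaller standard error of measurement, meaning an observed score is expected to sit closer to the student's true score.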
Reliability for the Georgia Milestones Assessment System was assessed by the “Cronbach’s alpha reliability coefficient” (Georgia Department of Education, 2017). Ary et al. (2017) report that “Cronbach’s Alpha” gives the investigator the correlation, or agreement, among the test items; further, this “procedure” is “used when items are scored dichotomously…or when items are scored on a range of points as in an essay test”, and both kinds of scoring are used within the Georgia Milestones Assessment System. Ary et al. (2017) report that the “reliability coefficient” gives the “proportion of the variance in the observed scores that is free from error” and “can range from 0 to 1”: the coefficient is 0 when the measurement is all error, and “when there is no error in the measurement… the reliability is 1.” The Georgia Department of Education (2017) reports that its “reliability coefficient” “ranges from 0.87 to 0.93.” Ary et al. (2017) report six factors that may influence the reliability of a test. The Georgia Milestones Assessment System reports that the tests range from “44 to 55” items, which helps support reliability, as these are relatively lengthy tests and the “longer the test the greater the reliability”.
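The link between test length and reliability quoted above is commonly expressed with the Spearman-Brown formula, which projects the reliability of a test lengthened by a given factor with comparable items. The Python sketch below is offered only as an illustration of that general relationship: the starting reliability and the 22-item test are hypothetical, the function name is an assumption, and the Georgia Department of Education (2017) is not stated to have used this formula.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when a test is lengthened by length_factor
    using items comparable to the existing ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical: a 22-item test with reliability 0.80, doubled to 44 items.
short_test_reliability = 0.80
projected = spearman_brown(short_test_reliability, 44 / 22)
print(round(projected, 3))
```

Under these assumptions the projected coefficient rises when the test is doubled in length, which is consistent with the observation that tests of 44 to 55 items are long enough to support coefficients in the reported range.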