Reliability and validity
Use only assessment procedures and instruments that have been demonstrated to be valid for the specific purpose for which they are being used. It is important to understand the differences between reliability and validity.
Testing and Assessment - Reliability and Validity
Validity will tell you how good a test is for a particular situation; reliability will tell you how trustworthy a score on that test will be. You cannot draw valid conclusions from a test score unless you are sure that the test is reliable. Even when a test is reliable, it may not be valid. You should be careful that any test you select is both reliable and valid for your situation. A test's validity is established in reference to a specific purpose; the test may not be valid for different purposes.
For example, the test you use to make valid predictions about someone's technical proficiency on the job may not be valid for predicting his or her leadership skills or absenteeism rate. This leads to the next principle of assessment. Similarly, a test's validity is established in reference to specific groups. These groups are called the reference groups. The test may not be valid for different groups. For example, a test designed to predict the performance of managers in situations requiring problem solving may not allow you to make valid or meaningful predictions about the performance of clerical employees.
If, for example, the kind of problem-solving ability required for the two positions is different, or the reading level of the test is not suitable for clerical applicants, the test results may be valid for managers, but not for clerical employees.
Test developers have the responsibility of describing the reference groups used to develop the test. The manual should describe the groups for whom the test is valid, and the interpretation of scores for individuals belonging to each of these groups.
You must determine if the test can be used appropriately with the particular type of people you want to test. This group of people is called your target population or target group. Use assessment tools that are appropriate for the target population. Your target group and the reference group do not have to match on all factors; they must be sufficiently similar so that the test will yield meaningful scores for your group.
For example, a writing ability test developed for use with college seniors may be appropriate for measuring the writing ability of white-collar professionals or managers, even though these groups do not have identical characteristics.
In determining the appropriateness of a test for your target groups, consider factors such as occupation, reading level, cultural differences, and language barriers. Recall that the Uniform Guidelines require assessment tools to have adequate supporting evidence for the conclusions you reach with them in the event adverse impact occurs.
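Adverse impact is commonly screened with the "four-fifths" rule of thumb from the Uniform Guidelines: a group whose selection rate is less than 80% of the rate for the group with the highest rate is generally regarded as showing evidence of adverse impact. The text above does not spell this rule out, so the following is only a minimal sketch; the function names, group labels, and applicant counts are all hypothetical.

```python
def selection_rate(selected, applicants):
    """Fraction of applicants in a group who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group had any selections")
    return {group: rate / highest for group, rate in rates.items()}


def flags_adverse_impact(rates, threshold=0.8):
    """True if any group's impact ratio falls below the threshold."""
    return any(ratio < threshold for ratio in impact_ratios(rates).values())


# Hypothetical applicant pools (illustrative numbers only).
rates = {
    "group_a": selection_rate(48, 80),  # 0.6
    "group_b": selection_rate(12, 40),  # 0.3
}
flagged = flags_adverse_impact(rates)
```

A flag from a screen like this is not a finding that the test is invalid; it signals that validity evidence for the employment decision needs to be available.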
A valid personnel tool is one that measures an important characteristic of the job you are interested in. Use of valid tools will, on average, enable you to make better employment-related decisions.
Both from business-efficiency and legal viewpoints, it is essential to only use tests that are valid for your intended use. In order to be certain an employment test is useful and valid, evidence must be collected relating the test to a job.
The process of establishing the job relatedness of a test is called validation.
Methods for conducting validation studies
The Uniform Guidelines discuss the following three methods of conducting validation studies. The Guidelines describe conditions under which each type of validation strategy is appropriate.
They do not express a preference for any one strategy to demonstrate the job-relatedness of a test. Criterion-related validation requires demonstration of a correlation or other statistical relationship between test performance and job performance. In other words, individuals who score high on the test tend to perform better on the job than those who score low on the test.
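The statistical relationship in criterion-related validation is often summarized as a correlation coefficient between test scores and a job performance measure. The sketch below computes a Pearson correlation in plain Python. The data are made up for illustration, and the names `pearson_r`, `test_scores`, and `job_ratings` are assumptions, not part of any assessment tool.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six employees' test scores (predictor) and
# supervisor ratings of their job performance (criterion).
test_scores = [52, 61, 70, 75, 83, 90]
job_ratings = [2.1, 2.8, 3.0, 3.6, 3.9, 4.5]

validity_coefficient = pearson_r(test_scores, job_ratings)
```

With so few cases the coefficient would be very unstable in practice; a real validation study needs a sample large enough to support the conclusion.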
If the criterion is obtained at the same time the test is given, it is called concurrent validity; if the criterion is obtained at a later time, it is called predictive validity. Content-related validation requires a demonstration that the content of the test represents important job-related behaviors. In other words, test items should be relevant to and measure directly important requirements and qualifications for the job.
Construct-related validation requires a demonstration that the test measures the construct or characteristic it claims to measure, and that this characteristic is important to successful performance on the job.
The three methods of validity (criterion-related, content, and construct) should be used to provide validation support depending on the situation.
These three general methods often overlap, and, depending on the situation, one or more may be appropriate. French offers situational examples of when each method of validity may be applied.
First, as an example of criterion-related validity, take the position of millwright. Employees' scores (predictors) on a test designed to measure mechanical skill could be correlated with their performance in servicing machines (criterion) in the mill.
If the correlation is high, it can be said that the test has a high degree of validation support, and its use as a selection tool would be appropriate. Second, the content validation method may be used when you want to determine if there is a relationship between behaviors measured by a test and behaviors involved in the job. For example, a typing test would have high validation support for a secretarial position, assuming much typing is required each day. If, however, the job required only minimal typing, then the same test would have little content validity.
Content validity does not apply to tests measuring learning ability or general problem-solving skills (French). Finally, the third method is construct validity. This method often pertains to tests that may measure abstract traits of an applicant.
For example, construct validity may be used when a bank desires to test its applicants for "numerical aptitude." To demonstrate that the test possesses construct validation support, the bank would need to show that the test measures this aptitude and that the aptitude is important to successful performance on the job. Professionally developed tests should come with reports on validity evidence, including detailed explanations of how validation studies were conducted.
If you develop your own tests or procedures, you will need to conduct your own validation studies. As the test user, you have the ultimate responsibility for making sure that validity evidence exists for the conclusions you reach using the tests. This applies to all tests and procedures you use, whether they have been bought off-the-shelf, developed externally, or developed in-house. Validity evidence is especially critical for tests that have adverse impact.
When a test has adverse impact, the Uniform Guidelines require that validity evidence for that specific employment decision be provided. The particular job for which a test is selected should be very similar to the job for which the test was originally developed. Determining the degree of similarity will require a job analysis.
Job analysis is a systematic process used to identify the tasks, duties, responsibilities and working conditions associated with a job and the knowledge, skills, abilities, and other characteristics required to perform that job.
Job analysis information may be gathered by direct observation of people currently in the job, interviews with experienced supervisors and job incumbents, questionnaires, personnel and equipment records, and work manuals.
In order to meet the requirements of the Uniform Guidelines, it is advisable that the job analysis be conducted by a qualified professional, for example, an industrial and organizational psychologist or other professional well trained in job analysis techniques.
Job analysis information is central in deciding what to test for and which tests to use.
Using validity evidence from outside studies
Conducting your own validation study is expensive, and, in many cases, you may not have enough employees in a relevant job category to make it feasible to conduct a study. Therefore, you may find it advantageous to use professionally developed assessment tools and procedures for which documentation on validity already exists.
However, care must be taken to make sure that validity evidence obtained for an "outside" test study can be suitably "transported" to your particular situation. Consider the following when using outside tests:
The validation procedures used in the studies must be consistent with accepted standards.
A job analysis should be performed to verify that your job and the original job are substantially similar in terms of ability requirements and work behavior.
Reports of test fairness from outside studies must be considered for each protected group that is part of your labor market. Where this information is not available for an otherwise qualified test, an internal study of test fairness should be conducted, if feasible.
Other situational factors that might affect the applicability of the outside test for your use must also be considered. These include the type of performance measures and standards used, the essential work activities performed, and the similarity of your target group to the reference samples.
To ensure that the outside test you purchase or obtain meets professional and legal standards, you should consult with testing professionals.
A coefficient of 0. Assessment tool manuals contain comprehensive administration guidelines. It is essential to read the manual thoroughly before conducting the assessment.
Validity
Educational assessment should always have a clear purpose.
Nothing will be gained from assessment unless the assessment has some validity for the purpose. For that reason, validity is the most important single attribute of a good test. The validity of an assessment tool is the extent to which it measures what it was designed to measure, without contamination from other characteristics.
For example, a test of reading comprehension should not require mathematical ability. There are several different types of validity. It is fairly obvious that a valid assessment should have good coverage of the criteria (concepts, skills and knowledge) relevant to the purpose of the examination. The important notion here is the purpose. The PROBE test is a form of reading running record which measures reading behaviours and includes some comprehension questions. It allows teachers to see the reading strategies that students are using, and potential problems with decoding.
There is an important relationship between reliability and validity. An assessment that has very low reliability will also have low validity; clearly a measurement with very poor accuracy or consistency is unlikely to be fit for its purpose. But, by the same token, the things required to achieve a very high degree of reliability can impact negatively on validity.
For example, consistency in assessment conditions leads to greater reliability because it reduces 'noise' (variability) in the results.
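The link between noise, reliability, and validity can be made concrete with classical test theory, which the text does not name, so treat this as an assumed model rather than the source's own method. Under that model an observed score is a true score plus random error, reliability is the share of observed variance that is true-score variance, and a test's correlation with a criterion cannot exceed the square root of the product of the two reliabilities. The variance figures below are hypothetical.

```python
from math import sqrt


def reliability(true_variance, error_variance):
    """Share of observed-score variance due to true scores (classical test theory)."""
    if true_variance <= 0 or error_variance < 0:
        raise ValueError("true variance must be positive and error variance non-negative")
    return true_variance / (true_variance + error_variance)


def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on the test-criterion correlation given both reliabilities."""
    return sqrt(test_reliability * criterion_reliability)


# Hypothetical variances: same true-score spread, different amounts of noise.
noisy_conditions = reliability(true_variance=100.0, error_variance=100.0)      # 0.5
controlled_conditions = reliability(true_variance=100.0, error_variance=25.0)  # 0.8

ceiling_noisy = max_validity(noisy_conditions)
ceiling_controlled = max_validity(controlled_conditions)
```

The sketch only captures the first half of the relationship: low reliability caps validity. It says nothing about the reverse risk noted above, that measures taken to raise reliability can narrow what is assessed and so reduce validity.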