Psychology, Assessment & Validity

Now that you have read your first chapter in Psychology, you are probably feeling a little let down. Perhaps you already took a class in Psychology and realized it doesn't come close to the cool stuff you see on television. Psychology is a science, the science of the mind, which in some ways can be an art. In the field of assessment it is particularly important to remember that psychological assessment is a science grounded in mathematics, and it is as hard a science as is available for something as complex as the human mind. Make no mistake: the hardest part of understanding the mind is that our minds are not complex enough to understand how our minds function. It is a sort of Gordian knot, but a truth nevertheless.

Here are some key points borrowed from a professor. These are truths, and you need to understand them to understand assessment today!

* Psychology is a science.

* Science does not require gadgetry, or even mathematics (assessment traditionally, but not necessarily, requires mathematics).

* The essence of science: finding the truth by discovering observable, objective evidence that either supports or refutes our preconceived notions. (But be careful not to let your preconceived notions get in the way of the truth; that is called bias.)

Special emphasis in psychology must be placed on meeting the following criteria:

 
* producing objective evidence that can be replicated (indeed, replicated with the same success as physics experiments are replicated)

* testing ideas about human behavior to find out if they are wrong

* being willing to be open-minded about claims, even those that go against common sense

* being skeptical about ideas that, even though they make sense, have not been supported by any research evidence (ghosts, god)

* not being skeptical about ideas that cannot be proven because we don't have tools sophisticated enough to measure them (ghosts, god). A paradox, isn't it?

* creating new knowledge

 

As you think about Psychology and Assessment, please also remember:

People are usually not objective. Consequently,

 a. being objective will take some discipline on your part; and

 b. it will also take some effort on your part to recognize when someone else is not being objective. (This is really important, because you may be going against the word of a well-trained professional.)

When you are looking at an assessment, it is also really important to realize that the psychometrist is only making inferences about mental states. That is, he or she cannot directly observe constructs, even though it will seem as though he or she does unless the assessment is written very carefully!

READ THAT LAST PARAGRAPH AGAIN!!!!! THE PSYCHOMETRIST IS ASSUMING THAT THE PERSON IS THINKING OR FEELING OR LEARNING BASED ON EVIDENCE THAT MAY OR MAY NOT BE CORRECT!!!! THEY ARE ONLY INFERENCES BASED ON WHAT THE TESTS POINT TO!!!!

 

One area that does not bear directly on assessment, but is important to keep in the back of your mind as you work with psychometrists, is that there are problems with using the scientific method to get answers to questions about human behavior. At one level, there are two basic problems with doing such research:

 1. The study that has been done may be unethical (look at the Milgram studies: http://www.radford.edu/~jaspelme/gradsoc/obedience/Migram_Obedience.pdf )

 2. The study you do may not answer the question.

 

At another level, there is only one problem: Is the study ethical?

 According to APA's ethical principles (which every researcher should consult before doing a study), a study is ethical if the potential benefits of the study outweigh its potential for harm. Thus, there are two ways to increase the chances that your study is ethical: increase its potential benefits, or reduce its potential for harm. (You may recognize that the same principle is in place for pharmaceuticals when you look at the side effects. Ask yourself whether the researcher is biased in deciding whether the good outweighs the bad, for instance weighing the lessening of arthritis against the possibility of getting cancer from taking the drug. Which one outweighs the other?)

If your research question is about whether something causes a certain effect, your study must have internal validity. As you'll see later, only experimental designs have internal validity. (Ask yourself how this impacts assessment…how often could assessments have internal validity?)

 

Alternatively, if your research question is about what percentage of people do some behavior, you need a study that has external validity. One key to having external validity is to have a large, random, representative sample of subjects. (Good, now we are looking at assessments… but how often do you think assessment designers can really get this necessary pool of 'victims,' a.k.a. 'guinea pigs'?)
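To make the sampling idea concrete, here is a minimal sketch of simple random sampling in Python. The population and sample sizes are invented for illustration; real test norming involves far more care (stratification by age, region, and so on).

```python
# A minimal sketch of simple random sampling (hypothetical population).
import random

random.seed(42)  # fixed seed so the example is repeatable

# Stand-in IDs for everyone in a hypothetical population of 10,000 people.
population = list(range(10_000))

# Simple random sampling: every person has an equal chance of selection,
# which is what makes the sample representative in expectation.
sample = random.sample(population, 500)

print(f"sampled {len(sample)} of {len(population)} people; first five IDs: {sample[:5]}")
```

Note that randomness only guarantees representativeness on average; a larger sample shrinks the luck-of-the-draw error.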

 

Or, if your research question involves measuring or manipulating some state of mind (hunger, stress, learning, fear, motivation, love, etc.), then you need construct validity. Achieving construct validity is not easy. (Great… measuring a state of mind… so we are going to need construct validity on a true assessment as well.)

 

Depending on the research question, you may often be interested in only one of these kinds of validity. Sometimes, you may want to have two of these kinds of validity. Rarely, however, will a study have all three types of validity.

Good, we are tied back into assessment. Which types of validity are you going to want to know about when someone is telling you about an assessment? (Just a small side note here… you will probably never ask a psychometrist about the types of validity his or her tests have. This is more to help you understand what drove the development of the test. If you have concerns and worries, talk to the individual and see if they can help alleviate them. It is actually a required part of their job to explain the results to you, because the assumption is that, given the complexity of today's assessments, you would have to be an expert in the field to fully understand the results. This is really true, too… so don't be afraid to ask; that is the person's job!)

 Remember, these are your important types of validity:

 1. Internal validity signifies the extent to which the conditions within a research design were conducive to drawing the conclusions the researcher was interested in drawing. In simplified terms, it refers to whether alternative explanations of the experimental results can be ruled out. It is the process of eliminating any other factors that could have caused the results.

 

2. External validity is the extent to which the results of a study can be applied to circumstances outside the specific setting in which the research was carried out. In other words, it addresses the question "Can this research be applied to 'the real world'?" or, more precisely, can it be generalized?

 

3. Construct validity concerns whether an instrument (assessment) measures the single unobservable social construct it purports to measure. The unobservable idea of a unidimensional, easier-to-harder dimension must be "constructed" in the words of human language and graphics. (You take a word and precisely define it in the context you have determined.) A construct is not restricted to one set of observable indicators or attributes.

There are three main types of construct validity:

* Convergent validity

* Discriminant validity

* Nomological validity

4. Face validity asks whether the test measures what it looks like it measures. (This can be misleading, because some of the best tests measure something that appears completely unrelated to what they seem to measure. For example, "True or False: I like popular science magazines" is an MMPI-2 item that, if answered in a particular way, elevates your basic scale scores on serious measures of mental disturbance.)

 

  Even though these next ideas come from the research genre, they are really very important in forming an understanding of the results of a test given to students in an assessment. They are the same questions psychometrists ask themselves as they explain the results of a student's evaluations, especially discrepancies between the achievement and IQ tests. Ask yourself these same questions when you look at testing results:

 1. Can you make any interesting predictions about the functional relationship between the two tests?

 2. See if you can postulate any moderator variables that would weaken, or maybe even reverse, the relationship between your variables.

 3. See if you can identify the mediating variables that account for the relationship (the cognitive or physiological variables that are the mechanism for the connection between your two variables).

  How does this all fit together?

Now you need to know how researchers think about putting these tests together to give you the full understanding (yes, you had to learn the terms first).

One of the hardest things in psychology is to establish construct validity. For example, how do you show that you really are measuring love? You start off by getting an operational definition: a recipe, a concrete set of steps or procedures that you will follow to get a score for each subject. Your operational definition should, at least, get you an objective measure. However, don't just assume that your measure is objective. Try to see whether observer bias is a threat to your measure. If it is, either modify your measure or try to get objectivity by making your observers "blind."

Once you establish that your measure is not vulnerable to observer bias, you still can't say that your measure is perfect. One common first step is to see if your measure is reliable. That is, does it produce consistent scores? If you are measuring a construct that is stable over time (IQ, for example), then subjects who take your test today and six months from now should get basically the same score each time. Test-retest reliability tells us to what extent subjects get the same score each time. Typically, you expect a high test-retest reliability coefficient (between .90 and 1.0).
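As a concrete illustration, here is a minimal sketch of how a test-retest reliability coefficient is computed: it is just the Pearson correlation between the two administrations. The scores below are invented for the example.

```python
# A minimal sketch of test-retest reliability (hypothetical scores).
import numpy as np

time1 = np.array([98, 105, 110, 121, 93, 130, 102, 115])  # scores today
time2 = np.array([101, 103, 112, 118, 95, 128, 99, 117])  # scores six months later

# Test-retest reliability is commonly reported as the Pearson correlation
# between the two sets of scores for the same subjects.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability: r = {r:.2f}")  # expect .90+ for a stable construct
```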

 

But what if you got a low test-retest reliability coefficient? For example, what if it was below .60? Then, your measure is being affected by something other than the stable construct.

What is this something else? Inconsistent, erratic, random error!

What do you do about this problem of inconsistent, unstable random error?

 1. Ditch the measure.

 2. See if the random error is due to inconsistencies on the part of the observer. Calculating inter-observer reliability will tell you if this is a problem (see the sketch after this list).

 3. See if you can do anything to reduce any inconsistencies in how the measure is administered. In technical terminology, try to standardize the administration of your measure.
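Here is a minimal sketch of one common inter-observer reliability statistic, Cohen's kappa, which corrects raw agreement for the agreement two observers would reach by chance. The behavior codes are invented for the example.

```python
# A minimal sketch of inter-observer reliability via Cohen's kappa
# (hypothetical codes from two observers watching the same behaviors).
from collections import Counter

observer_a = ["on-task", "off-task", "on-task", "on-task", "off-task", "on-task"]
observer_b = ["on-task", "off-task", "on-task", "off-task", "off-task", "on-task"]

n = len(observer_a)
# Observed agreement: proportion of trials where the two observers match.
p_observed = sum(a == b for a, b in zip(observer_a, observer_b)) / n

# Expected chance agreement, from each observer's marginal category rates.
count_a, count_b = Counter(observer_a), Counter(observer_b)
categories = set(observer_a) | set(observer_b)
p_expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

# Kappa: how far observed agreement exceeds chance, scaled to a 0-1 range.
kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```

A kappa near 1 means the observers agree far beyond chance; a kappa near 0 means their agreement is no better than guessing, which points to random error coming from the observers.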

 

Reminders:

1. Random error and bias are both errors, but bias is a much more serious threat to validity than random error (the sketch after these reminders shows why).

 2. Reliability and validity are two different concepts. A valid measure will be reliable, but a reliable measure is not necessarily valid.
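To see why bias is the more serious threat, here is a minimal simulation sketch: random error washes out when you average many measurements, but bias does not. The true score and error sizes are invented for the example.

```python
# A minimal sketch contrasting random error with bias (simulated scores).
import random

random.seed(1)
true_score = 100
n = 10_000

# Random error: each measurement is off by an unpredictable amount that
# averages out over many measurements.
noisy = [true_score + random.gauss(0, 5) for _ in range(n)]

# Bias: every measurement is pushed in the same direction, so no amount
# of averaging removes it.
biased = [true_score + 5 + random.gauss(0, 5) for _ in range(n)]

print(f"mean with random error only: {sum(noisy) / n:.1f}")   # close to 100
print(f"mean with bias:              {sum(biased) / n:.1f}")  # close to 105
```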

Next, you get to worry about subject biases. Instead of giving you their true feelings, your participants may try to make themselves look good (social desirability) or make you look good (by following demand characteristics). Note that unobtrusive measures are one particularly clever way of reducing subject biases.

 

Now, you can almost start to make a case that your measure has construct validity. However, to do so, you may be called upon to show that:

 Your measure has content validity: it has items that measure all the relevant dimensions of your construct and there are enough items for each dimension.

 Your measure has internal consistency: all the items seem to be measuring the same thing. The evidence for this is that participants respond to all the items in a similar way. More specifically, participants who strongly agree with item 1 should also strongly agree with item 2, item 3, etc.

Your measure has convergent validity: the measure correlates with other indicators of the construct. (If it walks like a duck and looks like a duck, it may be a duck.) For example, people who score high on your measure should also score high on other measures of the construct; high scorers should do more of the behaviors associated with your construct than low scorers; and people who are known to be high on your construct should score higher on your measure than people known to be low.

Your measure has discriminant validity: people who score high on your measure should not score high on measures of other constructs. (If it's supposed to be a duck, it shouldn't have ears like a rabbit or run on batteries like a machine.) Thus, people who score high on your IQ measure should not also score high on outgoingness, modesty, social desirability, moodiness, etc. If you can show that you're not measuring the wrong thing, it helps build the case that you may be measuring the right thing. (A short numerical sketch of internal consistency, convergent validity, and discriminant validity follows below.)
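Here is a minimal sketch that puts numbers on the last three criteria, assuming a tiny hypothetical three-item measure: Cronbach's alpha for internal consistency, plus correlations with a same-construct measure (convergent) and an unrelated measure (discriminant). All data are invented.

```python
# A minimal sketch of internal consistency, convergent validity, and
# discriminant validity (hypothetical data).
import numpy as np

# Rows = participants, columns = items on our hypothetical measure.
items = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
])

# Internal consistency: Cronbach's alpha from item and total-score variances.
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)

total = items.sum(axis=1)
other_measure = np.array([12, 8, 15, 9, 5])  # another indicator of the same construct
unrelated = np.array([9, 8, 10, 11, 9])      # a measure of a different construct

convergent = np.corrcoef(total, other_measure)[0, 1]  # want this high
discriminant = np.corrcoef(total, unrelated)[0, 1]    # want this near zero

print(f"alpha = {alpha:.2f}, convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

A high alpha says the items hang together; a high convergent r and a low discriminant r together build the case that the measure taps the construct it claims to.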

Now you know the ingredients for creating a great measure… or assessment. You are really getting smart, and we will be leaving the field of Psychology for now and directing our focus toward the field of Children with Exceptionalities!

 

E-mail J'Anne Ellsworth at Janne.Ellsworth@nau.edu

 


Course developed by Martha Affeld & J'Anne Affeld



Copyright © 2006 Martha Affeld
ALL RIGHTS RESERVED