
Electronic Textbook: Survey Research - Part II

"Ask and Ye Shall Receive:" Survey Research: Part Two: Selected Response and Response Choices

Related resource: "What is a Survey?" (American Statistical Association)

We're in the middle of our 'survey trilogy,' research cyberfans! This time out, we'll take a look at some alternative types of SELECTED RESPONSE or FORCED-CHOICE questions. (For a more general discussion of how these differ from the other, open-ended type, please see the previous chapter, Survey Research, Part One. Also, if any of you would like copies of the related interviewing material from our qualitative modem course, and you don't already have it ... well, our current title says it all! "Ask and ye shall receive!")

These examples are from a unique book entitled Scale Development: Theory and Applications by Robert F. DeVellis. It was published in 1991 by Sage Publications, Inc., of Thousand Oaks, CA and is #26 in Sage's excellent Applied Social Research Methods Series. The reason I refer to this book as "unique" is that it combines discussion of two related points that are seldom presented jointly. The book starts with a rather rigorous, comprehensive discussion of issues of validity and reliability of measurement procedures. Then the second half is taken up with survey development, design and refinement. I think it is commendable for DeVellis to "so suitably set the stage," couching survey item construction in the larger, more important, holistic issue of "what are we measuring? how dependably? how credibly?" It is tempting to 'tinker in isolation' with survey items and sometimes lose track of the 'bigger picture' of validity and reliability. (And by the way, we'll talk about pilot testing, cover letters, etc., in our next topic.)

Ready? Let's take the scenic route...!!!

I. Likert Scales

Let's start with this one, especially since it's one of the most popular forced-choice item formats! It's widely used in educational survey research, particularly in measuring attitudes, beliefs and opinions.

The basic idea here is to:

  1. write the item as a declarative sentence; and

  2. then provide a number of response options, or choices, that would indicate varying degrees of agreement with, or endorsement of, that sentence.

    Let's take a look now at a couple of examples:

Figure 1.

Examples of Likert-Scaled Survey Items

Please note, in the above two examples, that the "item" to be evaluated consists of a declarative sentence. Thus, it already states a 'position' and 'direction' of attitude. The respondent is then asked to circle the direction and extent (intensity) of his/her agreement (or disagreement) with that "position" sentence.

Occasionally, I'll see students construct the Likert scale items as questions: e.g., "Should combating drug abuse become a top national priority?" and then list the same choices. But you can't really "agree" or "disagree" with a question, since it is in open-ended form! In order to gauge direction and intensity of agreement/disagreement, the subjects have to have 'something definite to react to;' i.e., a declarative sentence such as the two examples, above.

The next question that invariably arises is: how many response choices should be included? Related to this point is the sub-question: should there be a middle, neutral, or 'balancing' point?

  1. Five or six choices are the most common!
  2. There are, unfortunately, no hard-and-fast rules about whether there "should" be an "odd" vs. "even" number of choices. The decision depends largely on what is being investigated, as well as the extent of prior work in this area (what have other similar surveys measuring this phenomenon looked like? if there were Likert scaled items, how many choices were there?). This is but one of many issues that can be pretested and shared with a sample of pilot judges, as we'll see next time around.
  3. About the only "catch-all" rule of thumb is that, despite the ordinal nature of the response data (a blast from the past, Intro to Statistics fans! Remember scales of measure in our Module #1?!), you should try to construct the response choices so as to have "roughly equal intervals with respect to agreement," according to DeVellis. We can see this in the 6 response choices for our 2 Likert examples above: the choices look about evenly spaced, strongly-moderately-mildly on both the agree and disagree sides (see the sketch following this list).
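
To make this format concrete, here's a minimal sketch in Python (my own illustration - the labels, numeric codes, and sample item are assumptions, not DeVellis's) of a 6-point scale whose choices are roughly evenly spaced on both the agree and disagree sides:

    # A minimal sketch (illustrative; labels and coding are assumptions)
    # of a 6-point Likert response scale with roughly equal spacing:
    # strongly-moderately-mildly on both the agree and disagree sides.

    LIKERT_6 = {
        1: "Strongly disagree",
        2: "Moderately disagree",
        3: "Mildly disagree",
        4: "Mildly agree",
        5: "Moderately agree",
        6: "Strongly agree",
    }

    def present_item(statement: str) -> None:
        """Show a declarative statement followed by its response choices."""
        print(statement)
        for code, label in LIKERT_6.items():
            print(f"  {code} = {label}")

    # Note the declarative form: it 'takes a stand' that the respondent
    # can agree or disagree with (contrast the question form criticized below).
    present_item("Combating drug abuse should become a top national priority.")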

Now -- back to the "odd" vs. "even" issue for a moment, as per #2 above. Having an "ODD number" of choices invariably leads us into the thorny issue of "should there be a middle point? and if so, what should be its exact meaning and nature?"

Let's tackle those in reverse order, and start with some "popular," plausible choices for how to label that middle point (response choice on our Likert scale), if we do choose to have one!

Such mid-point labels might include:

  a. Neutral;
  b. Neither agree nor disagree;
  c. Agree and disagree equally.

Question of semantics! Some would argue that the above labels are equivalent! However, DeVellis and others make a case that there could be a subtle but definite distinction in some readers' minds, particularly between b and c. According to DeVellis, "The first (b) implies apathetic disinterest, while the latter suggests strong but equal attraction to both agreement and disagreement." (pg. 69) He does have a point! I would put it this way: c (equally) appears to suggest 'active brainpower and thinking about it on the part of the subject,' while b (neither) might be chosen 'on a whim' by a subject who does not want to think deeply about the issue(s) at hand!

Now -- back to the 'stickier wicket' of whether there ought to even be a 'middle point' to begin with! As you've probably guessed, perhaps the main issue here is: "Is providing that middle point allowing subjects to 'cop out' when in fact we as researchers really need to know the direction and intensity of their attitudes?!"

This issue has troubled researchers, so much so that a large number of split-ballot experiments have been done: the same question is asked with and without a middle point, and the resulting response distributions are compared to gauge its impact.

In general, two conclusions can be reached:

  1. As many as 20% of subjects will tend to choose the middle category when it is offered - although it has been found that these subjects would not volunteer it if it weren't mentioned; however,
  2. At the same time, this has at most a limited effect on the distribution of responses in the other categories (e.g., the ratios of those who would choose, say, "mildly" to "moderately" agree or disagree).
Let's look at the results of one such experiment done by Schuman and Presser in 1981. They wrote the same question both ways, with and without the middle point. The relative response distributions (percentages) were as follows:

Question wording (middle alternative omitted):
"Should divorce in this country be easier or more difficult to obtain than it is now?"

    Easier                      28.9
    More difficult              44.5
    Stay as is (volunteered)    21.7
    Don't know                   4.9

Question wording (middle alternative offered):
"Should divorce in this country be easier to obtain, more difficult to obtain, or stay as it is now?"

    Easier                      22.7
    More difficult              32.7
    Stay as is                  40.2
    Don't know                   4.3

From Converse and Presser (1986)

While the size of the middle category differs in the two alternative formats, Converse and Presser point out that the ratio of "easier" to "more difficult" is about the same. That is: not counting the middle and don't know responses, this ratio ("easier" to "more difficult") is about 40:60.
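
A quick arithmetic check in Python (a sketch using the percentages from the table above) confirms that ratio:

    # Verify the Converse & Presser observation: excluding the middle and
    # "don't know" responses, the ratio of "easier" to "more difficult"
    # is roughly 40:60 under both question formats.

    middle_omitted = {"easier": 28.9, "more difficult": 44.5}
    middle_offered = {"easier": 22.7, "more difficult": 32.7}

    for label, dist in [("middle omitted", middle_omitted),
                        ("middle offered", middle_offered)]:
        total = dist["easier"] + dist["more difficult"]
        pct_easier = 100 * dist["easier"] / total
        print(f"{label}: easier {pct_easier:.1f}% vs. "
              f"more difficult {100 - pct_easier:.1f}%")

    # Output:
    # middle omitted: easier 39.4% vs. more difficult 60.6%
    # middle offered: easier 41.0% vs. more difficult 59.0%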

As Converse and Presser have concluded, "Offering a middle position makes less difference to individuals who feel strongly about an issue than it does to those who do not feel strongly." (pg. 37)

In our earlier discussion, we learned that Likert scaled items should consist of declarative sentences - ones that "take a stand," thereby giving the subjects "something to react to," i.e., agree or disagree with. A related question then arises: how "strong" should that statement be?

One might, for instance, classify the following Likert scale items as follows:

    Statement (wording)                                           "Strength"
    ------------------------------------------------------------  ----------
    "Physicians generally ignore what patients say."               Strong
    "Sometimes, physicians do not pay as much attention as
      they should to patients' comments."                          Moderate
    "Once in a while, physicians might forget or miss
      something that a patient has told them."                     Weak

Here is the tradeoff, as DeVellis and other survey experts see it:

  1. With "too mild" a statement, there might be too much "yea-saying," surface agreement with that statement. Then, about the only thing you get is "bandwagoning" - but not a true gauge or differentiation of subjects' attitudes. Also, if such statements are "too mild," they run the risk of essentially washing out any attitude or opinion - being totally neutral in tone.
  2. On the other hand, too strong a statement runs the risk of offending, shocking or otherwise upsetting the subjects.

The experts advise that it's better to err in the direction of "too STRONG" than "too MILD." The best solution would be to help ensure - again, perhaps with explicit input from a pilot test panel - that your statements state directions and intensity of attitudes in clear, unambiguous terms.

Please click below for more information on Likert Scaling.

Likert Scales

II. Semantic Differential Scales

These center around a stimulus. This could be a person, thing, event, position, or whatever focused variable you choose.

For the semantic differential you would:

  1. Identify the stimulus;
  2. Under it, list several pairs of adjectives. These would be written at opposite ends of a horizontal line - thereby representing the polar ends of a continuum.
  3. Subjects would then mark the line on each continuum that best depicts their opinion regarding this stimulus.

 

Let's now look at a specific example:

 

Figure 2.

Example of a Semantic Differential Scale

In the above example, automobile salespersons would be the target stimulus - the "thing" being evaluated according to the pairs of adjectives representing continua of opinions. In reality, there would be more than two such pairs of adjectives presented with the stimulus; the above is but an abbreviated example.

With regard to the question of how many response choices, or dashed lines on the continuum between the two adjectives of each pair: again, there are no hard and fast rules. Most often, in actual surveys one sees between 7 and 9 choices.

There are two ways to develop the adjective pairs:

  a. bipolar adjectives: these express the presence of opposite attributes; e.g., "friendly/hostile."
  b. unipolar adjectives: these would indicate the presence or absence of a single attribute; e.g., "friendly/not friendly." (See the sketch following this list.)
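
Here's a minimal sketch in Python (my own illustration - the 7-point width and the "honest/dishonest" pair are assumptions) of how such an item might be laid out for the automobile-salesperson stimulus of Figure 2:

    # A minimal sketch (illustrative) of a semantic differential item:
    # bipolar adjective pairs anchor the ends of a 7-point continuum,
    # and the respondent marks one position per line.

    ADJECTIVE_PAIRS = [              # assumed example pairs for this stimulus
        ("friendly", "hostile"),     # bipolar: opposite attributes
        ("honest",   "dishonest"),   # bipolar (assumed)
    ]

    def render_item(stimulus: str, points: int = 7) -> None:
        print(stimulus)
        for left, right in ADJECTIVE_PAIRS:
            blanks = "  ".join("___" for _ in range(points))
            print(f"  {left:>10}  {blanks}  {right}")

    render_item("Automobile salespersons")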

As an endnote comment, you will sometimes see these referred to as "Osgood's Semantic Differential," since they originated with the seminal work of Osgood, Suci, Tannenbaum, and other researchers in the 1950s.

III. Thurstone Scaling

For this type of survey item, your goal is to formulate a series of sequential statements about some target attribute. You would attempt to 'space' or scale those statements such that they represent equal intervals of increasing or decreasing intensity of the attribute.

Subjects would only be given TWO CHOICES for each statement: "agree" and "disagree." The researcher then looks at which items 'triggered' agreement. Due to the desired equal-interval phrasing of the original items, it is then hoped that the agreements reveal "how much of" that attribute the respondent has, or agrees with. This, then, would make the scores "behave" like interval data.

Again, an actual example might help clarify the above ideas! It's contained in Figure 3, below:

 

 

Figure 3.
An Example of a Thurstone Scale
(target attribute:
"measuring parents' aspirations for their
children's educational & career attainments")
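
To see how responses to such a scale might be turned into a score, here's a hedged sketch in Python. The statements and scale values are invented for illustration; a respondent's score is commonly taken as the mean (or median) of the scale values of the endorsed statements:

    # A sketch of Thurstone-style scoring. Each statement carries a scale
    # value marking its position on the (intended) equal-interval continuum;
    # the respondent's score is the mean of the values of endorsed items.
    # Statements and scale values below are invented for illustration only.

    from statistics import mean

    SCALE_VALUES = {
        "My child should finish high school.":        1.0,
        "My child should attend some college.":       3.0,
        "My child should earn a bachelor's degree.":  5.0,
        "My child should earn an advanced degree.":   7.0,
    }

    def thurstone_score(agreements):
        """agreements: dict mapping statement -> True (agree) / False (disagree)."""
        endorsed = [SCALE_VALUES[s] for s, agreed in agreements.items() if agreed]
        return mean(endorsed)

    # A respondent who agrees with the first three statements only:
    responses = {s: v <= 5.0 for s, v in SCALE_VALUES.items()}
    print(thurstone_score(responses))  # 3.0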

As you can imagine, the process of writing 'successively equally spaced interval gradations-of-the-attitude' items can be subjective and tricky! For these reasons of reliability and validity, as well as other problems arising from their use and application, Thurstone items are in most cases not the method of choice.

Please look at the link below for more information on Thurstone Scaling.

Thurstone Scaling

IV. Guttman Scales

As you'll see momentarily, these are highly similar to Thurstone scales. The Guttman scales, however, are designed to follow an even more closely ordered progression of successively higher levels of an attribute.

Please look very carefully at the difference in Items #3 and #4 for the Guttman scale (Figure 4, below), as compared with Items #3 and #4 for the Thurstone scale (Figure 3, above).

Figure 4.
An Example of a Guttman Scale
(target attribute:
"measuring parents' aspirations for
their children's educational & career attainments")

With the Thurstone scale, there was a 'switch in directionality,' while with the Guttman there is a progression in the same direction. Thus, for the Thurstone, we look for the occurrence of a single affirmative response, while in the Guttman we look for the point of transition from affirmative to negative responses.

One problem with the Guttman, in addition to the subjectivity issue raised with the Thurstone, is that it implies that if you respond positively to a given item, you would also be assumed to respond positively to those items "below it in the hierarchy." Whether or not this 'order effect' will always be true depends on the phenomenon or attribute in question. For instance, in Figure 4 (Guttman example), a subject could conceivably agree with Item #3 but disagree with Item #4. This could be the case if, for example, the respondent perceived "success" as a rather complex & multi-faceted variable, one that could possibly both help and hinder happiness in various ways.
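
Here's a minimal sketch in Python (my own illustration) of how that transition point is read off a response pattern - and how a 'reversal' like the one just described would show up as a violation of the assumed ordering:

    # A sketch of reading a Guttman response pattern. Items are ordered from
    # weakest to strongest; a perfectly 'scalable' respondent agrees up to
    # some transition point and disagrees thereafter. Any agreement that
    # appears AFTER the first disagreement violates the assumed cumulative
    # ordering.

    def guttman_transition(responses):
        """responses: list of bools (agree=True), items ordered weakest -> strongest."""
        score = 0                        # consecutive agreements from the start
        while score < len(responses) and responses[score]:
            score += 1
        errors = sum(responses[score:])  # agreements after the first disagreement
        return score, errors

    print(guttman_transition([True, True, True, False]))  # (3, 0): clean pattern
    print(guttman_transition([True, True, False, True]))  # (2, 1): agrees with a
    # stronger item after rejecting a weaker one -- the 'order effect' breaks down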

- - -

Please look at the link below for more information on Guttman Scaling.

Guttman Scaling

Now that we've taken a look at the various item formats for such forced-choice, closed-ended questions, let's end this lesson with a brief discussion that pertains to most of these. "How many response choices should I have per question?"

Issues Regarding "Optimal" Number of Response Choices

As indicated earlier, there is simply no 'single magic number' that represents the best overall total number of response choices. Instead, I would like to propose some issues or criteria that DeVellis and other survey researchers suggest be taken into consideration.

I. Variability/Covariability

This may be an important property to "pick up" when you are measuring, say, an attitude! Without 'enough' choices as to levels or intensity of, for example, 'agreement' or 'disagreement,' you may only be picking up a 'rough cut' of a person's true level of attitude. The extreme would be simply to offer two choices: "agree" or "disagree." This is a start but does not tell us "how much/to what extent" the subject agrees or disagrees. With a greater number of choices -- e.g., a 5-point Likert scale arrayed from "strongly disagree" to "strongly agree" -- there would be a 'finer partition' of such measurement. This, in turn, would give the survey researcher more specifically pinpointed information about the level of the person's attitude.

Even though the above criterion, variability, would imply that "more choices are preferred to fewer," this still depends on some other factors. For one thing: will only a single survey question be asked that deals with this topic? Or will there be a 'set' or 'range' of different survey items that try to 'tap' various aspects of the question?

With more discrete items, you are making 'multiple measurements' and therefore, you can 'get away with' having fewer response choices per item. On the other hand, with fewer (or, in the extreme, only a single) item per topic, you may need to offer more response choices to 'get at' the subject's level of attitude, opinion, etc.

An example might help to clarify this idea. DeVellis offers the following hypothetical survey scenario. Suppose that the researcher is studying the enforcement of non-smoking policies in a work setting. Suppose further that he/she wishes to determine the relationship between such policies and anger of the employees toward these policies.

Suppose too that the survey is limited to just the following two questions:

  1. "How much anger do you feel when you are restricted from smoking?" for smokers; and
  2. "How much anger do you feel when you are exposed to others smoking in the work place?" for non-smokers.

Given the limited number of questions in the first place, the researcher might obtain more useful and specific information regarding the subjects' attitudes by allowing for a greater "gradation" or variety of response levels - e.g., asking them to specify a value on a scale of 0 (not at all angry) to 100 (very angry).

On the other hand, suppose that the researcher had instead generated and pilot-tested a total of, say, fifty different Likert-scale items designed to tap different aspects of both the policies and the extent of the workers' emotions. These could run the gamut from types of policies; enforceability; sanctions; awareness of workers; extent of workers' input into formulating the eventual policies; etc. Given the larger number and range of items, tapping different facets of the underlying key variables, perhaps in this case even a simple "agree/disagree" would suffice. It would still be possible to add up the scores for the fifty items (say, scaling "disagree" as a 1 and "agree" as a 2) and create a "summated scale score": the total value for attitude across the 50 different 'facets' or items regarding attitudes/anger towards the smoking policies. This would "push" the score in the direction of "continuous" (many possible values) data - i.e., interval scaled.
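
Here's a minimal sketch in Python of that summated-scale idea (the responses are simulated; the 1 = disagree / 2 = agree coding follows the passage above):

    # A sketch of DeVellis's summated-scale idea: fifty agree/disagree items,
    # each coded disagree = 1 / agree = 2, summed into a single total score.
    # With many items, the total takes on many possible values and behaves
    # more like continuous (interval-scaled) data. Responses are simulated.

    import random

    random.seed(42)                          # reproducible illustration
    N_ITEMS = 50
    responses = [random.choice([1, 2]) for _ in range(N_ITEMS)]

    summated_score = sum(responses)          # possible range: 50 .. 100
    print(summated_score)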

Thus, the general "rule," for this criterion, at least, might be:

  1. Fewer items -> more response choices per item;
  2. More items -> fewer response choices per item.

II. Subjects' Ability to Discriminate Meaningfully

This one is more of an 'internal, perceptual' issue. How fine a distinction can subjects make IN THEIR OWN MINDS regarding the level of their attitude or opinion?

For example, a 50-level scale designed to differentiate and finely partition extent of agreement or disagreement may be too much for subjects to handle. But let's 'get real:' how many respondents can meaningfully distinguish their own levels of an attitude on this many scale points?! This may be a case of cognitive overload - and not very representative of how people actually think! Subjects may actually respond either by guessing (and more on that problem below) or by mentally rescaling the 50 choices to multiples of 5 or 10 only - in order to make the task more manageable for themselves.

Mathematically, while the variance/covariability may indeed INCREASE with this many choices (as per the first criterion, above), this may all be 'random noise' or 'error variance' if subjects end up 'just guessing' at their true level of response with too many choices!

III. Specific Wording and/or Placement of Items

These issues of survey construction may sometimes "spill over" and directly affect the subjects' ability to discriminate meaningfully between response choices. Thus, this factor can have an unintended "interaction effect" with our 2nd, preceding "discriminatory ability" factor (II, above).

  1. Terms like "several," "few," and "many" can cause reliability problems - they are somewhat vague, and your definition of "how many is 'several'" may differ from mine!
  2. Placement on the page may also subtly influence interpretation.

    Consider the array from left to right of the following choices:

    Many       Some       Few       Very Few       None

    This would 'spatially' lead the subjects to implicitly assume that "some" is more than "few," simply due to the order effect (e.g., the way the eye naturally moves across the items to process them). Again, if this is your intent, this would be fine! This is just intended as a cautionary note as to how physical placement can affect cognitive interpretation!

  3. Worst-case scenario: combining AMBIGUOUS TERMS with AMBIGUOUS PAGE PLACEMENT!

Consider the following:

Very helpful          Not very helpful
Somewhat helpful      Not at all helpful

First of all, "somewhat" and "not very" are vague and rather confounded to begin with. But think about the potential for additional interpretive error introduced by the placement!

  a) If a subject reads down the first column and then down the 2nd, "somewhat" seems like it's a 'higher' value than "not very."
  b) BUT - if a subject instead reads across the first row and then the 2nd, the exact opposite is implied!

Clearly, lots of room for reliability and validity problems with the responses!

IV. Researcher's Ability and/or Willingness to Record a "Large Number" of Values for Each Item

DeVellis uses a 'thermometer' analogy to get us thinking about this key issue: Exactly how much precision is desirable? How much do we need? How much can we meaningfully process (statistically and otherwise)? This issue must be thought through in reference to the specific variables you are trying to measure via the survey questions and related responses.

V. Odd vs. Even Number of Responses/Middle Point

We have already discussed this one under Likert scales, above, along with the concomitant decision of whether to include a 'middle' or 'balancing point,' as well as 'what its nature should be.'

- - -

I hope that this module has given you an overview of the 'finer points' of writing closed-ended survey items: the different types, as well as how many response choices should be provided!

There are a number of outstanding survey construction books on the market. I'd like to recommend a few of these to you in case you'd like to pursue additional reading in this important measurement area!

Several summers ago, when I was asked to teach the Survey Design course, I chose the following two books as our "readers" for the course. Let me share with you what they are, as well as what I particularly like about them:

  1. How to Conduct Surveys: A Step by Step Guide, (http://www.businesssavvy.com/101259n.html) by Arlene Fink and Jacqueline Kosecoff. 1985: Sage Publications, Inc. This one continues to win rave reviews from our doctoral colleagues! For one thing, it is chock-full of practical examples of how to take so-so-written survey items of all sorts and rewrite them to make them better! In fact, I also 'pulled a fast one' and used this book in my 'live' EDR 610, Intro to Research, classes. The reason is that it nicely treats survey construction in the overall perspective of the entire research process. Thus, you also get good, solid, clearly written discussion on population and sampling - which makes sense in terms of WHO will be filling out the surveys! - as well as other aspects of the research process. Thus, I (and our doctoral colleagues, according to favorable 'word of mouth!') consider this to be a classic overall research primer, as well as a survey construction and refinement primer! It is on reserve in Cline Library under: EDR 798, Dissertation Seminar, Packard - and "much" checked out!

  2. Survey Research Methods, 2nd ed. by Floyd J. Fowler, Jr. Volume 1: Applied Social Research Methods Series, 1993, Sage Publications, Inc. Although a bit 'more advanced' in tone than the Fink & Kosecoff book, this is also an incredibly readable, clear book on any/all aspects of survey construction. It contains a good chart on the 'relative tradeoffs' of telephone vs. mail vs. personal surveys (hint, hint, dissertation fans: grist for the mill for your Chapter 1 Limitations!). We have the 1st edition on reserve in Cline Library as well.

  3. Survey Questions: Handcrafting the Standardized Questionnaire, by Jean M. Converse and Stanley Presser. No. 63 in the Quantitative Applications in the Social Sciences series: A Sage University Paper, 1986. A 'condensed, handbook-type' treatment (an 80-page small green paperback) of the "how to tinker with survey items and rewrite them" material that is also contained in Fink and Kosecoff. It doesn't have the "holistic, surrounding discussion of other parts of the research process," but for what it does, it does magnificently! That is: zeroing in on 'tinkering with' survey items and improving them. I can't recall offhand if this one is on reserve in Cline or not. I believe the handbooks in this series cost something like $7.50-$8.00 apiece (plus shipping) if bought directly from Sage. Well worth having as a handy reference if you're planning to go the closed-ended survey question route!

- - -

Questionnaire Survey Bibliography

Next time, we'll close out our "survey trilogy" by looking at issues of pilot testing, validity and reliability. For those contemplating the dissemination of mailed surveys (perhaps the most popular channel of distribution), we'll also talk about the elements of a 'good' cover letter to accompany your mailed survey.

Continued wishes for a hefty return rate to all of our
budding survey researchers ... !!!


