Bob's roommate, Greg, is a complete slob. The condition of his own room is Greg's business, but the bathroom they share is another matter. Almost every week they have major arguments about cleaning the bathroom and Bob ends up doing the cleaning. Finally the unfairness of the situation is more than Bob can stand, so he tries a new plan. Whenever Greg does anything to help clean the apartment, Bob praises him: "Good job, Greg; the apartment really looks great." Gradually Greg begins helping on a more regular basis, and one week he even offers to clean the bathroom.
What technique did Bob use to get Greg to help clean the apartment?
We will now discuss the second basic type of learning, operant conditioning. In operant conditioning, also known as instrumental conditioning, an organism operates on its environment to produce a change. In other words, the organism's behavior is instrumental in producing a change in the environment. As you will see, the type of change that is produced is a critical element of this type of learning.
To return to our question, what technique did Bob use to get Greg to help clean the apartment? The answer is that he reinforced this behavior by praising it.
In some cases, operant behavior may result in the delivery of a stimulus or an event. When you put coins into a soda machine, for example, you receive a cold drink. When Greg cleaned the apartment, he received praise. In other instances, the operant behavior may result in the elimination of a stimulus or an event. For example, you have probably learned the quickest way to eliminate the sound your alarm clock makes early in the morning. These events are known as reinforcers, and they are at the heart of operant conditioning. We can define a reinforcer as an event or stimulus that makes the behavior it follows more likely to occur again (Skinner, 1938). For example, obtaining a cold drink from the soda machine and eliminating the annoying sound of your alarm clock are reinforcers. The behavior that a reinforcer follows can be thought of as the target response: it is the behavior that we want to strengthen or increase.
Primary and Secondary Reinforcers.
A primary reinforcer is a stimulus or an event that has innate reinforcing properties; you do not have to learn that such stimuli are reinforcers. For a hungry person, food is a primary reinforcer. A secondary reinforcer is a stimulus that acquires reinforcing properties by being associated with a primary reinforcer; you must learn that such stimuli are reinforcers. Money is a good example of a secondary reinforcer. By itself, money has no intrinsic value; children must learn that money can be exchanged for primary reinforcers such as ice cream and toys.
Positive and Negative Reinforcers.
Primary and secondary reinforcers may be either positive or negative (Skinner, 1938). As we discuss positive and negative reinforcers, you should remember that regardless of whether a reinforcer is positive or negative, it always makes the target response more likely to occur again.
Positive reinforcers are events or stimuli such as food, water, money, and praise that are presented after the target response occurs. For example, a real estate agent earns a commission for each house she sells; the commissions reinforce her efforts to sell as many houses as possible. Your little brother is allowed to watch cartoons on Saturday mornings after he has cleaned his room; as a result, he cleans his room every Saturday. You have been praised for receiving good grades on psychology tests; the praise should encourage you to study even harder.
Negative reinforcers are events or stimuli that are removed because a response has occurred. Examples of negative reinforcement include taking an aspirin to eliminate a headache, playing music to reduce boredom, and cleaning your room so that your roommate will stop telling you that you're a slob. In these situations, something stopped or was removed (headache, boredom, criticism) because you performed a target response. What response will occur the next time these unpleasant situations arise? If the negative reinforcer has been effective, the target response that terminated it is likely to occur again.
Probably no one has been more closely associated with operant conditioning than the late Harvard psychologist B. F. Skinner (1904-1990). Skinner was strongly influenced by John B. Watson's behavioral view of psychology (see Chapter 1). As we have seen, Watson believed that if we could understand how to predict and control behavior, we would know all there was to know about psychology. Skinner therefore began to look for the stimuli that control behavior. The effects of reinforcement quickly impressed him as very important, and he devoted himself to studying reinforcement and its influence on behavior. To isolate those effects, he developed a special testing environment; he called it an operant conditioning chamber, but it is usually referred to as a Skinner box.
In an operant conditioning chamber, the experimenter can present reinforcers, such as a piece of food for a hungry rat or pigeon, according to a preset pattern. The preset pattern or plan for delivering reinforcement is known as a schedule of reinforcement (Skinner, 1938). Schedules of reinforcement are important determinants of behavior, and we will say more about them shortly. But before we can impose a schedule of reinforcement, our participant or animal must be able to perform the target response, which is achieved through the process known as shaping.
When you start training a rat in a Skinner box, you should not expect too much. The rat will not begin pressing the lever or bar as soon as it enters this new environment. You may have to help it learn to press the lever or bar to receive food. The technique you will use is called shaping. Shaping involves reinforcing successive responses that more closely resemble the desired target response; in other words, you are using the method of successive approximations.
For a rat learning to press a lever for food, the sequence of events might go as follows: When the rat goes near the food dish, you drop a piece of food into it. Eating the food reinforces the behavior of approaching the dish. Once the rat has learned where the food is, you begin offering reinforcers only when the rat goes near the response lever. Gradually you make your response requirements more demanding until the rat must actually touch the lever to receive the reinforcement. Once the rat has started touching the lever, you can require that the lever be pressed before the reinforcement is given. In this way you have gradually made the response that produces reinforcement more closely resemble the target response of pressing the lever. In short, you have shaped the rat's response.
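The shaping sequence just described can be compressed into a short sketch. The function and stage names below are illustrative inventions, not part of the text, and each approximation is reinforced only once before the criterion tightens (a real shaping session would reinforce each stage many times):

```python
# A compressed sketch of shaping by successive approximations.
# Stage names are hypothetical labels for the rat's behavior.
STAGES = ["approach dish", "approach lever", "touch lever", "press lever"]

def shape(observed_behaviors):
    """Reinforce only the behavior that meets the current criterion,
    then demand a closer approximation to the target response."""
    stage = 0
    reinforced = []
    for behavior in observed_behaviors:
        if stage < len(STAGES) and behavior == STAGES[stage]:
            reinforced.append(behavior)  # drop a piece of food
            stage += 1                   # tighten the requirement
    return reinforced

session = ["sniff corner", "approach dish", "approach dish",
           "approach lever", "touch lever", "press lever"]
print(shape(session))
# The second "approach dish" goes unreinforced: the criterion has moved on.
```

Notice that behaviors satisfying an earlier, looser criterion are no longer reinforced once the requirement has advanced; that is what gradually pulls responding toward the target.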
Once a target response has been shaped, the experimenter can arrange to have the reinforcer delivered according to a specific schedule (Ferster & Skinner, 1957).
As noted earlier, a schedule of reinforcement is a preset pattern or plan for delivering reinforcement. The most basic schedule of reinforcement is continuous reinforcement, in which the participant is given a reinforcement after each target response. For example, a rat in a Skinner box receives a piece of food for each bar press; a salesperson receives a commission for each car sold; a soda machine delivers a cold drink each time you put coins in it. A continuous schedule of reinforcement produces a reasonably high rate of responding. However, once the reinforcer loses its effectiveness, the response rate drops quickly. Thus food pellets reinforce responding in a hungry rat, but they are not effective after the rat has eaten a large number of them.
Intermittent or Partial Reinforcement. In schedules of reinforcement that do not employ continuous reinforcement, some responses are not reinforced. Because reinforcement does not follow each response, the term intermittent or partial reinforcement is used to describe these patterns of delivering reinforcement. There are two main types of intermittent schedules, ratio and interval.
When a ratio schedule is in effect, the number of responses determines whether the participant receives reinforcement. In some cases, the exact number of responses that the participant must make to receive reinforcement is specified. For example, a pigeon may be required to peck a key five times before grain (a positive reinforcer) is presented. When the number of responses required to produce reinforcement is specified, the arrangement is known as a fixed-ratio (FR) schedule. Requiring a pigeon to peck five times to receive reinforcement is designated as a "fixed-ratio 5" schedule (FR5). A continuous reinforcement schedule can be thought of as a "fixed-ratio 1" schedule (FR1).
On other occasions we may not want to specify the exact number of responses. Sometimes the reinforcer will be delivered after 15 responses, sometimes after 35 responses, and sometimes after 10 responses, and so forth. Because the exact number of responses required for reinforcement is not specified, this arrangement is called a variable-ratio (VR) schedule. Typically the average number of responses is used to indicate the type of variable-ratio schedule. In our example, in which the values 15, 35, and 10 were used, the average number of responses would be 20 [(15 + 35 + 10)/3 = 20]. This particular schedule would be designated as a "variable-ratio 20" (VR20) schedule.
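The arithmetic that names a variable-ratio schedule, and the way a fixed-ratio schedule counts responses, can be sketched in a few lines. This is a hypothetical illustration; the function names are not from the text:

```python
# Sketch: naming a VR schedule and counting responses on an FR schedule.

def vr_label(ratios):
    """Average the response requirements to name a VR schedule."""
    avg = sum(ratios) / len(ratios)
    return f"VR{avg:g}"

print(vr_label([15, 35, 10]))  # VR20: (15 + 35 + 10)/3 = 20

def fixed_ratio(n):
    """FR n: reinforce after every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False      # response made, no reinforcer yet
    return respond

press = fixed_ratio(5)   # FR5, like the pigeon pecking five times
print([press() for _ in range(10)])
# Reinforcement arrives on the 5th and 10th responses.
```

A VR version would simply draw the next requirement at random around the average instead of using a constant `n`.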
Whether we are dealing with a VR or FR schedule, our participants usually make many responses. Frequent responding makes good sense in these situations: The more responses made, the more frequently the participant receives a reinforcer. Although both FR and VR schedules produce many responses, VR schedules produce somewhat higher rates of responding. These differences are shown in the accompanying figure (Fixed Ratio, Variable Ratio).
When an FR schedule is in effect, the participant may pause for a brief period after the reinforcement has been delivered. This post-reinforcement pause typically does not occur when a VR schedule is used. When an FR schedule is used, the reinforcer seems to serve as a signal to take a short break. If you were responding on an FR schedule, you might be thinking, "Five more responses until I get the reinforcer, then I'll rest for a bit before I start responding again."
Suppose that you have a job stuffing envelopes. For every 200 envelopes you stuff, you receive $10 (a positive reinforcer). To earn as much money as possible, you work very hard and stuff as many envelopes as possible. However, every time the 200th envelope is completed, you stop for a minute to straighten the stack and count how many piles of 200 envelopes you have completed.
The duration of the post-reinforcement pause is not the same for all FR schedules; the higher the schedule (the greater the number of responses required to produce a reinforcer), the longer the pause (Todd & Cogan, 1978). In addition, the more time expended in responding, the longer the post-reinforcement pause will be (Collier, Hirsch, & Hamlin, 1972).
Now consider a case in which there is no post-reinforcement pause. Last year, Virginia and some of her friends spent their spring break in Las Vegas. The slot machines proved to be Virginia's downfall. Sometimes the jackpot bell rang and she collected a potful of quarters, which encouraged her to continue playing. Before she knew it, she had been putting quarters into the "one-armed bandit" for 6 hours straight. When she counted up her winnings and losses, she had spent over $250 just to win $33.50. Why did Virginia put so much money into the slot machine?
The answer is that slot machines "pay off" on a VR schedule. As Virginia put quarter after quarter into the machine, she was probably thinking, "Next time the bell will ring, and I'll get the jackpot." She knew that she would receive a reward (hitting the jackpot) at some point, but because slot machines operate on a variable schedule, she could not predict when she would be rewarded. If you have ever become "hooked" on playing the lottery, you can understand this process.
The second type of intermittent schedule of reinforcement, the interval schedule, involves the passage of time. When an interval schedule is in effect, responses are reinforced only after a certain interval of time has passed. As with ratio schedules, there are two types of interval schedules, fixed-interval and variable-interval.
Under a fixed-interval (FI) schedule, a constant period of time must pass before a response is reinforced. Responses made before the end of that period are not reinforced. No matter how many times you check your mailbox, you will not receive mail until it is time for the daily mail delivery. Under an FI schedule, participants try to estimate the passage of time and make most of their responses toward the end of the interval, when they will be reinforced. The longer participants stay on an FI schedule, the better they become at timing their responses (Cruser & Klein, 1984).
When reinforcement occurs on a variable-interval (VI) schedule, the participant never knows the exact length of time that must pass before a response is reinforced; the time interval changes after every reinforcement. Hence it makes sense for the participant to maintain a steady, but not especially high, rate of responding. A response could be reinforced at any time. Think of the times you have called a friend on the phone only to get a busy signal. You probably did not start redialing at a frantic pace. Most likely you called back a few minutes later, then a few minutes after that if you were not successful, and so on. You could not determine whether your friend was having a short or a lengthy conversation; that is, you did not know when your dialing would be reinforced by the sound of a ringing telephone. Only time would tell. At some point your friend hung up, and you were able to get through. As time passed, your chances of getting through got better and better.
The average amount of time that must elapse before a response produces reinforcement under a VI schedule influences the rate of responding; the longer the interval, the lower the rate of responding. For example, pigeons reinforced on a VI 2-minute schedule responded between 60 and 100 times per minute, whereas pigeons reinforced on a VI 7.1-minute schedule responded 20 to 70 times per minute (Catania & Reynolds, 1968).
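A fixed-interval schedule can be sketched the same way (the names are illustrative, and the 24-hour value simply echoes the daily mail example): only the first response made after the interval elapses is reinforced, and the clock then restarts.

```python
# Sketch: an FI schedule reinforces the first response made after
# a fixed waiting period; earlier responses earn nothing.

def fixed_interval(interval):
    """FI schedule: returns a function respond(t) -> bool,
    where t is the time at which the response is made."""
    next_available = interval
    def respond(t):
        nonlocal next_available
        if t >= next_available:
            next_available = t + interval  # the clock restarts here
            return True                    # reinforcer delivered
        return False                       # too early; response wasted
    return respond

check_mail = fixed_interval(24)  # daily mail delivery, in hours
times = [6, 12, 24, 25, 49]
print([check_mail(t) for t in times])
# Checking at hours 6 and 12 accomplishes nothing; the checks at
# hours 24 and 49 are reinforced.
```

A VI version would reset `next_available` using a randomly drawn interval around some average, which is why steady, moderate responding pays off under VI schedules.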
In our study of classical conditioning, we encountered contingency theory and blocking. These processes indicate that classical conditioning is not simply mechanical; rather, mental activity or thought processes (cognition) are involved to some degree. Cognition also plays a role in operant conditioning.
The importance of cognition to operant conditioning can be seen in the process known as insight learning. Insight learning is a form of operant conditioning in which we restructure our perceptual stimuli (we see things in a different way) and generalize to other situations. In short, it is not blind, trial-and-error learning that develops gradually but rather a type of learning that occurs suddenly and relies on cognitive processes. It is the "aha!" experience we have when we suddenly solve a problem.
Classic research by the Gestalt psychologist Wolfgang Kohler (1927) exemplifies insight learning. Using chimpanzees as his test animals, Kohler gave them the following problem. A bunch of bananas was suspended out of reach of the apes. To reach the bananas, the apes had to stack three boxes on top of one another and then put together the pieces of a jointed pole to form a single, longer pole. After several attempts at jumping and trying to reach the bananas, Kohler's star pupil, Sultan, appeared to survey the situation (mentally rearrange the stimulus elements that were present) and solve the problem in the prescribed manner. Kohler felt that Sultan had achieved insight into the correct solution of the problem. Consider the solution of a particularly difficult mathematics problem. You struggle and struggle and struggle, but nothing happens. In frustration you set the problem aside and turn to another assignment. All of a sudden you know how to work the math problem; you've had an "aha!" experience. How you perceive the situation has changed; insight has occurred. Once this problem has been solved, you are able to solve others like it. Yes, cognitive processes are important in helping us adapt to our environment. As we will see, other organisms, even rats, may use cognitive processes as they go about their daily activities.
In his study of maze learning by rats, psychologist Edward C. Tolman, of the University of California, Berkeley, presented very persuasive evidence for the use of cognitive processes in basic learning (Tolman & Honzik, 1930). Tolman is most often associated with his study of latent learning. Latent learning occurs when learning has taken place but is not demonstrated. In one of his most famous studies, three groups of rats learned a complex maze that had many choices and dead ends. One group of rats was always reinforced with food for successfully completing the maze. These animals gradually made fewer and fewer errors until after 11 days of training, their performance was nearly perfect. A second group was never reinforced; the rats continued to make numerous errors. The third (latent-learning) group of animals did not receive reinforcement for the first 10 days of training. On the eleventh day, reinforcement was provided. The behavior of these animals on the twelfth day is of crucial importance. If learning occurs in a gradual, trial-and-error manner, their performance on the twelfth day should not have differed drastically from their performance on the eleventh day. If, however, the rats used cognitive processes to learn to navigate the maze, they would exhibit more dramatic behavior changes. In fact, on the twelfth day, these rats solved the maze as quickly as the rats who had been continually reinforced. How did these rats learn so quickly? Tolman argued that by wandering through the maze for 10 days before the introduction of reinforcement, these animals had formed a cognitive map of the maze. In other words, they had learned to solve the maze, but this knowledge had remained latent (unused) until reinforcement was introduced on the eleventh day. Then, on the twelfth day, these rats demonstrated that they knew how to get to the location of the reinforcement. Their latent learning had manifested itself.
Serial Enumeration. Serial enumeration refers to the ability to remember a series of events and to respond appropriately the next time that series of events is encountered. Serial enumeration seems to involve the use of cognitive processes.
Richard Burns and Walter Gordon (1988) have shown that rats are capable of serial enumeration. They trained rats to run a straight runway from the start area to the goal area, and they recorded the rats' speed. They reasoned that if the rats were given a different reinforcer each time they ran down a runway, the animals would be able to keep track of which trial they were running by the type of reinforcement they received. They trained the rats over a period of several days, always following a fixed pattern in which the rats received different types of reinforcers (such as rat pellets or breakfast cereal) after some trials and no reinforcement after others. After experiencing this pattern of reward and nonreward for several days, the animals should have been able to predict when a nonreward trial was going to occur by the type of reinforcer they received on the preceding trial. The results showed that this was indeed the case. Just as if a sign saying "No Food" had been posted in the runway, the rats used cognitive processes to learn to run more slowly on the trials that would not be rewarded. Thus cognitive processes appear to play a major role, even in the acquisition of basic learning phenomena (see Macuda & Roberts, 1995; Wilkie & Williams, 1995).
We have seen that the effect of a reinforcer (either positive or negative) is to increase the likelihood of a target response. A punisher has the opposite effect: it decreases the likelihood or rate of a target response.
Everyone seems to have an opinion about the usefulness of punishment. The typical view is that punishment does not work very well. This philosophy seems to have originated with the educator E. L. Thorndike. In the early 1900s, Thorndike developed a very influential theory of learning. One of the main components of that theory was the law of effect (Thorndike, 1911), which stated that presenting a "satisfier" (a reinforcer) leads to the strengthening or learning of new responses, whereas presenting an "annoyer" (a punisher) leads to the weakening or unlearning of responses. We will explore different types of punishers as well as some guidelines for using punishment effectively.
Just as there are positive and negative reinforcers, there are positive and negative punishers. This might sound just like the description of reinforcement. Remember, reinforcement (positive or negative) increases the rate of responding, whereas punishment (positive or negative) decreases the rate of responding. For example, if a rat in an operant conditioning chamber receives a mild electric shock for pressing a lever, its rate of responding decreases. Similarly, if a child is scolded for playing in the street, that behavior occurs less often. These are examples of positive punishers. Examples of negative punishers include taking away a childs allowance, grounding a teenager, or suspending a basketball player for violating training rules.
When punishment is used properly, it can help eliminate undesirable behaviors. How and when should punishment be used? Azrin and Holz (1966) and Axelrod and Apsche (1983) have suggested several procedures that should be followed if punishment is to be used effectively:
Clearly, it is very difficult to use punishment effectively. Perhaps the best solution would be to reinforce an alternate desired behavior such as politeness.
Remember Virginia, who put all those quarters into the slot machine? Is the machine really going to pay off with a big jackpot this time, or is the jackpot switch broken? Because she does not know, Virginia continues to put coins into the machine, always hoping for a big payoff. She spent her entire bankroll of $250 just to win $33.50. Other people have been known to spend much more money playing slot machines.
Why is it so difficult to stop playing a slot machine once you have started?
In this section we examine the process by which learned responses are weakened and become less likely to occur. We also see how operant behavior can come under the control of certain stimuli. We begin with what is known as the partial reinforcement effect.
The Partial Reinforcement Effect
Every day you look in your mailbox for a letter from a friend. After 10 months of looking, you are finally convinced that your friend is not going to write; no letters have come, and there is no reason to expect any. Completely removing the reinforcer (in this case, your friend's letters) from the operant conditioning situation eventually results in extinction, or elimination, of the operantly conditioned response. There are some basic similarities between the way extinction is produced in classical conditioning (by omitting the US) and the way it is produced in operant conditioning (by removing the reinforcer).
So why is it so difficult to stop playing a slot machine once you have started?
As you saw earlier in this chapter, intermittent or partial reinforcement schedules can produce very high rates of responding. This is especially true of ratio schedules, in which the harder the participants work, the more reinforcement they receive. Because partial reinforcement schedules involve making a number of responses that are not reinforced, it may be difficult to tell when reinforcement has been discontinued completely and when it has merely been delayed. Because Virginia cannot tell whether the slot machine is broken or whether it will pay off the next time she puts in a quarter, she continues to play. Many players stop only when all their money is gone.
We hope you are beginning to see a general pattern concerning extinction and operant conditioning. If reinforcement is delivered in a predictable manner, it should be easier to tell when it has been discontinued and extinction has begun. Hence extinction should occur more rapidly following FR training than following VR training. Likewise, extinction should occur more rapidly following FI training than following VI training. Moreover, it should be even easier to extinguish responding that has been conditioned through the use of continuous reinforcement than responding that has been conditioned through any partial or intermittent schedule.
All of these facts have been verified experimentally. This general pattern, termed the partial reinforcement effect, is well established (Amsel, 1962). Briefly, the partial reinforcement effect states that extinction of operant behavior is more difficult following partial or intermittent reinforcement than following continuous reinforcement. The same pattern of results has been shown for extinction following classical conditioning (Humphreys, 1939). Continuous reinforcement participants who received a puff of air (the US) on each of 96 training trials extinguished their conditioned eye-blink response more rapidly than partial reinforcement participants who received the US on only 48 of the 96 trials.
Operant Conditioning and Stimulus Control
Bringing a behavior under stimulus control means that a particular stimulus or signal tells the participant that its responses will be reinforced (Fetterman, 1993). In an operant conditioning chamber, for example, a green light or a tone can be a signal to a rat that pressing the lever will be reinforced. Such a signal is called a discriminative stimulus. When the light or tone is present, lever presses are reinforced under the schedule of reinforcement that the rat has experienced during training. When the discriminative stimulus is absent, the responses are not reinforced, and extinction occurs.
A vast number of discriminative stimuli are found in the real world. The "Open" sign in a store window is a discriminative stimulus signaling that the response of reaching for the door handle will be reinforced by your being able to enter the store and shop. The color of the traffic light at an intersection signals that the response of stopping your car (red) or proceeding through the intersection (green) will be reinforced by safe arrival at your destination. Your friend's mood serves as a signal that a response such as telling a joke or making a sympathetic remark will be appreciated.
In discussing classical conditioning, we saw that conditioning may become generalized; that is, a CR may occur in response to stimuli that are similar to the one used in training. Generalization also occurs in operant conditioning. In some instances this generalization may be helpful. For example, in some towns the traffic light hangs over the center of the street, whereas in others it is at the curb. Moreover, the colors of the lights are not identical; some greens are bright, while others are dull. Yet in all cases we respond appropriately; we have generalized the appropriate responses that we made to similar stimuli in the past. We stop when we see a red light, whether bright or dull, overhead or at the curb, and we go when we see a green light.
Generalization is not always desirable, however. Imagine a young child who runs to meet relatives whenever they come to visit. Would you want that child to run to adult strangers as well? In general, children must learn which behaviors are appropriate in different situations. They must learn, for example, that behaviors appropriate for a sports event are not appropriate for a wedding, even though both situations involve crowds of people. This discriminative training is accomplished by reinforcing responses only when the precise discriminative stimulus is present. Other responses, especially those made in the presence of similar but incorrect stimuli, are not reinforced; in other words, they are extinguished. Gradually we learn to respond only when the precise discriminative stimulus is present. If you stop and analyze what we have said about discrimination and generalization in operant conditioning, you will see that they are opposing processes, just as they are in classical conditioning.
You have given permission for your young son and daughter to participate in a psychological experiment at the local university. During the experiment, each child watches an adult play with a large inflatable doll that can double as a punching bag. Because the doll has sand in the base, when it is punched it bounces back and is ready for more punches. The adult gives the doll a merciless beating. Then each child is given an opportunity to play with the doll.
What can this experiment tell us about learning?
For many years, psychologists believed that a participant must actually perform an operant response for learning to occur. In the early 1960s, Albert Bandura and his colleagues changed this view (Bandura, Ross, & Ross, 1963). You will recall from Chapter 1 that they found that children who observed an adult hitting and punching an inflatable Bobo doll were likely to repeat those behaviors when they were given a chance to play with the doll. Control participants, who had not observed the adult model, behaved less aggressively. Because the children made no responses while they were watching, the researchers concluded that simply observing the behavior and reinforcement (or punishment) of another participant could result in learning (Bandura, 1977). Such learning is termed observational learning. Because the observation of other people is a central factor in this form of learning, this approach is often called social learning theory.
The key to observational learning appears to be that the participant identifies with the person being observed. If we put ourselves in the other person's place for a moment, we are better able to imagine the effects of the reinforcer or punisher. This phenomenon is called vicarious reinforcement (or vicarious punishment).
Observational learning, or modeling, as it is sometimes called, appears to be a widespread phenomenon. It is even found among a number of animals. Rats that observed the extinction behavior of other rats subsequently stopped responding more rapidly than rats that did not observe extinction performance (Heyes, Jaldow, & Dawson, 1993). In one experiment, monkeys reared in a laboratory didn't fear snakes. However, after watching another group of monkeys react fearfully to snakes, the nonfearful monkeys also developed a pronounced fear of snakes (Cook et al., 1985).
Attempts to influence behavior through observational learning occur every day. Turn on the television and you are bombarded with commercials, which are nothing more than a form of observational learning. If you drive this kind of car, wear these clothes, use this brand of perfume, shower with this soap, use this shampoo, and eat this kind of breakfast, you will be rich, famous, powerful, sexy, and so forth, just like the models in the commercials.
According to the social learning theory proposed by Bandura (1986), for observational learning to be effective, the following conditions must be present:
The knowledge that children model the behaviors of adults has led to concern about the possible effects of filmed and televised violence on children. Many people fear that youngsters who witness violent acts on television and in films may repeat those acts in real life. Edward Donnerstein (1995), a psychologist at the University of California, Santa Barbara, indicates that "there is absolutely no doubt that higher levels of viewing violence in the mass media are correlated with increased acceptance of aggressive attitudes and aggressive behavior. This exposure in young children can have lifelong consequences." We have more to say about the possible effects of televised violence in Chapter 16.
To end our discussion of observational learning on a more positive note, we should point out that this technique has been used to teach desired behaviors. For example, for young children who have been taught to chew their food carefully, swallowing a large medicine capsule whole can be a major obstacle. The observational learning technique, in which another person modeled the correct procedure for swallowing a capsule, was used to teach children how to swallow such pills (Blount et al., 1984).
Your friend John is a junk-food addict. He just cannot pass up those extra cookies, a handful of potato chips, or the candy dish. His bad habits are catching up with him. For the third time in the past five years his clothes are embarrassingly tight, his cholesterol levels are dangerously high, and his health is generally poor. John is well aware of the negative effects of junk food; he just cannot stop his poor eating habits.
Can anything be done to change John's behavior?
© Copyright 1997 by Prentice-Hall, Inc.