Schedules of Reinforcement: Control the Fabric of Time and Space!
Alright, so now that we know the two kinds of reinforcement (positive and negative), we get to dive into something you may not find in many dog training books: schedules of reinforcement.
Schedules of reinforcement are the rules for when you apply reinforcers. The timing and frequency of a reinforcer can be just as important as what the reinforcer is, and the schedule you use will also change the nature of the behavior you get. Understanding these schedules will help you decide when to change your reinforcement policies, and when you should and should not reinforce a behavior.
There are two kinds of schedules: continuous reinforcement and partial reinforcement.
Continuous Reinforcement is when you provide a reinforcer every single time the behavior occurs. Your dog sat on cue? Treat. He sat on cue again? Treat.
You’ll usually use this at the beginning of training so your subject learns the connection between the desired behavior and the reinforcer.
The problem with this schedule is that behaviors on continuous reinforcement undergo extinction fairly quickly; in other words, they’re very easy to break. Once you stop reinforcing, the behavior fades fast because the subject learns that it won’t get any more reinforcers from you for that behavior. Be aware, though, that you may still see an extinction burst (a temporary spike in the behavior) when you first stop reinforcing.
So how do you put durability into a behavior? Well, figurative person, I’m glad you asked!
Partial Reinforcement is when you reward a behavior only some of the time; you’re not giving a reward every single time the behavior you want happens. By keeping your subject guessing about when their next action will earn a reinforcer, you build durability into the behavior. Think of these schedules as different versions of a gambling addiction: you don’t know which lever pull or bet will win you money, but you know one of them eventually will.
Partial reinforcement schedules are broken down into 4 categories, based on whether the reward depends on the number of responses or on time, and whether that requirement is predictable:
- Fixed Ratio Schedules
- Variable Ratio Schedules
- Fixed Interval Schedules
- Variable Interval Schedules
I’ll explain each of these schedules in more detail in its own article, but for now, I’ll go over some general points about each one.
The term “fixed” in this scenario means that the number of responses or amount of time needed to get the next reinforcer is always the same.
In comparison, “variable” means the number of responses or amount of time needed to get the next reinforcer changes unpredictably from one reinforcer to the next.
“Ratio” means the reinforcer is contingent on the number of responses, while “interval” means the reinforcer is contingent on the amount of time that has elapsed since the last reinforcer.
As a result, on a Fixed Ratio Schedule the subject has to make a set number of responses before they receive their reinforcer.
On a Variable Ratio Schedule, the number of responses required to get a reinforcer varies unpredictably.
Fixed Interval Schedules have a set amount of time between each window of opportunity to be reinforced. Once a window opens, all the subject has to do is respond to get their reinforcer.
Variable Interval Schedules, in comparison, open the window at unpredictable times. Again, all the subject has to do is respond once the window has opened in order to be reinforced.
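If it helps to see the definitions side by side, here is a minimal Python sketch of each schedule as a rule that decides whether a single response earns a reinforcer. The class names and the `respond(t)` method are hypothetical, made up for illustration, and `t` is simply the time of the response in seconds. Ratio schedules only count responses; interval schedules only track when the next window opens, measured from the last reinforcer. Continuous reinforcement is included for comparison.

```python
import random


class Continuous:
    """Continuous reinforcement: every response is reinforced."""

    def respond(self, t: float) -> bool:
        return True


class FixedRatio:
    """Fixed ratio: every n-th response is reinforced."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0  # start counting toward the next reinforcer
            return True
        return False


class VariableRatio:
    """Variable ratio: after each reinforcer, the number of responses
    required next is drawn at random from lo..hi (inclusive)."""

    def __init__(self, lo: int, hi: int, rng: random.Random) -> None:
        self.lo, self.hi, self.rng = lo, hi, rng
        self.count = 0
        self.required = rng.randint(lo, hi)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count == self.required:
            self.count = 0
            self.required = self.rng.randint(self.lo, self.hi)
            return True
        return False


class FixedInterval:
    """Fixed interval: the window opens `period` seconds after the last
    reinforcer; the first response once it is open is reinforced."""

    def __init__(self, period: float) -> None:
        self.period = period
        self.opens_at = period  # first window, measured from t = 0

    def respond(self, t: float) -> bool:
        if t >= self.opens_at:
            self.opens_at = t + self.period  # next window, timed from this reinforcer
            return True
        return False  # responding early earns nothing


class VariableInterval:
    """Variable interval: like FixedInterval, but each wait is drawn
    at random from lo..hi seconds."""

    def __init__(self, lo: float, hi: float, rng: random.Random) -> None:
        self.lo, self.hi, self.rng = lo, hi, rng
        self.opens_at = rng.uniform(lo, hi)

    def respond(self, t: float) -> bool:
        if t >= self.opens_at:
            self.opens_at = t + self.rng.uniform(self.lo, self.hi)
            return True
        return False


if __name__ == "__main__":
    fr = FixedRatio(5)
    # Ten responses, one per second: only the 5th and 10th are reinforced.
    print([fr.respond(t) for t in range(10)])
```

Notice that in both interval classes, a response made before the window opens just returns False and changes nothing, which is why extra responses don’t buy the subject anything on those schedules.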
But what does this look like when you’ve applied it to a subject? What will their behavior look like? To help you tell the schedules apart, let’s look at an example.
Let’s say we’re trying to see what happens when you apply these schedules to a person pushing a button. Whenever this person pushes the button, they have a chance at getting a dollar. The person doesn’t know when they will get their dollar.
So how does each schedule affect the button-pushing behavior? Well, let’s say we ran this experiment, got our data, and graphed it. Here is what our subject’s response rates would look like.
Kinda wacky, right? Let’s dig in and figure out what these lines mean.
The Y axis represents the total number of times the subject has pushed the button, so the steeper a line, the faster they’re pushing; the X axis represents elapsed time. Each tick mark represents a moment when the subject was reinforced.
Variable Ratio (Red Line) creates a steep, steadily rising line because the subject isn’t sure how many pushes it will take to earn the next reinforcer. For example, it could be any number of responses between 1 and 25. As a result, our subject will start mashing the button as often as they can because they don’t know which press will be the one that drops a dollar.
Variable Interval (Yellow Line) also creates a steadily rising line. It isn’t as steep as Variable Ratio, but the response rate stays consistent for a similar reason: the subject isn’t sure when the next window of reinforcement will open. They won’t push the button as frantically as on Variable Ratio, but they keep pushing at a steady pace because they need to test whether the window is open to get their dollar. As far as the subject knows, that window could open at almost any time; in this example, anywhere from 1 second to 3 minutes after the last reinforcer was given.
Fixed Ratio (Green Line) has a more jagged but steadily rising line because the subject’s response rate increases as they get closer to the reinforcement point. Our subject has a general idea of when the reinforcement point will be (for this example, we can say about 21-25 responses), so their response rate will spike until they get their reinforcer. After that, they take a break because they know it will take another 21-25 button pushes to get the next one.
Fixed Interval (Purple Line) has the lowest rate of response. The subject knows approximately when the window will open, so all they have to do is wait and push the button when that time gets close to see if the window is open yet. Their response rate only picks up as the window is about to open. In this example, we can say the window opened every 3-4 minutes.
Both fixed schedules create plateaus in their lines. The reason is that your subject develops a sense of when they are about to be reinforced, so their responses spike as the reinforcer gets close. Once they have been reinforced, most subjects take a break before responding again, which causes the plateaus.
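Part of why ratio schedules tend to produce more pressing than interval schedules comes down to what pressing faster actually buys the subject. The hypothetical Python sketch below counts the dollars a slow presser (once every 10 seconds) and a fast presser (once per second) would earn in a 30-minute session. It is not the data behind the graph: the `earnings` function and its settings (FR 25 presses, VR 1-25 presses, FI 180 seconds, VI 1-180 seconds) are assumptions loosely based on the numbers in the example, and the simulated subject presses at a constant pace without ever pausing.

```python
import random

# Hypothetical session length, loosely based on the button-pushing example.
SESSION_SECONDS = 1800  # 30 minutes


def earnings(kind: str, seconds_between_presses: int, seed: int = 0) -> int:
    """Return how many dollars a subject earns by pressing the button once
    every `seconds_between_presses` seconds for the whole session."""
    rng = random.Random(seed)

    def next_requirement() -> float:
        # Ratio schedules: presses needed. Interval schedules: seconds to wait.
        if kind == "FR":
            return 25
        if kind == "VR":
            return rng.randint(1, 25)
        if kind == "FI":
            return 180.0
        if kind == "VI":
            return rng.uniform(1.0, 180.0)
        raise ValueError(f"unknown schedule: {kind}")

    dollars = 0
    presses = 0
    requirement = next_requirement()
    window_opens = requirement  # only used by the interval schedules
    for t in range(0, SESSION_SECONDS, seconds_between_presses):
        if kind in ("FR", "VR"):
            # Ratio: every press counts toward the requirement.
            presses += 1
            if presses >= requirement:
                dollars += 1
                presses = 0
                requirement = next_requirement()
        elif t >= window_opens:
            # Interval: only a press after the window opens pays.
            dollars += 1
            window_opens = t + next_requirement()
    return dollars


if __name__ == "__main__":
    for kind in ("FR", "VR", "FI", "VI"):
        slow = earnings(kind, seconds_between_presses=10)
        fast = earnings(kind, seconds_between_presses=1)
        print(f"{kind}: slow presser ${slow}, fast presser ${fast}")
```

With these settings, the Fixed Ratio subject earns 7 dollars at the slow pace and 72 at the fast pace, while the Fixed Interval subject earns 9 dollars either way. On a ratio schedule every extra press moves the subject closer to the next dollar; on an interval schedule most of the extra presses land while the window is still closed.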
Schedules of Reinforcement seem intimidating at first, but once you understand how each one can be used, they’ll be one of the most useful tools you have. This goes double for identifying which schedules your own behaviors are on and which schedules you want to use in the future.
In future articles on schedules of reinforcement, I’ll dive into each individual schedule in detail, so you can better understand what each one looks like in practice.