TL;DR: For every guess in a given section, pick the answer you've selected the least so far (Choose the Least Selected). If there is a tie, choose the letter that appears later in the alphabet (D or E). And you probably shouldn't pick A.
When I started teaching the LSAT, more than 15 years ago, I was told that the likelihood of any given answer choice being correct should be roughly 20%. This wasn't surprising; if one answer choice were more likely than the others to be correct, then there would be a flaw in the design of the test.
I could verify the even distribution simply by counting the frequency of each answer choice on each test, or more thoroughly by averaging the totals over several tests. The distribution was something like A = 20%, B = 19%, C = 20%, D = 21%, and E = 20%. I remember other LSAT instructors I knew at the time using this data to justify always choosing “D” as their guess. After all, it has a 1-percentage-point greater chance of being right.
But, statistically speaking, one would have to guess on 100 questions in order to get 1 additional question correct using this strategy. "Always choose D" didn't strike me as the proper inference to draw from the data. The point is not that “D” has a slightly higher likelihood of being right, but that the answer choices are essentially evenly distributed, as they should be on a well-designed test.
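The "100 questions for 1 extra point" figure can be checked with a few lines of Ruby. This is only a back-of-the-envelope sketch using the approximate percentages quoted above (21% for D against an even 20%); the variable names are my own, and the figures are not exact tallies.

```ruby
# Rough arithmetic behind the "100 guesses for 1 extra correct answer" remark.
# The rates are the approximate figures quoted in the text, not exact data.
d_rate        = Rational(21, 100) # approximate share of correct answers that are D
baseline_rate = Rational(20, 100) # share expected under a perfectly even split

edge_per_guess   = d_rate - baseline_rate    # extra correct answers per guess
guesses_per_gain = (1 / edge_per_guess).to_i # guesses needed for one extra correct answer

puts "Guesses needed for one extra correct answer: #{guesses_per_gain}" # 100
```

Rationals are used instead of floats so the subtraction comes out to exactly 1/100.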
So how can you use the knowledge that the LSAT answer choices are evenly distributed to your advantage? Well, when it comes time to guess in a section you'll likely have some information about that section—you've probably already answered between 15 and 24 questions. Assuming that you're mostly right about the answers you've selected, you'll have a pretty good idea about how many times each answer choice has been correct so far.
Hypothetically, if you've only selected “E” twice but you've selected each of the other answer choices four times, then (assuming your answers are mostly correct) “E” would be more likely to show up as a correct answer on the questions that you haven't done yet. This also assumes that the answer choices are evenly distributed, not just over the whole test, but also within each section. As it turns out, they usually are. The distribution isn’t always exactly 5 As, 5 Bs, 5 Cs, 5 Ds, and 5 Es, but it's typically pretty close.
As a result, I've been advising students to pick the answer they've selected the least on each section when they are in guessing mode. I'll refer to this as the CLS (Choose the Least Selected) method. Though I've always thought that this was the best advice and the best guessing strategy, I’ve never had any concrete proof, only a theoretical justification...until now!
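Here is a minimal Ruby sketch of how the CLS rule could be written down. It is an illustration, not the code used for the simulations; the method name `least_selected` is hypothetical, and the tie-break toward the later letter follows the rule stated in the TL;DR.

```ruby
# Choose the Least Selected (CLS): return the letter bubbled least often so far.
# Ties go to the letter that appears later in the alphabet.
def least_selected(selected_answers, letters = %w[A B C D E])
  counts = Hash.new(0)
  selected_answers.each { |answer| counts[answer] += 1 }
  fewest = letters.map { |letter| counts[letter] }.min
  # letters is in alphabetical order, so searching in reverse finds the latest tied letter.
  letters.reverse.find { |letter| counts[letter] == fewest }
end

# The hypothetical from above: E selected twice, every other letter four times.
so_far = %w[A B C D] * 4 + %w[E E]
puts least_selected(so_far) # E
```

If only A, B, and C have been selected so far, D and E are tied at zero and the method returns E.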
Thankfully, my coding skills are stronger than they used to be. Using the power (and simplicity) of the Ruby programming language and some good old-fashioned probability analysis, I've decided to finally prove that this guessing strategy is superior by running 498,000 guessing simulations and charting the data.
Experimental Design
I began by running simulations on both LR sections of all 83 PrepTests, replicating 10 possible levels of student accuracy, from 10% of the choices answered correctly all the way up to 100%. I also simulated 3 possible situations where the student would begin his/her guessing on the last 3, 6, or 9 questions of a given section. For example, the first group of simulations featured a student answering 10% of the questions correctly while guessing on the last 3 questions. The next group simulated a student answering 20% of the questions correctly while guessing on the last 3 questions, and so on. I continued through all the combinations of % correct and # remaining until I had accounted for every scenario.
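The size of this grid can be tallied directly. The sketch below just multiplies out the factors named in this post (83 PrepTests, 2 LR sections, 10 accuracy levels, 3 guessing situations, and the 100 runs per configuration described a few paragraphs later); the variable names are my own.

```ruby
# Count the simulation configurations described in the text.
prep_tests      = 83
lr_sections     = 2
accuracy_levels = (10..100).step(10).to_a # 10%, 20%, ..., 100%
guess_counts    = [3, 6, 9]               # questions left to guess on
runs_per_config = 100

configurations = prep_tests * lr_sections * accuracy_levels.size * guess_counts.size
total_runs     = configurations * runs_per_config

puts "Configurations: #{configurations}" # 4980
puts "Total runs: #{total_runs}"         # 498000
```

The total matches the 498,000 simulations mentioned earlier.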
Simulating different levels of accuracy is important, because this strategy (CLS) assumes that most of the questions the student has answered before he/she starts guessing have been answered correctly. If a student has bubbled 16 answers, but only 20% of them are correct, he/she won't have an accurate idea of how many times each answer choice was in fact correct. But if a student has been 100% accurate on the same 16 questions, he/she would know with certainty how many times each answer choice has been correct. Recognizing this, I hypothesized that this guessing strategy would be more effective as the student's overall accuracy on previous questions increases.
In addition to simulating different levels of accuracy, I chose to simulate 3 possible “# of problems left” situations, where the student is guessing on 3, 6, or 9 questions. I hypothesized that the strategy would be more effective as the student gains more information about the answer choices that have been correct. In other words, if a student has already selected 22 answer choices (guessing on 3), then that student has more information than a student who has only selected 16 answer choices (guessing on 9).
Finally, I did 100 runs of each configuration because I wanted to average out the effect of selecting random wrong answers for “missed” questions. If I only did one simulated run of each configuration, and all of the random wrong answers for that configuration happened to be “A”, that would distort the data. So, for example, I simulated PT 1, Section 1, 10% accuracy, and 3 guesses 100 times. Then I simulated PT 1, Section 1, 20% accuracy, and 3 guesses 100 times. Then I simulated PT 1, Section 1, 30% accuracy, and 3 guesses 100 times...each different configuration was simulated 100 times.
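To make the procedure concrete, here is a rough Ruby sketch of one configuration being simulated and averaged over 100 runs. It is not the original simulation code. All method names are hypothetical, the answer key is a made-up 25-question toy key rather than a real PrepTest, and I am assuming the student picks one CLS letter and uses it for every remaining guess.

```ruby
def letters
  %w[A B C D E]
end

# Bubble every question before the guessing starts: roughly `accuracy` of the
# answers are right, and each missed question gets a random wrong letter.
def simulate_bubbled(key, guess_count, accuracy, rng)
  answered = key.first(key.size - guess_count)
  correct_count = (answered.size * accuracy).round
  correct_idx = (0...answered.size).to_a.sample(correct_count, random: rng)
  answered.each_with_index.map do |right, i|
    correct_idx.include?(i) ? right : (letters - [right]).sample(random: rng)
  end
end

# CLS: least-bubbled letter, ties going to the later letter.
def least_selected(bubbled)
  counts = Hash.new(0)
  bubbled.each { |answer| counts[answer] += 1 }
  fewest = letters.map { |letter| counts[letter] }.min
  letters.reverse.find { |letter| counts[letter] == fewest }
end

# Fraction of the remaining questions that the single CLS guess gets right.
def guess_yield(key, guess_count, accuracy, rng)
  guess = least_selected(simulate_bubbled(key, guess_count, accuracy, rng))
  key.last(guess_count).count(guess).to_f / guess_count
end

# Average over many runs to smooth out the random wrong answers.
def average_yield(key, guess_count, accuracy, runs: 100, seed: 1)
  rng = Random.new(seed)
  total = 0.0
  runs.times { total += guess_yield(key, guess_count, accuracy, rng) }
  total / runs
end

toy_key = %w[A B C D E] * 5 # made-up key, NOT taken from a real PrepTest
[0.3, 0.7, 1.0].each do |accuracy|
  puts "accuracy #{accuracy}: average yield #{average_yield(toy_key, 3, accuracy).round(3)}"
end
```

At 100% accuracy nothing is random, so every run gives the same result. At lower accuracies the random wrong answers shift the counts from run to run, which is why a single run per configuration could be misleading.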
For the graphs below, I focused on PrepTest 60 and up because I wanted to capture any recent trends. The outcome, however, remains the same across all 83 PrepTests. I also focused on accuracies of 30% or better, because 20% should be the worst realistic outcome: a student at that level is essentially guessing on everything. We don't really need to worry about what happens at 10% accuracy, since that would be below the threshold of random guessing.
Here's how each strategy performs when a student guesses on the last 3 questions:
You'll notice that CLS is the most effective strategy for students who are correct more than 75% of the time. And for those who have achieved 100% accuracy on the questions they've attempted, CLS can help these students get 30% of their guesses correct.
I was also surprised by how well choosing “D” performed, and by how poorly choosing “A” did, in these late guessing scenarios. You'll see later on that D and E are more common answer choices towards the end of most LR sections (see the arrows on the last graph in this blog post). I hypothesize that those questions are typically harder, and that they often tempt students with wrong answers that appear before the correct answer.
Here's how each strategy performs when a student guesses on the last 6 questions:
These results make pretty good sense...with more guesses the outcome for picking any letter at random should converge to a 20% yield (you'll get 1 in 5 right). Always choosing A still looks like a bad idea, probably for the same reasons mentioned in the previous situation. Always choosing E results in a better outcome than always choosing D, mostly because E is a pretty common answer choice for questions 18-21 (see last graph) and for some reason D is very uncommon on questions 21 and 22. So always choosing D or always choosing E seem to be pretty effective strategies when guessing on the last 3 or 6 questions, but it might be difficult to decide between them. Choosing the Least Selected answer choice is significantly better than always choosing D here, and significantly better than always choosing E in the previous situation. So, in the interest of having a consistent strategy, it seems like the clear winner.
Here's how each strategy performs when a student guesses on the last 9 questions:
Now we can see even more convergence towards 20%, which makes sense because our guessing samples are bigger (9 questions).
A consistent theme across all 3 situations is that always choosing A is the worst strategy. There's no guarantee that this will be true going forward, but I would probably not pick A as my guessing answer choice for any unfinished questions. For students who are correct more than 70% of the time, CLS has the best average outcome. To understand why D and E were so popular in the first 2 situations, I plotted the frequency of each answer choice by question number:
Sure enough, D is really popular on questions 25 and 26 (black arrows). This explains why always picking D performs well when guessing on the last 3 questions. But D is unpopular on questions 21 and 22, while E is very popular on 20, 21, and 26 (it's correct more than 40% of the time on question 26). This explains why always choosing E performs well when guessing on the last 6 questions.
Finally, the trendlines show the increasing likelihood that E is correct and the decreasing likelihood that A is correct, as one progresses through the section. This further confirms that selecting all As for your guesses on a section is probably a bad idea.
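A per-question tally like the one behind that last graph could be computed along these lines. This is a hedged sketch, not the original analysis. The method names are hypothetical, and the three short keys are made-up toy data for illustration only, not real LSAT answer keys.

```ruby
# Tally how often each letter is correct at each question number.
# answer_keys is an array of keys, each key an array of letters in question order.
def frequency_by_position(answer_keys)
  tallies = Hash.new { |hash, position| hash[position] = Hash.new(0) }
  answer_keys.each do |key|
    key.each_with_index { |letter, index| tallies[index + 1][letter] += 1 }
  end
  tallies
end

# Share of sections in which `letter` was correct at `position`.
def share(tallies, position, letter)
  total = tallies[position].values.sum
  total.zero? ? 0.0 : tallies[position][letter].to_f / total
end

# Made-up 4-question keys, purely to show the mechanics.
toy_keys = [%w[A C E D], %w[B C E D], %w[A D E E]]
tallies = frequency_by_position(toy_keys)
puts share(tallies, 3, "E")          # 1.0
puts share(tallies, 4, "D").round(2) # 0.67
```

Run over real section keys, the same tally would show which letters are over- or under-represented at a given question number.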
Conclusion: For every guess in a given section, pick the answer you've selected the least so far (CLS). If there is a tie, choose the letter that appears later in the alphabet (D or E). And you probably shouldn't pick A. Don't dismiss C as a possible choice, though. It is a common answer choice on questions 24 and 25; it's just really rare as an answer to question 26.
If you like this data-driven approach to guessing, you should check out the rest of our course. We approach all aspects of LSAT preparation as rigorously as we approach our advice about guessing. You can see some sample lessons on our YouTube channel: www.youtube.com/lsatenginevideos
And if you have any questions or comments about this analysis, feel free to reach out to me at [email protected]
I hope this strategy helps you pick up an extra point or two on test day. Good luck!
Posted: 2-8-2018