Probability and Baseball
Before the popularization of sabermetrics in the late 1970s, a good number of baseball statisticians and historians believed that Ty Cobb was the greatest player who ever lived. Winning twelve batting titles in a thirteen-year stretch, Cobb compiled a lifetime batting average (BA) of .366 (revised from .367 a few years ago). This figure, the highest in history, was one reason why Cobb was so revered as a player and was enshrined as a charter member of Baseball's Hall of Fame.
We recall that BA is nothing more than the decimal expansion, carried out to the nearest thousandth, of the ratio of hits (H) to at-bats (AB). That is,
BA = H / AB.
For Cobb, H = 4189, AB = 11434; therefore, BA = .366. Let's look at this statistic.
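As a quick check of this arithmetic, here is a minimal Python sketch. The function name batting_average is our own choice for illustration; it simply divides H by AB and rounds to the nearest thousandth.

```python
def batting_average(hits, at_bats):
    """Return H / AB rounded to the nearest thousandth."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return round(hits / at_bats, 3)


# Cobb's career totals as quoted in the text.
cobb_ba = batting_average(4189, 11434)
print(f"{cobb_ba:.3f}")  # 0.366
```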
This ratio can be considered as a probability; more precisely, as an experimental or empirical probability. It is a number which has been obtained by observing a certain number of trials or events. Based on these trials, one feels comfortable in saying that "Cobb was a .366 hitter" or that "Cobb hit safely 36.6% of the times he was charged with an at-bat". We denote this "probability of getting a H" as P(H) = .366.
Mathematicians look at probabilities as numbers between 0 and 1, inclusive. If an event has a probability of 0, then it is called an impossible event. For example, if one rolled a pair of fair dice, the probability of rolling a "13" would be 0, since it is impossible to roll any number greater than 12. On the other extreme, the probability of rolling "a number less than 177" is 1, because all rolls of fair dice sum to numbers less than 177 (or 176 or any other whole number greater than 12); such events are called certain events.
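The two dice examples can be confirmed by listing all 36 equally likely outcomes of a pair of fair dice and counting the ones that satisfy the event. The sketch below is only an illustration; the helper name probability is hypothetical.

```python
from fractions import Fraction
from itertools import product


def probability(event):
    """Probability that the sum of two fair dice satisfies `event`."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 rolls
    favorable = sum(1 for a, b in outcomes if event(a + b))
    return Fraction(favorable, len(outcomes))


p_thirteen = probability(lambda total: total == 13)   # impossible event
p_below_177 = probability(lambda total: total < 177)  # certain event
print(p_thirteen, p_below_177)  # 0 1
```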
Let's get back to BA. What if Cobb (or anybody else for that matter) batted twice in a row? What is the probability that he would have gotten two hits? This question has a lot of meat to it!
First off, for the sake of simplicity, let us assume that we have a .300 hitter, instead of a .366 hitter like Cobb. That is, P(H) = .300. For a "long enough" career, we can also assume that, given one or two or three additional AB, the hitter's BA remains constant. In other words, if a player has 3000 H in 10000 AB, the outcome of the 10001st AB (whether he gets a hit or an out) will have a negligible effect on his BA. The question arises: given two consecutive AB, what is the probability of getting two hits in a row?
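To see how little one more AB matters, we can compare the ratio before and after the 10001st AB. This is a small illustrative calculation in Python using the 3000 H in 10000 AB from the example; the variable names are our own.

```python
hits, at_bats = 3000, 10000

before = hits / at_bats                  # BA after 10000 AB
after_hit = (hits + 1) / (at_bats + 1)   # the 10001st AB is a hit
after_out = hits / (at_bats + 1)         # the 10001st AB is an out

# Show five decimal places so the tiny change is visible.
print(f"{before:.5f} {after_hit:.5f} {after_out:.5f}")
```

Either way, the figure still rounds to .300 at the nearest thousandth.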
To answer this question, we must posit one more thing, and this is a key assumption. We will assume that the two AB are independent of each other. That is, the outcome of the first AB has no influence or effect on the outcome of the second AB. While this assumption is not unreasonable, it may, in part, be difficult to justify, for a number of reasons. These factors may include past histories of both the batter and the pitcher, the score of the game, the base-out situation, etc. However, the fact remains that it is virtually impossible to supply a simple mathematical model or equation to reflect anything but independence. Therefore, acknowledging certain minor flaws, we will assume independence in order to answer our question.
Mathematically, we use a very simple calculation. The laws of probability dictate that when two or more events are independent, the probability that all of them occur is the product of the individual probabilities. So, in our case, if a hitter has a probability of .300 (30%) of getting a H in one AB, then the probability of getting two H in two AB is given by:
P(2H) = (.300)(.300) = .090.
Note that this means that a .300 hitter will get two H in two AB less than 10% of the time. Or put another way, were the 10000 AB of this hitter expressed sequentially (H versus out (O)), we would have two consecutive H less than 10% of the time! For example, let us say that the first 30 AB in the player's career produced the following sequence of H and O:
HOOOO HHOOO OOHOH OHHOO OOHOO OOOOH.
Were we to extend this for all 10000 AB, there is a very good chance that a subsequence of HH in two AB would occur only about 9% of the time. By the way, this can easily be verified by use of simulation, random numbers and an Excel spreadsheet.
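For readers who prefer code to a spreadsheet, the same check can be sketched in Python. This is only one possible setup, not a prescribed method: it assumes independent AB with a constant P(H) = .300, counts every pair of adjacent AB in the sequence (so overlapping pairs are included), and uses function names and a seed value that are our own choices.

```python
import random


def simulate_at_bats(n_ab, p_hit, seed=None):
    """Return a string of 'H' and 'O' for n_ab independent at-bats."""
    rng = random.Random(seed)
    return "".join("H" if rng.random() < p_hit else "O" for _ in range(n_ab))


def hh_fraction(seq):
    """Fraction of adjacent pairs of at-bats in which both are hits."""
    pairs = len(seq) - 1
    if pairs < 1:
        return 0.0
    hh = sum(1 for a, b in zip(seq, seq[1:]) if a == "H" and b == "H")
    return hh / pairs


# The 30 AB sample from the text: 2 of its 29 adjacent pairs are HH.
sample = "HOOOO HHOOO OOHOH OHHOO OOHOO OOOOH".replace(" ", "")
print(hh_fraction(sample))

# A simulated 10000 AB career for a .300 hitter (seed fixed for repeatability).
career = simulate_at_bats(10000, 0.300, seed=1)
print(round(hh_fraction(career), 3))
```

With only 30 AB the observed fraction can stray noticeably from .090; over 10000 simulated AB it should settle close to it.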
Continuing with this idea, the reader can clearly see that the probability of a .300 hitter getting 3 H in 3 AB is (.300)(.300)(.300) = .027, or less than 3% of the time. The same hitter has a probability of .0081 of obtaining 4 H in 4 AB, or less than 1% of the time.
For a .400 hitter, P(2H) = .160, P(3H) = .064 and P(4H) = .0256.
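These figures all come from the same multiplication rule, which can be written as one short function. The sketch below assumes independent AB with a constant P(H); the name prob_all_hits is hypothetical.

```python
def prob_all_hits(p_hit, n_ab):
    """Probability of n_ab hits in n_ab independent at-bats."""
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be between 0 and 1")
    return p_hit ** n_ab


# Tabulate the values quoted in the text for .300 and .400 hitters.
for p_hit in (0.300, 0.400):
    for n_ab in (2, 3, 4):
        print(f"P(H) = {p_hit:.3f}, {n_ab} H in {n_ab} AB: "
              f"{prob_all_hits(p_hit, n_ab):.4f}")
```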
Bottom Line: While probabilities are extremely useful, and many baseball measures are essentially probabilities (slugging percentage, on-base average, home run percentage, etc.), one readily perceives how "anti-intuitive" these numbers can be. This gives us food for thought when we talk about "how probable" a desired outcome may be.