National Football League (NFL) playbooks are the size of telephone books. They’re filled with dozens and dozens of plays, each designed so that a team can play to its strengths while taking advantage of its opponents’ weaknesses. Despite the endless variations, they all basically boil down to two options for the offense: pass or run. No matter how intricately designed an offensive play is, if the defense can sniff out whether the ball will be tossed downfield or toted along the ground, it gains a tremendous advantage. (Yes, we know that teams punt and kick field goals and extra points after touchdowns. But we’re not talking about that right now.)
Earlier this week, a pair of statisticians from North Carolina State University showed off a model they built that predicts, with a high degree of accuracy, whether a specific team will call a passing or running play. They presented the model in Seattle at JSM 2015, a joint conference of statistical and mathematical societies.
William Burton, an undergraduate who is majoring in industrial engineering and minoring in statistics, and Michael Dickey, who graduated in May with a degree in statistics, started with a listing of actual NFL offensive plays from the 2000 through 2014 seasons that had been compiled by a company called Armchair Analysis. They used it to figure out the ratio of passes to runs. They showed empirically what fans already understood anecdotally: the aerial attack is being used ever more frequently. Pass plays were called 56.7 percent of the time in 2014, compared with 54.4 percent in 2000.
But what makes a team decide whether to run or throw? Burton and Dickey looked at a host of factors that affect a team's play selection. Among these are the distance to the first-down marker; whether it’s first, second, third or fourth down; how much time is left on the game clock; the team’s score in relation to its opponent’s; and field position. For example, there’s a high probability that the coach will opt for a passing play if the other team is leading by three points, there’s a minute left in the fourth quarter, and the offense is facing third down at its own 30-yard line and needs to advance 7 yards to pick up a fresh set of downs. On the other hand, a team that’s leading by 7 points, facing the same down and distance at the same point in the game, would very likely run the ball (to avoid an interception and to take time off the clock so the other team can’t mount a score-tying drive before time runs out).
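To make those factors concrete, here is a minimal Python sketch of how the two example situations might be written down as model inputs. The class name, field names, and ordering are illustrative assumptions; the article does not describe how Burton and Dickey actually encoded their data.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class GameSituation:
    """Hypothetical encoding of the play-selection factors (names are assumptions)."""
    down: int                     # 1 through 4
    yards_to_go: int              # distance to the first-down marker
    quarter: int                  # 1 through 4
    seconds_left_in_quarter: int  # time remaining on the game clock
    score_diff: int               # offense's score minus opponent's score
    yards_from_own_goal: int      # field position; 30 means the offense's own 30-yard line

    def as_features(self) -> List[float]:
        """Return the fields, in declaration order, as a numeric feature vector."""
        return [float(value) for value in astuple(self)]


# Trailing by 3, a minute left in the fourth quarter, third-and-7 at its own 30.
trailing = GameSituation(down=3, yards_to_go=7, quarter=4,
                         seconds_left_in_quarter=60, score_diff=-3,
                         yards_from_own_goal=30)

# Same down, distance, and clock, but the offense leads by 7.
leading = GameSituation(down=3, yards_to_go=7, quarter=4,
                        seconds_left_in_quarter=60, score_diff=7,
                        yards_from_own_goal=30)

print(trailing.as_features())
print(leading.as_features())
```

The two situations differ only in the score differential, which is the factor the example in the text turns on.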
For their system, Burton and Dickey developed logistic regression models (a method also used, for example, to predict whether someone will default on a mortgage) and random forest models, a machine learning method. But they quickly realized that teams’ strategies differ significantly from one quarter of a game to the next. To account for that, they produced six separate logistic regression models: one each for the first, second, and third quarters, plus one for the fourth quarter if the offensive team is winning, another if it is losing, and a third for when the score is tied. They tested their models on 20 randomly selected games. Overall, the models correctly predicted pass or run on 75 percent of downs. The models’ best performance came in a 2014 game between the Jacksonville Jaguars and Dallas Cowboys, where the predictions proved correct on 109 out of 119 offensive plays, a 91.6 percent accuracy rate.
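The six-model split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' code: the function names and model labels are hypothetical, the handling of overtime is a guess (the article does not mention it), and no fitted coefficients are shown because none were published here. The logistic function itself is the standard one used by any logistic regression.

```python
import math
from typing import Sequence


def model_key(quarter: int, score_diff: int) -> str:
    """Pick which of the six models applies (labels are hypothetical).

    score_diff is the offense's score minus the opponent's score.
    """
    if quarter in (1, 2, 3):
        return f"Q{quarter}"
    if quarter == 4:
        if score_diff > 0:
            return "Q4_winning"
        if score_diff < 0:
            return "Q4_losing"
        return "Q4_tied"
    # Assumption: overtime is not covered by the six models described.
    raise ValueError(f"no model for quarter {quarter}")


def pass_probability(weights: Sequence[float], bias: float,
                     features: Sequence[float]) -> float:
    """Standard logistic regression prediction: sigmoid of a weighted sum."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def accuracy_percent(correct: int, total: int) -> float:
    """Share of plays predicted correctly, as a percentage."""
    return 100.0 * correct / total


print(model_key(2, -10))   # second quarter: score does not affect the choice
print(model_key(4, -3))    # fourth quarter, offense trailing
print(round(accuracy_percent(109, 119), 1))  # the Jaguars-Cowboys game
```

Splitting by quarter and, in the fourth quarter, by who is ahead lets each model learn its own weights for the same inputs, which is one simple way to capture the change in strategy the researchers observed.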
Burton and Dickey say that anyone, including NFL coaches and fans rooting for their teams at home, can use the tool to make educated guesses about what will happen each time the ball is snapped.