### Acceleration due to drag and PitchF/X (giant, massive brain dump part I)

I've been playing around with corrections to the pitchF/X data recently. I was kind of saving these results to present at a proposed pitchf/x workshop that was originally slated to be in a couple of weeks, but plans for that fell through. The workshop still may happen, but it will be later in the baseball season if it does. So I figure I'll go ahead and dump what I have so far here, and hopefully when/if the workshop happens, I'll have something more to present.

Anyway, as the title proclaims, this post is all about acceleration, its measurement by the pitchF/X system, and park-by-park corrections to the values of acceleration.

So, if we want to check whether or not any corrections we apply are valid, the absolute best thing to do is to try to measure some physical quantity that we know. With acceleration, primarily with the z component of acceleration, it would be super nice if we could use the pitchf/x data to measure the acceleration of a baseball due to gravity, g. Unfortunately, there just isn't enough information in the data to make a measurement of g. The baseball experiences acceleration due to at least 3 separate phenomena: drag, the Magnus force, and gravity. Without prior knowledge of other parameters, it is impossible to separate g out from the other components. In all prior analyses of pitchf/x data I can find on the web, g is simply treated as a constant that is the same everywhere. It's possible that this may not be such a good idea...

So I'm going to try to 'measure' the next best physical constant in the pitchf/x data, $C_D$, or the drag coefficient (see footnote). To do this, I am going to borrow a few formulae from Dr. Alan Nathan. In a recent paper posted to his website, he posed a different way of separating the Magnus force from the drag force in the data. All previous analyses had simply treated the drag force as acting entirely in the y direction. But we know that unless the pitch only moves in the y direction that this is not true. Alan had formulated a time-averaged approximation of drag in each direction in an attempt to produce values for 'pitch movement' that were dependent only on the Magnus force, and by the plots in his paper, it certainly seems that he was successful in doing so. If you are interested, you can read his full paper here.

In the above-mentioned paper, the X and Z components of acceleration due to drag are estimated as:

$$a_{Dx} = a_y \frac{\langle v_x \rangle}{\langle v_y \rangle}$$

$$a_{Dz} = a_y \frac{\langle v_z \rangle}{\langle v_y \rangle}$$

Where $\langle v_{x,y,z} \rangle$ denotes the time average of velocity in the appropriate direction. We also continue to assume that all acceleration in the y direction is due to drag, so that:

$$a_{Dy} = a_y$$

and

$$\mathbf{a}_D = a_y \left( \frac{\langle v_x \rangle}{\langle v_y \rangle},\ 1,\ \frac{\langle v_z \rangle}{\langle v_y \rangle} \right)$$

This is not an entirely correct assumption, as the Magnus force will contribute a tiny amount to the acceleration in y, but we are going to ignore that for now. (I may come back to this at some point, as it should be fairly easy to calculate this bit, but for the purposes of what I am going to show, it won't really make a lot of difference).
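As a rough illustration of the approximation above, here is a minimal Python sketch. The function names are made up for this example, the pitch values are invented, and `mean_velocity` assumes a constant-acceleration trajectory over the flight time, which is my assumption about how the time average would be formed rather than something taken from the paper.

```python
def mean_velocity(v0, accel, flight_time):
    """Time-averaged velocity of one component.

    Assumption (not from the post): constant acceleration over the
    flight, so the average is the velocity at the midpoint in time.
    """
    return v0 + 0.5 * accel * flight_time


def drag_acceleration(a_y, v_avg):
    """Approximate drag acceleration (a_Dx, a_Dy, a_Dz).

    a_y   : measured y acceleration, assumed to be entirely drag
    v_avg : (<v_x>, <v_y>, <v_z>), the time-averaged velocity components
    """
    vx, vy, vz = v_avg
    return (a_y * vx / vy, a_y, a_y * vz / vy)


# Invented pitch, in ft/s and ft/s^2, travelling toward the plate (negative y).
v_avg = (5.0, -125.0, -5.0)
a_drag = drag_acceleration(25.0, v_avg)
print(a_drag)
```

Each component of the result has the opposite sign to the matching velocity component, which is what a drag term should do.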

Now, in reality, the acceleration due to drag on a baseball takes the form:

$$\mathbf{a}_D = -K C_D\, v\, \mathbf{v}$$

Where the bold bits are vectors and the not bolded bits are scalars. K is a numerical factor that works out to be about 0.00544 at sea level. The 'right' way for me to determine $C_D$ would be to take the entire trajectory that the pitchf/x data describe and fit it to a model of drag + Magnus + gravity, letting only $C_D$ and $C_L$ (the lift coefficient in the Magnus force) float. However, this is a more complicated process than I am currently set up to do, mainly because the aerodynamic forces acting on the baseball are non-linear, and I don't have fitting code for such a fit readily available...it would take more time. So I make an approximation. For my purposes, I define the drag coefficient thusly:

$$C_D = \frac{\sqrt{a_{Dx}^2 + a_{Dy}^2 + a_{Dz}^2}}{K \left( \langle v_x \rangle^2 + \langle v_y \rangle^2 + \langle v_z \rangle^2 \right)}$$

I should note that before calculating $C_D$ in this manner, I apply a correction to the accelerations to essentially bring all games to sea level and standard (59 F) temperature. I do this simply by calculating the ratio of the air density at the ballpark at game time to the air density at standard temperature and pressure. (In the case of Minnesota, the ballpark pressure is calculated as being 1 inch water gauge over ambient pressure as measured by weather stations outside the ballpark. This is according to Steve Maki, Director of Facilities and Engineering at the Metrodome.) I then take this ratio of station density to sea level density and multiply all accelerations by its inverse to obtain a standardized acceleration for each pitch. In the case of $a_z$, I first subtract off the value for gravity before applying this correction (and as I mentioned before, there might be a problem with doing this).

So then for each pitch, I can calculate an effective $C_D$ at sea level. I can then plot $C_D$ versus velocity for each park. I will illustrate 3 parks below, and later I should have all parks in an album here much like I did for the strikezone post I made a few months ago. In the plots below, the scatter plot is of the actual data of $C_D$ versus $\langle v \rangle$. I use the time-averaged velocity rather than the initial velocity because the time-averaged velocity is used to estimate the net drag in the first place. The second plot is simply a profile plot of the first. In each bin in velocity, the per-bin mean is calculated and plotted as a point in the profile plot. I do this just to get an idea of the shape of the $C_D$ distribution.

I picked these parks for a reason: Anaheim seems to be close to league average w.r.t. acceleration. Comerica, however, appears to have extremely low values of $C_D$, while the values at PETCO seem to be pretty high. However, the shapes of these distributions seem to be similar, with a minimum around 85 mph. The shapes aren't quite the same, but they are close enough. Another thing I notice is that the spread in values for $C_D$ is much larger at PETCO than it is at the other ballparks.

Now, I'd like to apply some corrections and see if these plots come to an agreement. This is the part of the post that I have to mark with a great big "PRELIMINARY", because I suspect that there could be some bugs in my correction code. I get numbers for the corrections that are reasonable, I think, but I'm not 100% confident in them yet. Anyway, the corrections I choose to apply are essentially the correction algorithm developed by Josh Kalk, and explained here. Josh has essentially developed a method for applying the simplest possible correction to each of the 8 pitchF/X variables based on the assumption that statistically, pitchers aren't very different from one appearance to another. I won't make any judgement on the validity of that assumption. It's probably a decent one to make, but in a limited data set (we basically have 1/2 of an MLB season, more for some parks, less for others), this could be subject to some weird effects. Either way, currently, his approach is the best one around. And judging by his player cards, these corrections do a pretty good job in getting a pitcher's release point nailed down. So there are some reasons to expect them to work.

Now, I mentioned that Josh applies the simplest possible correction. What do I mean by that? Josh essentially calculates, for each variable in a given park, a single number that gets added (actually subtracted, due to the way he calculates the number) to that variable for every pitch in the park (after translating down to sea level). One number per variable per park (29 parks, 8 variables = 232 correction factors). In principle, knowing nothing about how the data are taken, correction factors could reasonably be a function of all 8 variables. The more complicated we allow our correction factors to be, the more difficult it is to determine them. Thus Josh's method is a very good start, as we don't want to further complicate our lives if we don't have to.
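Before any park corrections come into it, the per-pitch steps described earlier (standardizing to sea level density, computing an effective $C_D$, and building the profile plot) can be sketched roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the density ratio uses a dry-air ideal gas approximation that ignores humidity, K is assumed to be in units of 1/ft with accelerations in ft/s² and velocities in ft/s, and the example numbers are invented rather than taken from pitchf/x.

```python
import math

K_SEA_LEVEL = 0.00544  # value quoted in the post; units assumed to be 1/ft


def density_ratio(pressure_inhg, temp_f, std_pressure_inhg=29.92, std_temp_f=59.0):
    """Ballpark air density divided by standard density.

    Assumption: dry air behaving as an ideal gas (density ~ P / T),
    with temperatures converted from Fahrenheit to Rankine.
    """
    ballpark_abs = temp_f + 459.67
    standard_abs = std_temp_f + 459.67
    return (pressure_inhg / std_pressure_inhg) * (standard_abs / ballpark_abs)


def standardize(accel, ratio, gravity=0.0):
    """Scale one acceleration component to standard density.

    For a_z, pass the assumed gravity component (e.g. -32.174) so that
    it is removed before the scaling, as described in the post.
    """
    return (accel - gravity) / ratio


def effective_cd(a_drag, v_avg, k=K_SEA_LEVEL):
    """Effective drag coefficient: |a_D| / (K * <v>^2)."""
    drag_mag = math.sqrt(sum(c * c for c in a_drag))
    speed_sq = sum(c * c for c in v_avg)
    return drag_mag / (k * speed_sq)


def profile(speeds, values, bin_width=2.0):
    """Mean of `values` in each speed bin, keyed by the bin's left edge."""
    bins = {}
    for speed, value in zip(speeds, values):
        left = math.floor(speed / bin_width) * bin_width
        total, count = bins.get(left, (0.0, 0))
        bins[left] = (total + value, count + 1)
    return {left: total / count for left, (total, count) in sorted(bins.items())}


# Invented example values, not real pitchf/x data.
ratio = density_ratio(pressure_inhg=29.92, temp_f=59.0)
cd = effective_cd((0.0, 34.0, 0.0), (0.0, -125.0, 0.0))
means = profile([84.5, 85.5, 86.5], [0.40, 0.38, 0.42])
```

A thinner-than-standard atmosphere gives a ratio below 1, so dividing by it scales the measured accelerations up toward their sea level equivalents.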

So the question I have is "do the additive correction factors work as well on acceleration as they do on release point?" My initial gut feeling is that there may be some problem with them, primarily because I wonder if the corrections would be different if we split the data by pitcher handedness (primarily in the 'x' direction), and that the difference between correction factors for righties and lefties may also carry a sign change. I worry though that actually applying such a split may, especially for lefties, leave us with too little data to really say anything concrete. Although this is what I feel might be the case, I'm not convinced of it yet.

Anyway, so I picked the parks I showed above because in my running of Josh's algorithm, PETCO and Comerica had very large correction factors for $a_y$, and Anaheim had only small corrections. Looking at the plots above, we can see that these large corrections are justified. If I were to plot just $a_D$ instead, you would see that the differences are very much in line with the correction factors spit out by Josh's correction algorithm. So if I apply Josh's corrections, and additive corrections are indeed all we need, we should see fairly good agreement between these three parks, right? Well, let's see:

Whoa. That was totally unexpected. Even though I thought there might be differences between the corrected acceleration for these three parks, I did not at all expect it to be this obvious. What this appears to have done is to take the value for each park at 85 mph or so and adjust things so that at 85 mph, the parks are in agreement. But away from 85 the parks start to diverge. Note that Anaheim looks almost as it did before. This was expected because the measured correction values for Anaheim are actually rather small. But the other two...they had large correction factors applied. And it looks like what's happening is that for velocities below 85 mph, we are actually overcorrecting the acceleration, and possibly undercorrecting at velocities greater than 85. At Comerica, we had to move all values up by some amount, but below 85 mph, it looks as though we should be moving them up by some smaller amount than we actually did. At PETCO, we have the opposite. We moved the slower pitches too far down in acceleration.

So then, to my mind, additive corrections are not the way to go for acceleration. Possibly also for velocity. So I have a couple of ideas on how to fix this. Basically, we will try the two next simplest forms of correction factors: first, a multiplicative correction factor for acceleration, and then, if we need to, perhaps a combination of multiplicative and additive correction factors.
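To make the difference between the two forms concrete, here is a small Python sketch of how each one moves the effective $C_D$ defined earlier ($C_D = |a_D| / (K \langle v \rangle^2)$). It is only arithmetic on that definition with made-up numbers (a flat measured $C_D$ of 0.40, a 2 ft/s² offset, a 5% factor, K assumed in 1/ft); the function names are hypothetical, and it says nothing about which form the real park corrections need.

```python
K_SEA_LEVEL = 0.00544          # value quoted in the post; units assumed to be 1/ft
MPH_TO_FPS = 5280.0 / 3600.0   # miles per hour to feet per second


def cd_from_drag(drag_mag, speed_fps, k=K_SEA_LEVEL):
    """Effective drag coefficient from a drag magnitude and a speed."""
    return drag_mag / (k * speed_fps ** 2)


def additive(accel, offset):
    """One number per park, added to every pitch."""
    return accel + offset


def multiplicative(accel, factor):
    """One number per park, multiplying every pitch."""
    return accel * factor


def cd_shift(correct, speed_mph, cd_measured=0.40):
    """Change in effective C_D after applying `correct` to the drag magnitude.

    cd_measured is an invented, speed-independent starting value.
    """
    v = speed_mph * MPH_TO_FPS
    measured = K_SEA_LEVEL * cd_measured * v ** 2
    return cd_from_drag(correct(measured), v) - cd_measured


for mph in (75, 85, 95):
    add = cd_shift(lambda a: additive(a, 2.0), mph)
    mult = cd_shift(lambda a: multiplicative(a, 1.05), mph)
    print(f"{mph} mph: additive shift {add:+.4f}, multiplicative shift {mult:+.4f}")
```

In this toy setup a constant offset in acceleration shifts $C_D$ by offset / (K⟨v⟩²), so slower pitches move more than faster ones, while a constant factor shifts $C_D$ by the same fraction at every speed.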

I have a feeling that I know what could be causing this kind of behavior...However, I'm not going to jump on that possibility without more information. Because if I am wrong about it, it will mean I'd do a whole lot of work for nothing.

So, next time, I'll show you what happens if I apply multiplicative correction factors. I haven't done it yet, so I have no idea if that's going to be the fix for this. I hope it will be...but I'm a little leery of that. If it doesn't work, then it will be on to the combination of multiplicative and additive corrections. If those don't work, then we've got something a whole lot more complex than I can deal with without knowing a great deal more specifics about the pitchf/x setups at each individual park.

I mentioned before that just assuming gravity is a constant 32 ft/s² at the beginning may be a bad thing to do. Well, if indeed we need some complex correction factors to acceleration, it may be that the cause of such a correction would inherently screw up a hypothetical measurement of the trajectory of a baseball in a vacuum. (In other words, we might be mis-measuring g!). If so, it could be that any multiplicative or otherwise complex correction factors we have to apply to bring all parks into agreement with one another might also have to be applied to the value of g that is assumed...This could get complicated.

footnote: I have no idea what the real, physical shape of $C_D$ versus velocity should look like for a baseball. Apparently, nobody else really knows for sure either. Some suspect a "drag crisis" or a large minimum that may occur at around 80 mph, and at least in the uncorrected plots, this seems to be borne out (potentially), although I would argue that it's more of a drag dip than a crisis. Still, I'm not convinced yet that it's right. And even if I find some better correction factors, I still won't be sure that such a curve is right. All I can be sure of is that the data from one park to another will be in agreement with each other. The baseball guy in me feels like that is enough. The physicist in me though is a little disappointed.