Helmert and Sum Contrasts in R




Contrasts in R determine how linear model coefficients of categorical variables are interpreted. The default contrast for unordered categorical variables is the Treatment contrast. This means the “first” level (aka, the baseline) is rolled into the intercept and all subsequent levels have a coefficient that represents their difference from the baseline. That’s not too hard to grasp. But what about other contrasts, namely the Helmert and Sum contrasts? What do they do? Instead of explaining them, I figured I would demonstrate each.

Our data consist of three levels of arbitrary values. "flevels" is our categorical variable. Notice I explicitly defined it to be a factor using the factor() function. I need to do this so R knows this variable is a factor and codes it according to whatever contrast setting we decide to use.
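The code from the original post isn't reproduced in this copy, so the snippets that follow use a small stand-in data set; the values of `y` below (and the name `flevels`) are made up for illustration and are not the author's numbers. A minimal sketch:

```r
# Hypothetical data: three values for each of three levels (A, B, C).
# These numbers are assumptions, not the original post's data.
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)
flevels <- factor(rep(c("A", "B", "C"), each = 3))

tapply(y, flevels, mean)   # level means: A = 4, B = 10, C = 11
contrasts(flevels)         # the default (Treatment) contrast matrix
```

With these values the level means are 4, 10 and 11, which is what we'll compare the model coefficients against.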

This is a 3 x 2 matrix. The 2 columns of the matrix tell us that our model will have 2 coefficients, one for the B level and one for the C level. Therefore the A level is the baseline. The coefficients we get in our linear model for B and C will indicate the respective differences of their means from the level A mean. The values in the rows tell us what values to plug into the model to get the means for the row labels. For example, to get the mean for A we plug in 0's for both coefficients, which leaves us with the intercept. Therefore the intercept is the mean of A. Let's see all this in action before we explore the Helmert and Sum contrasts.
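Using the made-up stand-in data (recreated here so the snippet runs on its own), fitting the model is one line:

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))

mod.treat <- lm(y ~ flevels)   # Treatment contrast is the default
coef(mod.treat)                # (Intercept) = 4, flevelsB = 6, flevelsC = 7
```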

Now we can verify how the Treatment contrast works by extracting the coefficient values from the model and comparing to the means we calculated earlier:
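With the stand-in data, the comparison looks like this (the data are recreated so the block is self-contained):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
means <- tapply(y, flevels, mean)
b <- coef(lm(y ~ flevels))

# intercept = mean(A); other coefficients = differences from mean(A)
all.equal(unname(b),
          unname(c(means["A"],
                   means["B"] - means["A"],
                   means["C"] - means["A"])))   # TRUE
```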

Let's also verify that plugging in the row values of the contrast matrix returns the means of each level:
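One compact way to do that check (again with the hypothetical data) is to add the intercept to the contrast matrix times the remaining coefficients, which plugs in every row at once:

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
b <- coef(lm(y ~ flevels))

# intercept + (contrast matrix row) x (remaining coefficients)
b[1] + contrasts(flevels) %*% b[-1]   # rows A, B, C give the level means
```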

So that's how Treatment contrasts work. Now let's look at Helmert contrasts. "The coefficients for the Helmert regressors compare each level with the average of the 'preceding' ones," says Fox in his book An R and S-Plus Companion to Applied Regression. I guess that makes sense. Kind of. Eh, not really. At least not to me. I say we do as we did before: fit a model and compare the coefficients to the means and see what they mean. Before we do that we need to set the contrast to Helmert:
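One way to do that for just this factor (setting it globally with `options(contrasts = c("contr.helmert", "contr.poly"))` would work too); the data are again the hypothetical stand-ins:

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))

contrasts(flevels) <- "contr.helmert"   # Helmert contrast for this factor only
contrasts(flevels)
```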

Interesting. Notice the column labels are no longer associated with the levels of the factor. They just say 1 and 2. However this still tells us that our model will have two coefficients. Again the row values tell us what to plug in to get the means of A, B and C, respectively. To get the mean of A, we plug in -1 and -1 to the model. This means our intercept has a different interpretation. Let's fit the linear model and investigate.
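Fitting the model with the Helmert-coded factor (hypothetical data recreated so the block stands alone):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.helmert"

mod.helm <- lm(y ~ flevels)
coef(mod.helm)   # intercept plus coefficients named flevels1, flevels2
```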

It turns out the intercept is the mean of the means, the first coefficient is the mean of the first two levels minus the first level, and the second coefficient is the mean of all three levels minus the mean of the first two levels. Did you get that? Here, this may help:
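Spelling that out with the stand-in data, the by-hand calculations line up with the model's coefficients:

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.helmert"
means <- tapply(y, flevels, mean)

rbind(model   = unname(coef(lm(y ~ flevels))),
      by.hand = unname(c(mean(means),                      # mean of the means
                         mean(means[1:2]) - means[1],      # coefficient 1
                         mean(means) - mean(means[1:2])))) # coefficient 2
```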

Let's do that thing again where we plug in the row values of the contrast matrix to verify it returns the means of the levels:
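Same trick as before, just with the Helmert matrix and coefficients (hypothetical data again):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.helmert"
b <- coef(lm(y ~ flevels))

b[1] + contrasts(flevels) %*% b[-1]   # recovers the means of A, B and C
```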

That leaves us with the Sum contrast. Regarding models fitted with the Sum contrasts, Fox tells us that "each coefficient compares the corresponding level of the factor to the average of the other levels." I think like Helmert contrasts, this one is better demonstrated. As before we need to change the contrast setting.
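Same idea as with Helmert, now pointing the factor at the Sum contrast (hypothetical data recreated):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))

contrasts(flevels) <- "contr.sum"   # Sum (deviation) contrast for this factor
contrasts(flevels)
```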

Just like the Helmert contrast we see two columns with generic numeric labels. Our model will have two coefficients that don't correspond directly to the levels of our factor. By now we know the values in the rows are what we plug into our model to get the means of our levels. To get the mean of level A, we plug in 1 and 0. Time to fit the model and investigate:
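With the stand-in data (note the p-values here come from made-up numbers, so they won't match the author's output):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.sum"

mod.sum <- lm(y ~ flevels)
summary(mod.sum)   # coefficients, standard errors and p-values
```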

Like the Helmert contrasts, our intercept is the mean of all means. But our two coefficients have different interpretations. The first is the mean of level 1 (A) minus the mean of all means. The second coefficient is the mean of level 2 (B) minus the mean of all means. Notice in the model output above that the second coefficient is not significant. In other words, the mean of level B is not significantly different from the mean of all means.
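Here is that interpretation checked against the stand-in data (for `contr.sum`, coefficient k is the mean of level k minus the mean of the level means):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.sum"
means <- tapply(y, flevels, mean)

rbind(model   = unname(coef(lm(y ~ flevels))),
      by.hand = unname(c(mean(means),             # mean of the means
                         means[1] - mean(means),  # level A minus mean of means
                         means[2] - mean(means))))# level B minus mean of means
```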

Finally to be complete we plug in the row values of the Sum contrast matrix to verify it returns the means of the factor levels:
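Once more with the Sum contrast matrix and the hypothetical data:

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.sum"
b <- coef(lm(y ~ flevels))

b[1] + contrasts(flevels) %*% b[-1]   # rows A, B, C give the level means
```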

And finally we wrap up this exercise by returning the contrast setting of our categorical variable back to the system default:
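One way to do that (starting from the factor as the Sum example left it, with the usual hypothetical data):

```r
y <- c(2, 4, 6, 8, 10, 12, 9, 11, 13)              # hypothetical data
flevels <- factor(rep(c("A", "B", "C"), each = 3))
contrasts(flevels) <- "contr.sum"        # as left by the Sum example

contrasts(flevels) <- "contr.treatment"  # back to the system default
# equivalently, drop the attribute: attr(flevels, "contrasts") <- NULL
contrasts(flevels)
```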





