library(grf)
grf computes average treatment effects using a doubly robust correction, which gives rise to an augmented inverse-propensity weighted (AIPW) estimate of the average treatment effect. For a causal forest with binary treatment, the exact expression for the doubly robust score is (equation (8) of Athey and Wager, 2019):
\(\hat \Gamma_i = \hat \tau^{(-i)}(X_i) + \frac{W_i - \hat e^{(-i)}(X_i)}{\hat e^{(-i)}(X_i)[1 - \hat e^{(-i)}(X_i)]} [Y_i - \hat \mu^{(-i)}(X_i, W_i)]\)
where:
\(\hat \tau^{(-i)}(X_i)\) is the treatment effect estimate (returned by grf’s predict method). In potential outcomes notation, the quantity being estimated is by definition \(\tau(X_i) = E[Y_i(1) - Y_i(0) | X_i] = \mu(X_i, 1) - \mu(X_i, 0)\).
\(W_i\) is the binary treatment indicator for subject \(i\).
\(\hat e^{(-i)}(X_i)\) is the propensity score for subject \(i\) (this is by default estimated by a regression forest on the treatment assignment).
\(Y_i\) is the realized outcome for subject \(i\).
\(\hat \mu^{(-i)}(X_i, W_i) = \hat m^{(-i)}(X_i) + [W_i - \hat e^{(-i)}(X_i)] \hat \tau^{(-i)}(X_i)\) is an estimate of the realized conditional mean for subject \(i\), where \(m(X_i) = E[Y | X_i]\) (this is by default estimated using a regression forest, marginalizing over treatment).
The superscript \(-i\) indicates cross-fitting, i.e. that the estimate is computed by leaving observation \(i\) out. This holds by construction for out-of-bag forest estimates.
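The definitions above can be assembled into the scores \(\hat \Gamma_i\) by hand. The following is a minimal sketch rather than grf's internal code: it assumes the fitted forest exposes its out-of-bag nuisance estimates as `c.forest$Y.hat` and `c.forest$W.hat`, and it uses a simulated data set chosen purely for illustration. The names `Gamma.hat`, `ate.manual`, and `ate.grf` are made up for this example.

```r
library(grf)

set.seed(1)
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)

c.forest <- causal_forest(X, Y, W)

# Out-of-bag (cross-fitted) estimates of each ingredient
tau.hat <- predict(c.forest)$predictions  # tau^(-i)(X_i)
e.hat <- c.forest$W.hat                   # e^(-i)(X_i)
m.hat <- c.forest$Y.hat                   # m^(-i)(X_i)

# Realized conditional mean mu^(-i)(X_i, W_i)
mu.hat <- m.hat + (W - e.hat) * tau.hat

# Doubly robust scores Gamma_i
Gamma.hat <- tau.hat + (W - e.hat) / (e.hat * (1 - e.hat)) * (Y - mu.hat)

# The AIPW estimate of the ATE is the average of the scores
ate.manual <- c(estimate = mean(Gamma.hat), std.err = sd(Gamma.hat) / sqrt(n))

# Built-in estimate over the full sample, for comparison
ate.grf <- average_treatment_effect(c.forest, target.sample = "all")
```

Under these assumptions the point estimate in `ate.manual` should agree closely with `ate.grf`. The standard error here is the plain standard error of the mean of the scores, which may differ slightly from the built-in calculation.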
To see how the expression for \(\hat \mu^{(-i)}(X_i, W_i)\) arises, note that we have:
\(m(X_i)\)
\(= E[Y | X_i]\)
\(= E[Y_i(0) | X_i] + E[W_i [Y_i(1) - Y_i(0)] | X_i]\)
\(= \mu(X_i, 0) + e(X_i) \tau(X_i)\)
where the last line follows from unconfoundedness (conditional on the covariates \(X_i\), the potential outcomes are independent of the treatment).
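The two steps in this derivation can be spelled out in more detail. This assumes the usual consistency relation \(Y_i = Y_i(0) + W_i [Y_i(1) - Y_i(0)]\), which the derivation uses implicitly to split \(E[Y | X_i]\) into two terms. For the second term:

\(E[W_i [Y_i(1) - Y_i(0)] | X_i]\)
\(= E[W_i | X_i] \, E[Y_i(1) - Y_i(0) | X_i]\)
\(= e(X_i) \tau(X_i)\)

The factorization in the second line is where unconfoundedness is needed: given \(X_i\), the treatment \(W_i\) is independent of \(Y_i(1) - Y_i(0)\), so the conditional expectation of the product is the product of the conditional expectations. The same assumption lets us identify \(E[Y_i(0) | X_i]\) with \(\mu(X_i, 0)\).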
We then obtain the following expressions for the counterfactual response surfaces:
\(\mu(X_i, 0) = m(X_i) - e(X_i) \tau(X_i)\)
\(\mu(X_i, 1) = \tau(X_i) + \mu(X_i, 0) = m(X_i) + [1 - e(X_i)] \tau(X_i)\)
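As a consistency check, these two surfaces can be combined to recover the formula for the realized conditional mean given earlier. Because \(W_i\) is binary:

\(\mu(X_i, W_i)\)
\(= (1 - W_i) \mu(X_i, 0) + W_i \mu(X_i, 1)\)
\(= \mu(X_i, 0) + W_i \tau(X_i)\)
\(= m(X_i) - e(X_i) \tau(X_i) + W_i \tau(X_i)\)
\(= m(X_i) + [W_i - e(X_i)] \tau(X_i)\)

Replacing each quantity with its cross-fitted estimate gives the definition of \(\hat \mu^{(-i)}(X_i, W_i)\).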
These objects are all computed by the built-in ATE functions. The following code snippet illustrates how to manually obtain estimates of the conditional means \(E[Y | X_i, W_i]\):
n <- 250
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)

# These are estimates of m(X) = E[Y | X]
forest.Y <- regression_forest(X, Y)
Y.hat <- predict(forest.Y)$predictions

# These are estimates of the propensity score E[W | X]
forest.W <- regression_forest(X, W)
W.hat <- predict(forest.W)$predictions

c.forest <- causal_forest(X, Y, W, Y.hat, W.hat)
tau.hat <- predict(c.forest)$predictions

# E[Y | X, W = 0]
mu.hat.0 <- Y.hat - W.hat * tau.hat

# E[Y | X, W = 1]
mu.hat.1 <- Y.hat + (1 - W.hat) * tau.hat
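Because the outcome above is simulated, the true response surfaces are known, so the estimates can be checked against them. The sketch below repeats the fitting steps so that it runs on its own, with a seed and a larger sample size chosen only for illustration. The true surfaces `mu.0` and `mu.1` are read off the simulation design, and the helper `rmse` and the name `mu.hat.W` are made up for this example.

```r
library(grf)

set.seed(42)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)

# Nuisance estimates and causal forest, as in the snippet above
Y.hat <- predict(regression_forest(X, Y))$predictions
W.hat <- predict(regression_forest(X, W))$predictions
c.forest <- causal_forest(X, Y, W, Y.hat = Y.hat, W.hat = W.hat)
tau.hat <- predict(c.forest)$predictions

mu.hat.0 <- Y.hat - W.hat * tau.hat
mu.hat.1 <- Y.hat + (1 - W.hat) * tau.hat

# True surfaces implied by the simulation design
mu.0 <- X[, 2] + pmin(X[, 3], 0)
mu.1 <- mu.0 + pmax(X[, 1], 0)

# Root mean squared error of each estimated surface (hypothetical helper)
rmse <- function(estimate, truth) sqrt(mean((estimate - truth)^2))
rmse.0 <- rmse(mu.hat.0, mu.0)
rmse.1 <- rmse(mu.hat.1, mu.1)

# Estimated conditional mean at the realized treatment, E[Y | X, W]
mu.hat.W <- ifelse(W == 1, mu.hat.1, mu.hat.0)
```

By construction `mu.hat.1 - mu.hat.0` equals `tau.hat`, and `mu.hat.W` equals `Y.hat + (W - W.hat) * tau.hat`, the realized conditional mean that enters the doubly robust score.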
Athey, Susan and Stefan Wager. Estimating Treatment Effects with Causal Forests: An Application. Observational Studies, 5, 2019. (arxiv)