Title: | Double Machine Learning in R |
---|---|
Description: | Implementation of the double/debiased machine learning framework of Chernozhukov et al. (2018) <doi:10.1111/ectj.12097> for partially linear regression models, partially linear instrumental variable regression models, interactive regression models and interactive instrumental variable regression models. 'DoubleML' allows estimation of the nuisance parts in these models by machine learning methods and computation of the Neyman orthogonal score functions. 'DoubleML' is built on top of 'mlr3' and the 'mlr3' ecosystem. The object-oriented implementation of 'DoubleML' based on the 'R6' package is very flexible. More information available in the publication in the Journal of Statistical Software: <doi:10.18637/jss.v108.i03>. |
Authors: | Philipp Bach [aut, cre], Victor Chernozhukov [aut], Malte S. Kurz [aut], Martin Spindler [aut], Sven Klaassen [aut] |
Maintainer: | Philipp Bach <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0.9000 |
Built: | 2024-11-22 13:30:23 UTC |
Source: | https://github.com/doubleml/doubleml-for-r |
Initialization of a DoubleMLData object from a data.frame.

double_ml_data_from_data_frame(
  df,
  x_cols = NULL,
  y_col = NULL,
  d_cols = NULL,
  z_cols = NULL,
  cluster_cols = NULL,
  use_other_treat_as_covariate = TRUE
)
df
(data.frame)
Data object.

x_cols
(NULL, character())
The covariates. If NULL, all variables (columns of df) which are neither specified as outcome variable y_col, nor as treatment variables d_cols, nor as instrumental variables z_cols, nor as cluster variables cluster_cols are used as covariates. Default is NULL.

y_col
(character(1))
The outcome variable.

d_cols
(character())
The treatment variable(s).

z_cols
(NULL, character())
The instrumental variables. Default is NULL.

cluster_cols
(NULL, character())
The cluster variables. Default is NULL.

use_other_treat_as_covariate
(logical(1))
Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is TRUE.
Creates a new instance of class DoubleMLData.
library(DoubleML)
df = make_plr_CCDDHNR2018(return_type = "data.frame")
x_names = names(df)[grepl("X", names(df))]
obj_dml_data = double_ml_data_from_data_frame(
  df = df, x_cols = x_names, y_col = "y", d_cols = "d")
# Input: data.frame, Output: DoubleMLData object
Initialization of a DoubleMLData object from matrix() objects.

double_ml_data_from_matrix(
  X = NULL,
  y,
  d,
  z = NULL,
  cluster_vars = NULL,
  data_class = "DoubleMLData",
  use_other_treat_as_covariate = TRUE
)
X
(matrix())
Matrix of covariates.

y
(numeric())
Vector of outcomes.

d
(matrix())
Matrix of treatment variables.

z
(matrix())
Matrix of instruments. Default is NULL.

cluster_vars
(matrix())
Matrix of cluster variables. Default is NULL.

data_class
(character(1))
Class of the returned object. Either "DoubleMLData" or "DoubleMLClusterData". Default is "DoubleMLData".

use_other_treat_as_covariate
(logical(1))
Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is TRUE.
Creates a new instance of class DoubleMLData.
library(DoubleML)
matrix_list = make_plr_CCDDHNR2018(return_type = "matrix")
obj_dml_data = double_ml_data_from_matrix(
  X = matrix_list$X, y = matrix_list$y, d = matrix_list$d)
Abstract base class that can't be initialized.
R6::R6Class object.
all_coef
(matrix())
Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().

all_dml1_coef
(array())
Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() with dml_procedure = "dml1".

all_se
(matrix())
Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().

apply_cross_fitting
(logical(1))
Indicates whether cross-fitting should be applied. Default is TRUE.

boot_coef
(matrix())
Bootstrapped coefficients for the causal parameter(s) after calling fit() and bootstrap().

boot_t_stat
(matrix())
Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap().

coef
(numeric())
Estimates for the causal parameter(s) after calling fit().

data
(data.table)
Data object.
dml_procedure
(character(1))
A character(1) ("dml1" or "dml2") specifying the double machine learning algorithm. Default is "dml2".

draw_sample_splitting
(logical(1))
Indicates whether the sample splitting should be drawn during initialization of the object. Default is TRUE.

learner
(named list())
The machine learners for the nuisance functions.

n_folds
(integer(1))
Number of folds. Default is 5.

n_rep
(integer(1))
Number of repetitions for the sample splitting. Default is 1.

params
(named list())
The hyperparameters of the learners.

psi
(array())
Value of the score function after calling fit().

psi_a
(array())
Value of the score function component psi_a after calling fit().

psi_b
(array())
Value of the score function component psi_b after calling fit().
predictions
(array())
Predictions of the nuisance models after calling fit(store_predictions = TRUE).

models
(array())
The fitted nuisance models after calling fit(store_models = TRUE).

pval
(numeric())
p-values for the causal parameter(s) after calling fit().

score
(character(1), function())
A character(1) or function() specifying the score function.

se
(numeric())
Standard errors for the causal parameter(s) after calling fit().

smpls
(list())
The partition used for cross-fitting.

smpls_cluster
(list())
The partition of clusters used for cross-fitting.

t_stat
(numeric())
t-statistics for the causal parameter(s) after calling fit().

tuning_res
(named list())
Results from hyperparameter tuning.
new()
DoubleML is an abstract class that can't be initialized.
DoubleML$new()
print()
Print DoubleML objects.
DoubleML$print()
fit()
Estimate DoubleML models.
DoubleML$fit(store_predictions = FALSE, store_models = FALSE)
store_predictions
(logical(1))
Indicates whether the predictions for the nuisance functions should be stored in field predictions. Default is FALSE.

store_models
(logical(1))
Indicates whether the fitted models for the nuisance functions should be stored in field models, e.g., to analyze the models or extract information like variable importance. Default is FALSE.
self
bootstrap()
Multiplier bootstrap for DoubleML models.
DoubleML$bootstrap(method = "normal", n_rep_boot = 500)
method
(character(1))
A character(1) ("Bayes", "normal" or "wild") specifying the multiplier bootstrap method. Default is "normal".

n_rep_boot
(integer(1))
The number of bootstrap replications. Default is 500.
self
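As a sketch (the PLR model, sample size and learners here are illustrative choices, not prescribed by the method), the multiplier bootstrap is applied after fitting and fills the fields boot_coef and boot_t_stat:

```r
library(DoubleML)
library(mlr3)
set.seed(1)

# fit a partially linear regression model, then bootstrap
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 100)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))
dml_plr_obj$fit()
dml_plr_obj$bootstrap(method = "normal", n_rep_boot = 500)

# bootstrapped t-statistics are stored in the boot_t_stat field
head(dml_plr_obj$boot_t_stat)
```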
split_samples()
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds, n_rep and apply_cross_fitting.
DoubleML$split_samples()
self
set_sample_splitting()
Set the sample splitting for DoubleML models.
The attributes n_folds and n_rep are derived from the provided partition.
DoubleML$set_sample_splitting(smpls)
smpls
(list())
A nested list(). The outer list needs to provide an entry per repeated sample splitting (the length of the list is set as n_rep). Each inner list is a named list() with names train_ids and test_ids. The entries in train_ids and test_ids must be partitions per fold (the length of train_ids and test_ids is set as n_folds).
self
library(DoubleML)
library(mlr3)
set.seed(2)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 10)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

# simple sample splitting with two folds and without cross-fitting
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5)),
                  test_ids = list(c(6, 7, 8, 9, 10))))
dml_plr_obj$set_sample_splitting(smpls)

# sample splitting with two folds and cross-fitting but no repeated cross-fitting
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10)),
                  test_ids = list(c(6, 7, 8, 9, 10), c(1, 2, 3, 4, 5))))
dml_plr_obj$set_sample_splitting(smpls)

# sample splitting with two folds and repeated cross-fitting with n_rep = 2
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10)),
                  test_ids = list(c(6, 7, 8, 9, 10), c(1, 2, 3, 4, 5))),
             list(train_ids = list(c(1, 3, 5, 7, 9), c(2, 4, 6, 8, 10)),
                  test_ids = list(c(2, 4, 6, 8, 10), c(1, 3, 5, 7, 9))))
dml_plr_obj$set_sample_splitting(smpls)
tune()
Hyperparameter-tuning for DoubleML models.
The hyperparameter-tuning is performed using the tuning methods provided in the mlr3tuning package. For more information on tuning in mlr3, we refer to the section on parameter tuning in the mlr3 book.
DoubleML$tune(
  param_set,
  tune_settings = list(
    n_folds_tune = 5,
    rsmp_tune = mlr3::rsmp("cv", folds = 5),
    measure = NULL,
    terminator = mlr3tuning::trm("evals", n_evals = 20),
    algorithm = mlr3tuning::tnr("grid_search"),
    resolution = 5),
  tune_on_folds = FALSE
)
param_set
(named list())
A named list() with a parameter grid for each nuisance model/learner (see method learner_names()). The parameter grid must be an object of class ParamSet.

tune_settings
(named list())
A named list() with arguments passed to the hyperparameter-tuning with mlr3tuning to set up TuningInstance objects. tune_settings has entries:
terminator
(Terminator)
A Terminator object. Specification of terminator is required to perform tuning.

algorithm
(Tuner or character(1))
A Tuner object (recommended) or key passed to the respective dictionary to specify the tuning algorithm used in tnr(). algorithm is passed as an argument to tnr(). If algorithm is not specified by the user, default is set to "grid_search". If set to "grid_search", the additional argument "resolution" is required.
rsmp_tune
(Resampling or character(1))
A Resampling object (recommended) or option passed to rsmp() to initialize a Resampling for parameter tuning in mlr3. If not specified by the user, default is set to "cv" (cross-validation).

n_folds_tune
(integer(1), optional)
If rsmp_tune = "cv", the number of folds used for cross-validation. If not specified by the user, default is set to 5.
measure
(NULL, named list(), optional)
Named list containing the measures used for parameter tuning. Entries in the list must either be Measure objects or keys to be passed to msr(). The names of the entries must match the learner names (see method learner_names()). If set to NULL, default measures are used, i.e., "regr.mse" for continuous outcome variables and "classif.ce" for binary outcomes.
resolution
(integer(1))
The grid resolution passed as an argument to tnr() if the tuning algorithm is "grid_search". Default is 5.
tune_on_folds
(logical(1))
Indicates whether the tuning should be done fold-specific or globally. Default is FALSE.
self
summary()
Summary for DoubleML models after calling fit().

DoubleML$summary(digits = max(3L, getOption("digits") - 3L))

digits
(integer(1))
The number of significant digits to use when printing.
confint()
Confidence intervals for DoubleML models.
DoubleML$confint(parm, joint = FALSE, level = 0.95)
parm
(numeric() or character())
A specification of which parameters are to be given confidence intervals among the variables for which inference was done, either a vector of numbers or a vector of names. If missing, all parameters are considered (default).

joint
(logical(1))
Indicates whether joint confidence intervals are computed. Default is FALSE.

level
(numeric(1))
The confidence level. Default is 0.95.

A matrix() with the confidence interval(s).
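A short sketch (model, sample size and learners are illustrative). Note that joint confidence intervals are based on the multiplier bootstrap, so bootstrap() has to be called beforehand:

```r
library(DoubleML)
library(mlr3)
set.seed(1)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 100)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))
dml_plr_obj$fit()

# pointwise 95% confidence interval
dml_plr_obj$confint(level = 0.95)

# joint confidence intervals require a preceding bootstrap() call
dml_plr_obj$bootstrap(method = "normal", n_rep_boot = 500)
dml_plr_obj$confint(joint = TRUE)
```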
learner_names()
Returns the names of the learners.
DoubleML$learner_names()
character() with names of learners.
params_names()
Returns the names of the nuisance models with hyperparameters.
DoubleML$params_names()
character() with names of nuisance models with hyperparameters.
set_ml_nuisance_params()
Set hyperparameters for the nuisance models of DoubleML models.
Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.
DoubleML$set_ml_nuisance_params(
  learner = NULL,
  treat_var = NULL,
  params,
  set_fold_specific = FALSE
)

learner
(character(1))
The nuisance model/learner (see method params_names()).

treat_var
(character(1))
The treatment variable (hyperparameters can be set treatment-variable specific).

params
(named list())
A named list() with estimator parameters. Parameters are used for all folds by default. Alternatively, parameters can be passed in a fold-specific way if option fold_specific is TRUE. In this case, the outer list needs to be of length n_rep and the inner list of length n_folds.

set_fold_specific
(logical(1))
Indicates if the parameters passed in params should be passed in a fold-specific way. Default is FALSE. If TRUE, the outer list needs to be of length n_rep and the inner list of length n_folds. Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.
self
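A minimal sketch of setting hyperparameters globally before fitting. The learner name "ml_l" and treatment variable "d" are assumptions for this PLR example; the valid names for a given object can be checked via params_names():

```r
library(DoubleML)
library(mlr3)
set.seed(1)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 100)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

# inspect the nuisance model names, then set rpart parameters globally
dml_plr_obj$params_names()
dml_plr_obj$set_ml_nuisance_params(
  learner = "ml_l", treat_var = "d",
  params = list(cp = 0.01, minsplit = 20))
dml_plr_obj$fit()
```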
p_adjust()
Multiple testing adjustment for DoubleML models.
DoubleML$p_adjust(method = "romano-wolf", return_matrix = TRUE)
method
(character(1))
A character(1) ("romano-wolf", "bonferroni", "holm", etc.) specifying the adjustment method. In addition to "romano-wolf", all methods implemented in p.adjust() can be applied. Default is "romano-wolf".

return_matrix
(logical(1))
Indicates if the output is returned as a matrix with corresponding coefficient names. Default is TRUE.

numeric() with adjusted p-values. If return_matrix = TRUE, a matrix() with adjusted p-values.
get_params()
Get hyperparameters for the nuisance model of DoubleML models.
DoubleML$get_params(learner)
learner
(character(1))
The nuisance model/learner (see method params_names()).

named list() with parameters for the nuisance model/learner.
clone()
The objects of this class are cloneable with this method.
DoubleML$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other DoubleML: DoubleMLIIVM, DoubleMLIRM, DoubleMLPLIV, DoubleMLPLR
## ------------------------------------------------
## Method `DoubleML$set_sample_splitting`
## ------------------------------------------------

library(DoubleML)
library(mlr3)
set.seed(2)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 10)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

# simple sample splitting with two folds and without cross-fitting
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5)),
                  test_ids = list(c(6, 7, 8, 9, 10))))
dml_plr_obj$set_sample_splitting(smpls)

# sample splitting with two folds and cross-fitting but no repeated cross-fitting
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10)),
                  test_ids = list(c(6, 7, 8, 9, 10), c(1, 2, 3, 4, 5))))
dml_plr_obj$set_sample_splitting(smpls)

# sample splitting with two folds and repeated cross-fitting with n_rep = 2
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10)),
                  test_ids = list(c(6, 7, 8, 9, 10), c(1, 2, 3, 4, 5))),
             list(train_ids = list(c(1, 3, 5, 7, 9), c(2, 4, 6, 8, 10)),
                  test_ids = list(c(2, 4, 6, 8, 10), c(1, 3, 5, 7, 9))))
dml_plr_obj$set_sample_splitting(smpls)
Double machine learning data-backend for data with cluster variables.
DoubleMLClusterData objects can be initialized from a data.table. Alternatively, DoubleML provides functions to initialize from a collection of matrix objects or a data.frame. The following functions can be used to create a new instance of DoubleMLClusterData:

DoubleMLClusterData$new() for initialization from a data.table,
double_ml_data_from_matrix() for initialization from matrix objects,
double_ml_data_from_data_frame() for initialization from a data.frame.
DoubleML::DoubleMLData -> DoubleMLClusterData

cluster_cols
(character())
The cluster variable(s).

x_cols
(NULL, character())
The covariates. If NULL, all variables (columns of data) which are neither specified as outcome variable y_col, nor as treatment variables d_cols, nor as instrumental variables z_cols, nor as cluster variables cluster_cols are used as covariates. Default is NULL.

n_cluster_vars
(integer(1))
The number of cluster variables.
new()
Creates a new instance of this R6 class.
DoubleMLClusterData$new(
  data = NULL,
  x_cols = NULL,
  y_col = NULL,
  d_cols = NULL,
  cluster_cols = NULL,
  z_cols = NULL,
  use_other_treat_as_covariate = TRUE
)
data
(data.table, data.frame())
Data object.

x_cols
(NULL, character())
The covariates. If NULL, all variables (columns of data) which are neither specified as outcome variable y_col, nor as treatment variables d_cols, nor as instrumental variables z_cols are used as covariates. Default is NULL.

y_col
(character(1))
The outcome variable.

d_cols
(character())
The treatment variable(s).

cluster_cols
(character())
The cluster variable(s).

z_cols
(NULL, character())
The instrumental variables. Default is NULL.

use_other_treat_as_covariate
(logical(1))
Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is TRUE.
print()
Print DoubleMLClusterData objects.
DoubleMLClusterData$print()
set_data_model()
Setter function for data_model. The function implements the causal model as specified by the user via y_col, d_cols, x_cols, z_cols and cluster_cols, and assigns the role for the treatment variables in the multiple-treatment case.

DoubleMLClusterData$set_data_model(treatment_var)

treatment_var
(character())
Active treatment variable that will be set to treat_col.
clone()
The objects of this class are cloneable with this method.
DoubleMLClusterData$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(DoubleML)
dt = make_pliv_multiway_cluster_CKMS2021(return_type = "data.table")
obj_dml_data = DoubleMLClusterData$new(dt,
  y_col = "Y",
  d_cols = "D",
  z_cols = "Z",
  cluster_cols = c("cluster_var_i", "cluster_var_j"))
Double machine learning data-backend.
DoubleMLData objects can be initialized from a data.table. Alternatively, DoubleML provides functions to initialize from a collection of matrix objects or a data.frame. The following functions can be used to create a new instance of DoubleMLData:

DoubleMLData$new() for initialization from a data.table,
double_ml_data_from_matrix() for initialization from matrix objects,
double_ml_data_from_data_frame() for initialization from a data.frame.
all_variables
(character())
All variables available in the dataset.

d_cols
(character())
The treatment variable(s).

data
(data.table)
Data object.

data_model
(data.table)
Internal data object that implements the causal model as specified by the user via y_col, d_cols, x_cols and z_cols.

n_instr
(NULL, integer(1))
The number of instruments.

n_obs
(integer(1))
The number of observations.

n_treat
(integer(1))
The number of treatment variables.

other_treat_cols
(NULL, character())
If use_other_treat_as_covariate is TRUE, other_treat_cols are the treatment variables that are not "active" in the multiple-treatment case. These variables are then internally added to the covariates x_cols during the fitting stage. If use_other_treat_as_covariate is FALSE, other_treat_cols is NULL.

treat_col
(character(1))
"Active" treatment variable in the multiple-treatment case.

use_other_treat_as_covariate
(logical(1))
Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is TRUE.

x_cols
(NULL, character())
The covariates. If NULL, all variables (columns of data) which are neither specified as outcome variable y_col, nor as treatment variables d_cols, nor as instrumental variables z_cols are used as covariates. Default is NULL.

y_col
(character(1))
The outcome variable.

z_cols
(NULL, character())
The instrumental variables. Default is NULL.
new()
Creates a new instance of this R6 class.
DoubleMLData$new(
  data = NULL,
  x_cols = NULL,
  y_col = NULL,
  d_cols = NULL,
  z_cols = NULL,
  use_other_treat_as_covariate = TRUE
)
data
(data.table, data.frame())
Data object.

x_cols
(NULL, character())
The covariates. If NULL, all variables (columns of data) which are neither specified as outcome variable y_col, nor as treatment variables d_cols, nor as instrumental variables z_cols are used as covariates. Default is NULL.

y_col
(character(1))
The outcome variable.

d_cols
(character())
The treatment variable(s).

z_cols
(NULL, character())
The instrumental variables. Default is NULL.

use_other_treat_as_covariate
(logical(1))
Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is TRUE.
print()
Print DoubleMLData objects.
DoubleMLData$print()
set_data_model()
Setter function for data_model. The function implements the causal model as specified by the user via y_col, d_cols, x_cols and z_cols, and assigns the role for the treatment variables in the multiple-treatment case.

DoubleMLData$set_data_model(treatment_var)

treatment_var
(character())
Active treatment variable that will be set to treat_col.
clone()
The objects of this class are cloneable with this method.
DoubleMLData$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(DoubleML)
df = make_plr_CCDDHNR2018(return_type = "data.table")
obj_dml_data = DoubleMLData$new(df, y_col = "y", d_cols = "d")
Double machine learning for interactive IV regression models.
R6::R6Class object inheriting from DoubleML.
Interactive IV regression (IIVM) models take the form

$$Y = \ell_0(D, X) + \zeta, \quad E[\zeta \mid Z, X] = 0,$$
$$Z = m_0(X) + V, \quad E[V \mid X] = 0,$$

where $Y$ is the outcome variable, $D \in \{0, 1\}$ is the binary treatment variable and $Z \in \{0, 1\}$ is a binary instrumental variable. Consider the functions $g_0$, $r_0$ and $m_0$, where $g_0$ maps the support of $(Z, X)$ to $\mathbb{R}$ and $r_0$ and $m_0$, respectively, map the support of $(Z, X)$ and $X$ to $(\varepsilon, 1 - \varepsilon)$ for some $\varepsilon \in (0, 1/2)$, such that

$$Y = g_0(Z, X) + \nu, \quad E[\nu \mid Z, X] = 0,$$
$$D = r_0(Z, X) + U, \quad E[U \mid Z, X] = 0,$$
$$Z = m_0(X) + V, \quad E[V \mid X] = 0.$$

The target parameter of interest in this model is the local average treatment effect (LATE),

$$\theta_0 = \frac{E[g_0(1, X)] - E[g_0(0, X)]}{E[r_0(1, X)] - E[r_0(0, X)]}.$$
DoubleML::DoubleML -> DoubleMLIIVM
subgroups
(named list(2))
Named list(2) with options to adapt to cases with and without the subgroups of always-takers and never-takers. The entry always_takers (logical(1)) specifies whether there are always-takers in the sample. The entry never_takers (logical(1)) specifies whether there are never-takers in the sample.

trimming_rule
(character(1))
A character(1) specifying the trimming approach.

trimming_threshold
(numeric(1))
The threshold used for trimming.
DoubleML::DoubleML$bootstrap()
DoubleML::DoubleML$confint()
DoubleML::DoubleML$fit()
DoubleML::DoubleML$get_params()
DoubleML::DoubleML$learner_names()
DoubleML::DoubleML$p_adjust()
DoubleML::DoubleML$params_names()
DoubleML::DoubleML$print()
DoubleML::DoubleML$set_ml_nuisance_params()
DoubleML::DoubleML$set_sample_splitting()
DoubleML::DoubleML$split_samples()
DoubleML::DoubleML$summary()
DoubleML::DoubleML$tune()
new()
Creates a new instance of this R6 class.
DoubleMLIIVM$new(
  data,
  ml_g,
  ml_m,
  ml_r,
  n_folds = 5,
  n_rep = 1,
  score = "LATE",
  subgroups = list(always_takers = TRUE, never_takers = TRUE),
  dml_procedure = "dml2",
  trimming_rule = "truncate",
  trimming_threshold = 1e-12,
  draw_sample_splitting = TRUE,
  apply_cross_fitting = TRUE
)
data
(DoubleMLData)
The DoubleMLData object providing the data and specifying the variables of the causal model.

ml_g
(LearnerRegr, LearnerClassif, Learner, character(1))
A learner of the class LearnerRegr, which is available from mlr3 or its extension packages mlr3learners or mlr3extralearners. For binary outcome variables, an object of the class LearnerClassif can be passed, for example lrn("classif.cv_glmnet", s = "lambda.min"). Alternatively, a Learner object with public field task_type = "regr" or task_type = "classif" can be passed, respectively, for example of class GraphLearner. ml_g refers to the nuisance function $g_0(Z, X) = E[Y \mid X, Z]$.
ml_m
(LearnerClassif, Learner, character(1))
A learner of the class LearnerClassif, which is available from mlr3 or its extension packages mlr3learners or mlr3extralearners. Alternatively, a Learner object with public field task_type = "classif" can be passed, for example of class GraphLearner. The learner can possibly be passed with specified parameters, for example lrn("classif.cv_glmnet", s = "lambda.min"). ml_m refers to the nuisance function $m_0(X) = E[Z \mid X]$.
ml_r
(LearnerClassif, Learner, character(1))
A learner of the class LearnerClassif, which is available from mlr3 or its extension packages mlr3learners or mlr3extralearners. Alternatively, a Learner object with public field task_type = "classif" can be passed, for example of class GraphLearner. The learner can possibly be passed with specified parameters, for example lrn("classif.cv_glmnet", s = "lambda.min"). ml_r refers to the nuisance function $r_0(Z, X) = E[D \mid X, Z]$.
n_folds
(integer(1))
Number of folds. Default is 5.

n_rep
(integer(1))
Number of repetitions for the sample splitting. Default is 1.

score
(character(1), function())
A character(1) ("LATE" is the only choice) or a function() specifying the score function. If a function() is provided, it must be of the form function(y, z, d, g0_hat, g1_hat, m_hat, r0_hat, r1_hat, smpls) and the returned output must be a named list() with elements psi_a and psi_b. Default is "LATE".
subgroups
(named list(2))
Named list(2) with options to adapt to cases with and without the subgroups of always-takers and never-takers. The entry always_takers (logical(1)) specifies whether there are always-takers in the sample. The entry never_takers (logical(1)) specifies whether there are never-takers in the sample. Default is list(always_takers = TRUE, never_takers = TRUE).
dml_procedure
(character(1))
A character(1) ("dml1" or "dml2") specifying the double machine learning algorithm. Default is "dml2".

trimming_rule
(character(1))
A character(1) ("truncate" is the only choice) specifying the trimming approach. Default is "truncate".

trimming_threshold
(numeric(1))
The threshold used for trimming. Default is 1e-12.

draw_sample_splitting
(logical(1))
Indicates whether the sample splitting should be drawn during initialization of the object. Default is TRUE.

apply_cross_fitting
(logical(1))
Indicates whether cross-fitting should be applied. Default is TRUE.
clone()
The objects of this class are cloneable with this method.
DoubleMLIIVM$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other DoubleML: DoubleML, DoubleMLIRM, DoubleMLPLIV, DoubleMLPLR
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_g = lrn("regr.ranger",
           num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_m = lrn("classif.ranger",
           num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_r = ml_m$clone()
obj_dml_data = make_iivm_data(
  theta = 0.5, n_obs = 1000, alpha_x = 1, dim_x = 20)
dml_iivm_obj = DoubleMLIIVM$new(obj_dml_data, ml_g, ml_m, ml_r)
dml_iivm_obj$fit()
dml_iivm_obj$summary()

## Not run:
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(data.table)
set.seed(2)
ml_g = lrn("regr.rpart")
ml_m = lrn("classif.rpart")
ml_r = ml_m$clone()
obj_dml_data = make_iivm_data(
  theta = 0.5, n_obs = 1000, alpha_x = 1, dim_x = 20)
dml_iivm_obj = DoubleMLIIVM$new(obj_dml_data, ml_g, ml_m, ml_r)
param_grid = list(
  "ml_g" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)),
  "ml_m" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)),
  "ml_r" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)))

# minimum requirements for tune_settings
tune_settings = list(
  terminator = mlr3tuning::trm("evals", n_evals = 5),
  algorithm = mlr3tuning::tnr("grid_search", resolution = 5))
dml_iivm_obj$tune(param_set = param_grid, tune_settings = tune_settings)
dml_iivm_obj$fit()
dml_iivm_obj$summary()
## End(Not run)
Double machine learning for interactive regression models.
R6::R6Class object inheriting from DoubleML.
Interactive regression (IRM) models take the form

$$Y = g_0(D, X) + U, \quad E[U \mid X, D] = 0,$$
$$D = m_0(X) + V, \quad E[V \mid X] = 0,$$

where $Y$ is the outcome variable and $D \in \{0, 1\}$ is the binary treatment variable. We consider estimation of the average treatment effects when treatment effects are fully heterogeneous. Target parameters of interest in this model are the average treatment effect (ATE),

$$\theta_0 = E[g_0(1, X) - g_0(0, X)],$$

and the average treatment effect on the treated (ATTE),

$$\theta_0 = E[g_0(1, X) - g_0(0, X) \mid D = 1].$$
DoubleML::DoubleML -> DoubleMLIRM
trimming_rule
(character(1))
A character(1) specifying the trimming approach.

trimming_threshold
(numeric(1))
The threshold used for trimming.
DoubleML::DoubleML$bootstrap()
DoubleML::DoubleML$confint()
DoubleML::DoubleML$fit()
DoubleML::DoubleML$get_params()
DoubleML::DoubleML$learner_names()
DoubleML::DoubleML$p_adjust()
DoubleML::DoubleML$params_names()
DoubleML::DoubleML$print()
DoubleML::DoubleML$set_ml_nuisance_params()
DoubleML::DoubleML$set_sample_splitting()
DoubleML::DoubleML$split_samples()
DoubleML::DoubleML$summary()
DoubleML::DoubleML$tune()
new()
Creates a new instance of this R6 class.
DoubleMLIRM$new(
  data,
  ml_g,
  ml_m,
  n_folds = 5,
  n_rep = 1,
  score = "ATE",
  trimming_rule = "truncate",
  trimming_threshold = 1e-12,
  dml_procedure = "dml2",
  draw_sample_splitting = TRUE,
  apply_cross_fitting = TRUE
)
data
(DoubleMLData
)
The DoubleMLData
object providing the data and specifying the variables
of the causal model.
ml_g
(LearnerRegr
,
LearnerClassif
, Learner
,
character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
For binary treatment outcomes, an object of the class
LearnerClassif
can be passed, for example
lrn("classif.cv_glmnet", s = "lambda.min")
.
Alternatively, a Learner
object with public field
task_type = "regr"
or task_type = "classif"
can be passed,
respectively, for example of class
GraphLearner
. ml_g
refers to the nuisance function g_0(D, X) = E[Y | X, D].
ml_m
(LearnerClassif
,
Learner
, character(1)
)
A learner of the class LearnerClassif
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "classif"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("classif.cv_glmnet", s = "lambda.min")
. ml_m
refers to the nuisance function m_0(X) = E[D | X].
n_folds
(integer(1)
)
Number of folds. Default is 5
.
n_rep
(integer(1)
)
Number of repetitions for the sample splitting. Default is 1
.
score
(character(1)
, function()
)
A character(1)
("ATE"
or "ATTE"
) or a function()
specifying the
score function. If a function()
is provided, it must be of the form
function(y, d, g0_hat, g1_hat, m_hat, smpls)
and the returned output
must be a named list()
with elements psi_a
and psi_b
.
Default is "ATE"
.
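For illustration, the built-in "ATE" score corresponds to the doubly robust (AIPW) score; a user-supplied function() reproducing it could be sketched as follows (the name score_ate is illustrative; psi_a and psi_b enter the linear score psi = psi_a * theta + psi_b):

```r
# Sketch of a custom score function for DoubleMLIRM reproducing the
# doubly robust (AIPW) ATE score; the function name is illustrative.
score_ate = function(y, d, g0_hat, g1_hat, m_hat, smpls) {
  u0_hat = y - g0_hat  # residual under control
  u1_hat = y - g1_hat  # residual under treatment
  psi_b = g1_hat - g0_hat +
    d * u1_hat / m_hat - (1 - d) * u0_hat / (1 - m_hat)
  psi_a = rep(-1, length(y))  # coefficient on theta in the linear score
  list(psi_a = psi_a, psi_b = psi_b)
}
```

Such a function can then be passed via DoubleMLIRM$new(..., score = score_ate).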
trimming_rule
(character(1)
)
A character(1)
("truncate"
is the only choice) specifying the
trimming approach. Default is "truncate"
.
trimming_threshold
(numeric(1)
)
The threshold used for trimming. Default is 1e-12
.
dml_procedure
(character(1)
)
A character(1)
("dml1"
or "dml2"
) specifying the double machine
learning algorithm. Default is "dml2"
.
draw_sample_splitting
(logical(1)
)
Indicates whether the sample splitting should be drawn during
initialization of the object. Default is TRUE
.
apply_cross_fitting
(logical(1)
)
Indicates whether cross-fitting should be applied. Default is TRUE
.
clone()
The objects of this class are cloneable with this method.
DoubleMLIRM$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other DoubleML:
DoubleML
,
DoubleMLIIVM
,
DoubleMLPLIV
,
DoubleMLPLR
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_g = lrn(
  "regr.ranger", num.trees = 100, mtry = 20,
  min.node.size = 2, max.depth = 5)
ml_m = lrn(
  "classif.ranger", num.trees = 100, mtry = 20,
  min.node.size = 2, max.depth = 5)
obj_dml_data = make_irm_data(theta = 0.5)
dml_irm_obj = DoubleMLIRM$new(obj_dml_data, ml_g, ml_m)
dml_irm_obj$fit()
dml_irm_obj$summary()

## Not run:
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(data.table)
set.seed(2)
ml_g = lrn("regr.rpart")
ml_m = lrn("classif.rpart")
obj_dml_data = make_irm_data(theta = 0.5)
dml_irm_obj = DoubleMLIRM$new(obj_dml_data, ml_g, ml_m)
param_grid = list(
  "ml_g" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)),
  "ml_m" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)))

# minimum requirements for tune_settings
tune_settings = list(
  terminator = mlr3tuning::trm("evals", n_evals = 5),
  algorithm = mlr3tuning::tnr("grid_search", resolution = 5))
dml_irm_obj$tune(param_set = param_grid, tune_settings = tune_settings)
dml_irm_obj$fit()
dml_irm_obj$summary()

## End(Not run)
Double machine learning for partially linear IV regression models.
R6::R6Class object inheriting from DoubleML.
Partially linear IV regression (PLIV) models take the form

Y - D theta_0 = g_0(X) + zeta, E[zeta | Z, X] = 0,

Z = m_0(X) + V, E[V | X] = 0.

Y is the outcome variable,
D is the policy variable of interest and
Z denotes one or multiple instrumental variables. The high-dimensional vector
X = (X_1, ..., X_p) consists of other confounding covariates, and
zeta and
V are stochastic errors.
DoubleML::DoubleML
-> DoubleMLPLIV
partialX
(logical(1)
)
Indicates whether covariates should be partialled out.
partialZ
(logical(1)
)
Indicates whether instruments should be partialled out.
DoubleML::DoubleML$bootstrap()
DoubleML::DoubleML$confint()
DoubleML::DoubleML$fit()
DoubleML::DoubleML$get_params()
DoubleML::DoubleML$learner_names()
DoubleML::DoubleML$p_adjust()
DoubleML::DoubleML$params_names()
DoubleML::DoubleML$print()
DoubleML::DoubleML$set_sample_splitting()
DoubleML::DoubleML$split_samples()
DoubleML::DoubleML$summary()
new()
Creates a new instance of this R6 class.
DoubleMLPLIV$new( data, ml_l, ml_m, ml_r, ml_g = NULL, partialX = TRUE, partialZ = FALSE, n_folds = 5, n_rep = 1, score = "partialling out", dml_procedure = "dml2", draw_sample_splitting = TRUE, apply_cross_fitting = TRUE )
data
(DoubleMLData
)
The DoubleMLData
object providing the data and specifying the variables
of the causal model.
ml_l
(LearnerRegr
,
Learner
, character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "regr"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("regr.cv_glmnet", s = "lambda.min")
. ml_l
refers to the nuisance function l_0(X) = E[Y | X].
ml_m
(LearnerRegr
,
Learner
, character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "regr"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("regr.cv_glmnet", s = "lambda.min")
. ml_m
refers to the nuisance function m_0(X) = E[Z | X].
ml_r
(LearnerRegr
,
Learner
, character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "regr"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("regr.cv_glmnet", s = "lambda.min")
. ml_r
refers to the nuisance function r_0(X) = E[D | X].
ml_g
(LearnerRegr
,
Learner
, character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "regr"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("regr.cv_glmnet", s = "lambda.min")
. ml_g
refers to the nuisance function g_0(X) = E[Y - D theta_0 | X].
Note: The learner
ml_g
is only required for the score 'IV-type'
.
Optionally, it can be specified and estimated for callable scores.
partialX
(logical(1)
)
Indicates whether covariates should be partialled out.
Default is
TRUE
.
partialZ
(logical(1)
)
Indicates whether instruments should be partialled out.
Default is
FALSE
.
n_folds
(integer(1)
)
Number of folds. Default is 5
.
n_rep
(integer(1)
)
Number of repetitions for the sample splitting. Default is 1
.
score
(character(1)
, function()
)
A character(1)
("partialling out"
or "IV-type"
) or a function()
specifying the score function.
If a function()
is provided, it must be of the form
function(y, z, d, l_hat, m_hat, r_hat, g_hat, smpls)
and
the returned output must be a named list()
with elements
psi_a
and psi_b
. Default is "partialling out"
.
dml_procedure
(character(1)
)
A character(1)
("dml1"
or "dml2"
) specifying the double machine
learning algorithm. Default is "dml2"
.
draw_sample_splitting
(logical(1)
)
Indicates whether the sample splitting should be drawn during
initialization of the object. Default is TRUE
.
apply_cross_fitting
(logical(1)
)
Indicates whether cross-fitting should be applied. Default is TRUE
.
set_ml_nuisance_params()
Set hyperparameters for the nuisance models of DoubleML models.
Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.
DoubleMLPLIV$set_ml_nuisance_params( learner = NULL, treat_var = NULL, params, set_fold_specific = FALSE )
learner
(character(1)
)
The nuisance model/learner (see method params_names
).
treat_var
(character(1)
)
The treatment variable (hyperparameters can be set treatment-variable
specific).
params
(named list()
)
A named list()
with estimator parameters. Parameters are used for all
folds by default. Alternatively, parameters can be passed in a
fold-specific way if option fold_specific
is TRUE
. In this case, the
outer list needs to be of length n_rep
and the inner list of length
n_folds
.
set_fold_specific
(logical(1)
)
Indicates if the parameters passed in params
should be passed in
fold-specific way. Default is FALSE
. If TRUE
, the outer list needs
to be of length n_rep
and the inner list of length n_folds
.
Note that in the current implementation, either all parameters have to
be set globally or all parameters have to be provided fold-specific.
self
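As an illustration, globally valid parameters for a single nuisance learner can be set as follows (a sketch: the learner label "ml_l" and the treatment variable name "d" are assumed to match this particular data generating process and model, see methods params_names() and learner_names()):

```r
library(DoubleML)
library(mlr3)
set.seed(2)
# simulate a small PLIV data set and build the model with rpart learners
obj_dml_data = make_pliv_CHS2015(n_obs = 100, dim_x = 20, dim_z = 1)
dml_pliv_obj = DoubleMLPLIV$new(obj_dml_data,
  ml_l = lrn("regr.rpart"),
  ml_m = lrn("regr.rpart"),
  ml_r = lrn("regr.rpart"))
# set rpart hyperparameters for the nuisance part ml_l, used for all folds
dml_pliv_obj$set_ml_nuisance_params(
  learner = "ml_l", treat_var = "d",
  params = list(cp = 0.01, minsplit = 20))
```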
tune()
Hyperparameter-tuning for DoubleML models.
The hyperparameter-tuning is performed using the tuning methods provided in the mlr3tuning package. For more information on tuning in mlr3, we refer to the section on parameter tuning in the mlr3 book.
DoubleMLPLIV$tune( param_set, tune_settings = list(n_folds_tune = 5, rsmp_tune = mlr3::rsmp("cv", folds = 5), measure = NULL, terminator = mlr3tuning::trm("evals", n_evals = 20), algorithm = mlr3tuning::tnr("grid_search"), resolution = 5), tune_on_folds = FALSE )
param_set
(named list()
)
A named list
with a parameter grid for each nuisance model/learner
(see method learner_names()
). The parameter grid must be an object of
class ParamSet.
tune_settings
(named list()
)
A named list()
with arguments passed to the hyperparameter-tuning with
mlr3tuning to set up
TuningInstance objects.
tune_settings
has entries
terminator
(Terminator)
A Terminator object. Specification of terminator
is required to perform tuning.
algorithm
(Tuner or character(1)
)
A Tuner object (recommended) or key passed to the
respective dictionary to specify the tuning algorithm used in
tnr(). algorithm
is passed as an argument to
tnr(). If algorithm
is not specified by the users,
default is set to "grid_search"
. If set to "grid_search"
, then
additional argument "resolution"
is required.
rsmp_tune
(Resampling or character(1)
)
A Resampling object (recommended) or option passed
to rsmp() to initialize a
Resampling for parameter tuning in mlr3
.
If not specified by the user, default is set to "cv"
(cross-validation).
n_folds_tune
(integer(1)
, optional)
If rsmp_tune = "cv"
, number of folds used for cross-validation.
If not specified by the user, default is set to 5
.
measure
(NULL
, named list()
, optional)
Named list containing the measures used for parameter tuning. Entries in
list must either be Measure objects or keys to be
passed to msr(). The names of the entries must
match the learner names (see method learner_names()
). If set to NULL
,
default measures are used, i.e., "regr.mse"
for continuous outcome
variables and "classif.ce"
for binary outcomes.
resolution
(character(1)
)
The key passed to the respective
dictionary to specify the tuning algorithm used in
tnr(). resolution
is passed as an argument to
tnr().
tune_on_folds
(logical(1)
)
Indicates whether the tuning should be done fold-specific or globally.
Default is FALSE
.
self
clone()
The objects of this class are cloneable with this method.
DoubleMLPLIV$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other DoubleML:
DoubleML
,
DoubleMLIIVM
,
DoubleMLIRM
,
DoubleMLPLR
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_l = lrn(
  "regr.ranger", num.trees = 100, mtry = 20,
  min.node.size = 2, max.depth = 5)
ml_m = ml_l$clone()
ml_r = ml_l$clone()
obj_dml_data = make_pliv_CHS2015(
  alpha = 1, n_obs = 500, dim_x = 20, dim_z = 1)
dml_pliv_obj = DoubleMLPLIV$new(obj_dml_data, ml_l, ml_m, ml_r)
dml_pliv_obj$fit()
dml_pliv_obj$summary()

## Not run:
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(data.table)
set.seed(2)
ml_l = lrn("regr.rpart")
ml_m = ml_l$clone()
ml_r = ml_l$clone()
obj_dml_data = make_pliv_CHS2015(
  alpha = 1, n_obs = 500, dim_x = 20, dim_z = 1)
dml_pliv_obj = DoubleMLPLIV$new(obj_dml_data, ml_l, ml_m, ml_r)
param_grid = list(
  "ml_l" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)),
  "ml_m" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)),
  "ml_r" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)))

# minimum requirements for tune_settings
tune_settings = list(
  terminator = mlr3tuning::trm("evals", n_evals = 5),
  algorithm = mlr3tuning::tnr("grid_search", resolution = 5))
dml_pliv_obj$tune(param_set = param_grid, tune_settings = tune_settings)
dml_pliv_obj$fit()
dml_pliv_obj$summary()

## End(Not run)
Double machine learning for partially linear regression models.
R6::R6Class object inheriting from DoubleML.
Partially linear regression (PLR) models take the form

Y = D theta_0 + g_0(X) + zeta, E[zeta | D, X] = 0,

D = m_0(X) + V, E[V | X] = 0.

Y is the outcome
variable and
D is the policy variable of interest.
The high-dimensional vector
X = (X_1, ..., X_p) consists of other
confounding covariates, and
zeta and
V are stochastic errors.
DoubleML::DoubleML
-> DoubleMLPLR
DoubleML::DoubleML$bootstrap()
DoubleML::DoubleML$confint()
DoubleML::DoubleML$fit()
DoubleML::DoubleML$get_params()
DoubleML::DoubleML$learner_names()
DoubleML::DoubleML$p_adjust()
DoubleML::DoubleML$params_names()
DoubleML::DoubleML$print()
DoubleML::DoubleML$set_sample_splitting()
DoubleML::DoubleML$split_samples()
DoubleML::DoubleML$summary()
new()
Creates a new instance of this R6 class.
DoubleMLPLR$new( data, ml_l, ml_m, ml_g = NULL, n_folds = 5, n_rep = 1, score = "partialling out", dml_procedure = "dml2", draw_sample_splitting = TRUE, apply_cross_fitting = TRUE )
data
(DoubleMLData
)
The DoubleMLData
object providing the data and specifying the
variables of the causal model.
ml_l
(LearnerRegr
,
Learner
, character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "regr"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("regr.cv_glmnet", s = "lambda.min")
. ml_l
refers to the nuisance function l_0(X) = E[Y | X].
ml_m
(LearnerRegr
,
LearnerClassif
, Learner
,
character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
For binary treatment variables, an object of the class
LearnerClassif
can be passed, for example
lrn("classif.cv_glmnet", s = "lambda.min")
.
Alternatively, a Learner
object with public field
task_type = "regr"
or task_type = "classif"
can be passed,
respectively, for example of class
GraphLearner
. ml_m
refers to the nuisance function m_0(X) = E[D | X].
ml_g
(LearnerRegr
,
Learner
, character(1)
)
A learner of the class LearnerRegr
, which is
available from mlr3 or its
extension packages mlr3learners or
mlr3extralearners.
Alternatively, a Learner
object with public field
task_type = "regr"
can be passed, for example of class
GraphLearner
. The learner can possibly
be passed with specified parameters, for example
lrn("regr.cv_glmnet", s = "lambda.min")
. ml_g
refers to the nuisance function g_0(X) = E[Y - D theta_0 | X].
Note: The learner
ml_g
is only required for the score 'IV-type'
.
Optionally, it can be specified and estimated for callable scores.
n_folds
(integer(1)
)
Number of folds. Default is 5
.
n_rep
(integer(1)
)
Number of repetitions for the sample splitting. Default is 1
.
score
(character(1)
, function()
)
A character(1)
("partialling out"
or "IV-type"
) or a function()
specifying the score function.
If a function()
is provided, it must be of the form
function(y, d, l_hat, m_hat, g_hat, smpls)
and
the returned output must be a named list()
with elements psi_a
and
psi_b
. Default is "partialling out"
.
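For illustration, the built-in "partialling out" score could be reproduced by a user-supplied function() along the following lines (the name score_po is illustrative; psi_a and psi_b enter the linear score psi = psi_a * theta + psi_b):

```r
# Sketch of a custom score function for DoubleMLPLR reproducing the
# "partialling out" score; the function name is illustrative.
score_po = function(y, d, l_hat, m_hat, g_hat, smpls) {
  v_hat = d - m_hat  # residual of the treatment regression m_0(X) = E[D|X]
  u_hat = y - l_hat  # residual of the outcome regression l_0(X) = E[Y|X]
  list(psi_a = -v_hat * v_hat, psi_b = u_hat * v_hat)
}
```

Such a function can then be passed via DoubleMLPLR$new(..., score = score_po).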
dml_procedure
(character(1)
)
A character(1)
("dml1"
or "dml2"
) specifying the double machine
learning algorithm. Default is "dml2"
.
draw_sample_splitting
(logical(1)
)
Indicates whether the sample splitting should be drawn during
initialization of the object. Default is TRUE
.
apply_cross_fitting
(logical(1)
)
Indicates whether cross-fitting should be applied. Default is TRUE
.
set_ml_nuisance_params()
Set hyperparameters for the nuisance models of DoubleML models.
Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.
DoubleMLPLR$set_ml_nuisance_params( learner = NULL, treat_var = NULL, params, set_fold_specific = FALSE )
learner
(character(1)
)
The nuisance model/learner (see method params_names
).
treat_var
(character(1)
)
The treatment variable (hyperparameters can be set treatment-variable
specific).
params
(named list()
)
A named list()
with estimator parameters. Parameters are used for all
folds by default. Alternatively, parameters can be passed in a
fold-specific way if option fold_specific
is TRUE
. In this case, the
outer list needs to be of length n_rep
and the inner list of length
n_folds
.
set_fold_specific
(logical(1)
)
Indicates if the parameters passed in params
should be passed in
fold-specific way. Default is FALSE
. If TRUE
, the outer list needs
to be of length n_rep
and the inner list of length n_folds
.
Note that in the current implementation, either all parameters have to
be set globally or all parameters have to be provided fold-specific.
self
tune()
Hyperparameter-tuning for DoubleML models.
The hyperparameter-tuning is performed using the tuning methods provided in the mlr3tuning package. For more information on tuning in mlr3, we refer to the section on parameter tuning in the mlr3 book.
DoubleMLPLR$tune( param_set, tune_settings = list(n_folds_tune = 5, rsmp_tune = mlr3::rsmp("cv", folds = 5), measure = NULL, terminator = mlr3tuning::trm("evals", n_evals = 20), algorithm = mlr3tuning::tnr("grid_search"), resolution = 5), tune_on_folds = FALSE )
param_set
(named list()
)
A named list
with a parameter grid for each nuisance model/learner
(see method learner_names()
). The parameter grid must be an object of
class ParamSet.
tune_settings
(named list()
)
A named list()
with arguments passed to the hyperparameter-tuning with
mlr3tuning to set up
TuningInstance objects.
tune_settings
has entries
terminator
(Terminator)
A Terminator object. Specification of terminator
is required to perform tuning.
algorithm
(Tuner or character(1)
)
A Tuner object (recommended) or key passed to the
respective dictionary to specify the tuning algorithm used in
tnr(). algorithm
is passed as an argument to
tnr(). If algorithm
is not specified by the users,
default is set to "grid_search"
. If set to "grid_search"
, then
additional argument "resolution"
is required.
rsmp_tune
(Resampling or character(1)
)
A Resampling object (recommended) or option passed
to rsmp() to initialize a
Resampling for parameter tuning in mlr3
.
If not specified by the user, default is set to "cv"
(cross-validation).
n_folds_tune
(integer(1)
, optional)
If rsmp_tune = "cv"
, number of folds used for cross-validation.
If not specified by the user, default is set to 5
.
measure
(NULL
, named list()
, optional)
Named list containing the measures used for parameter tuning. Entries in
list must either be Measure objects or keys to be
passed to msr(). The names of the entries must
match the learner names (see method learner_names()
). If set to NULL
,
default measures are used, i.e., "regr.mse"
for continuous outcome
variables and "classif.ce"
for binary outcomes.
resolution
(character(1)
)
The key passed to the respective
dictionary to specify the tuning algorithm used in
tnr(). resolution
is passed as an argument to
tnr().
tune_on_folds
(logical(1)
)
Indicates whether the tuning should be done fold-specific or globally.
Default is FALSE
.
self
clone()
The objects of this class are cloneable with this method.
DoubleMLPLR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other DoubleML:
DoubleML
,
DoubleMLIIVM
,
DoubleMLIRM
,
DoubleMLPLIV
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_l = lrn("regr.ranger", num.trees = 10, max.depth = 2)
ml_m = ml_l$clone()
obj_dml_data = make_plr_CCDDHNR2018(alpha = 0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
dml_plr_obj$summary()

## Not run:
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(data.table)
set.seed(2)
ml_l = lrn("regr.rpart")
ml_m = ml_l$clone()
obj_dml_data = make_plr_CCDDHNR2018(alpha = 0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
param_grid = list(
  "ml_l" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)),
  "ml_m" = paradox::ps(
    cp = paradox::p_dbl(lower = 0.01, upper = 0.02),
    minsplit = paradox::p_int(lower = 1, upper = 2)))

# minimum requirements for tune_settings
tune_settings = list(
  terminator = mlr3tuning::trm("evals", n_evals = 5),
  algorithm = mlr3tuning::tnr("grid_search", resolution = 5))
dml_plr_obj$tune(param_set = param_grid, tune_settings = tune_settings)
dml_plr_obj$fit()
dml_plr_obj$summary()

## End(Not run)
Preprocessed data set on financial wealth and 401(k) plan participation. The raw data files are preprocessed to reproduce the examples in Chernozhukov et al. (2020). An internet connection is required to successfully download the data set.
fetch_401k( return_type = "DoubleMLData", polynomial_features = FALSE, instrument = FALSE )
return_type |
( |
polynomial_features |
( |
instrument |
( |
Variable description, based on the supplementary material of Chernozhukov et al. (2020):
net_tfa: net total financial assets
e401: = 1 if employer offers 401(k)
p401: = 1 if individual participates in a 401(k) plan
age: age
inc: income
fsize: family size
educ: years of education
db: = 1 if individual has defined benefit pension
marr: = 1 if married
twoearn: = 1 if two-earner household
pira: = 1 if individual participates in IRA plan
hown: = 1 if home owner
The supplementary data of the study by Chernozhukov et al. (2018) is available at https://academic.oup.com/ectj/article/21/1/C1/5056401#supplementary-data.
A data object according to the choice of return_type
.
Abadie, A. (2003), Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2): 231-263.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018), Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68. doi:10.1111/ectj.12097.
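Analogously to the fetch_bonus example below, the 401(k) data can be loaded and wrapped into a DoubleMLData object, for instance as follows (a sketch; the choice of covariates is illustrative and an internet connection is required):

```r
library(DoubleML)
# download the preprocessed 401(k) data as a data.table
df_401k = fetch_401k(return_type = "data.table")
obj_dml_data_401k = DoubleMLData$new(df_401k,
  y_col = "net_tfa",
  d_cols = "e401",
  x_cols = c("age", "inc", "educ", "fsize", "marr",
             "twoearn", "db", "pira", "hown"))
obj_dml_data_401k
```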
Preprocessed data set on the Pennsylvania Reemployment Bonus experiment. The raw data files are preprocessed to reproduce the examples in Chernozhukov et al. (2020). An internet connection is required to successfully download the data set.
fetch_bonus(return_type = "DoubleMLData", polynomial_features = FALSE)
return_type |
( |
polynomial_features |
( |
Variable description, based on the supplementary material of Chernozhukov et al. (2020):
abdt: chronological time of enrollment of each claimant in the Pennsylvania reemployment bonus experiment.
tg: indicates the treatment group (bonus amount - qualification period) of each claimant.
inuidur1: a measure of length (in weeks) of the first spell of unemployment
inuidur2: a second measure for the length (in weeks) of the first spell of unemployment
female: dummy variable; it indicates if the claimant's sex is female (=1) or male (=0).
black: dummy variable; it indicates a person of black race (=1).
hispanic: dummy variable; it indicates a person of hispanic race (=1).
othrace: dummy variable; it indicates a non-white, non-black, not-hispanic person (=1).
dep1: dummy variable; indicates if the number of dependents of each claimant is equal to 1 (=1).
dep2: dummy variable; indicates if the number of dependents of each claimant is equal to 2 (=1).
q1-q6: six dummy variables indicating the quarter of experiment during which each claimant enrolled.
recall: takes the value of 1 if the claimant answered “yes” when asked if he/she had any expectation to be recalled.
agelt35: takes the value of 1 if the claimant's age is less than 35 and 0 otherwise.
agegt54: takes the value of 1 if the claimant's age is more than 54 and 0 otherwise.
durable: it takes the value of 1 if the occupation of the claimant was in the sector of durable manufacturing and 0 otherwise.
nondurable: it takes the value of 1 if the occupation of the claimant was in the sector of nondurable manufacturing and 0 otherwise.
lusd: it takes the value of 1 if the claimant filed in Coatesville, Reading, or Lancaster and 0 otherwise.
These three sites were considered to be located in areas characterized by low unemployment rate and short duration of unemployment.
husd: it takes the value of 1 if the claimant filed in Lewistown, Pittston, or Scranton and 0 otherwise.
These three sites were considered to be located in areas characterized by high unemployment rate and short duration of unemployment.
muld: it takes the value of 1 if the claimant filed in Philadelphia-North, Philadelphia-Uptown, McKeesport, Erie, or Butler and 0 otherwise.
These sites were considered to be located in areas characterized by moderate unemployment rate and long duration of unemployment.
The supplementary data of the study by Chernozhukov et al. (2018) is available at https://academic.oup.com/ectj/article/21/1/C1/5056401#supplementary-data.
The supplementary data of the study by Bilias (2000) is available at https://www.journaldata.zbw.eu/dataset/sequential-testing-of-duration-data-the-case-of-the-pennsylvania-reemployment-bonus-experiment.
A data object according to the choice of return_type
.
Bilias Y. (2000), Sequential Testing of Duration Data: The Case of Pennsylvania ‘Reemployment Bonus’ Experiment. Journal of Applied Econometrics, 15(6): 575-594.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018), Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68. doi:10.1111/ectj.12097.
library(DoubleML)
df_bonus = fetch_bonus(return_type = "data.table")
obj_dml_data_bonus = DoubleMLData$new(df_bonus,
  y_col = "inuidur1",
  d_cols = "tg",
  x_cols = c(
    "female", "black", "othrace", "dep1", "dep2",
    "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
    "durable", "lusd", "husd"))
obj_dml_data_bonus
Generates data from an interactive IV regression (IIVM) model. The data generating process is defined as

d_i = 1{alpha_x Z_i + v_i > 0},

y_i = theta d_i + x_i' beta + u_i,

with Z_i ~ Bernoulli(0.5) and

(u_i, v_i) ~ N(0, Sigma), Sigma = matrix(c(1, 0.3, 0.3, 1), ncol = 2).

The covariates are drawn as x_i ~ N(0, Sigma_x), where
Sigma_x is a matrix with entries
Sigma_x[kj] = 0.5^|j-k|
and
beta is a
dim_x
-vector with
entries beta_j = 1/j^2.
The data generating process is inspired by a process used in the simulation experiment of Farbmacher, Guber and Klaaßen (2020).
make_iivm_data( n_obs = 500, dim_x = 20, theta = 1, alpha_x = 0.2, return_type = "DoubleMLData" )
n_obs |
( |
dim_x |
( |
theta |
( |
alpha_x |
( |
return_type |
( |
Farbmacher, H., Guber, R. and Klaaßen, S. (2020). Instrument Validity Tests with Causal Forests. MEA Discussion Paper No. 13-2020. Available at SSRN:doi:10.2139/ssrn.3619201.
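A minimal usage example (drawing a data set with the default return type and, alternatively, as a data.table):

```r
library(DoubleML)
set.seed(3141)
# simulate an IIVM data set with instrument z, treatment d and outcome y
obj_dml_data = make_iivm_data(n_obs = 1000, dim_x = 20,
  theta = 0.5, alpha_x = 1)
obj_dml_data
# alternatively, return the simulated data as a data.table
dt = make_iivm_data(n_obs = 1000, dim_x = 20,
  theta = 0.5, alpha_x = 1, return_type = "data.table")
```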
Generates data from an interactive regression (IRM) model. The data generating process is defined as

d_i = 1{ exp(c_d x_i' beta) / (1 + exp(c_d x_i' beta)) > v_i },

y_i = theta d_i + c_y x_i' beta d_i + zeta_i,

with v_i ~ U(0, 1),
zeta_i ~ N(0, 1) and covariates
x_i ~ N(0, Sigma), where
Sigma is a matrix with entries
Sigma[kj] = 0.5^|j-k|.
beta is a
dim_x
-vector with entries
beta_j = 1/j^2
and the constants
c_y and
c_d are given by

c_y = sqrt(R2_y / ((1 - R2_y) beta' Sigma beta)),

c_d = sqrt((pi^2 / 3) R2_d / ((1 - R2_d) beta' Sigma beta)).

The data generating process is inspired by a process used in the simulation experiment (see Appendix P) of Belloni et al. (2017).
make_irm_data( n_obs = 500, dim_x = 20, theta = 0, R2_d = 0.5, R2_y = 0.5, return_type = "DoubleMLData" )
n_obs | (integer(1)) The number of observations to simulate.
dim_x | (integer(1)) The number of covariates.
theta | (numeric(1)) The value of the causal parameter.
R2_d | (numeric(1)) The value of the parameter $R_d^2$.
R2_y | (numeric(1)) The value of the parameter $R_y^2$.
return_type | (character(1)) If "DoubleMLData", returns a DoubleMLData object. If "data.frame" returns a data.frame(). If "data.table" returns a data.table(). If "matrix" a named list() with entries X, y and d is returned. Default is "DoubleMLData".
Belloni, A., Chernozhukov, V., Fernández-Val, I. and Hansen, C. (2017). Program Evaluation and Causal Inference With High-Dimensional Data. Econometrica, 85: 233-298.
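A minimal usage sketch (not part of the original manual; the seed and sizes are arbitrary). Because the treatment is generated by the threshold rule above, it is a binary indicator:

```r
library(DoubleML)

set.seed(3141)
# Simulate an IRM dataset with moderate signal strength in both equations.
df = make_irm_data(n_obs = 300, dim_x = 10, theta = 0.5,
                   R2_d = 0.5, R2_y = 0.5, return_type = "data.frame")
# The simulated treatment d takes only the values 0 and 1.
table(df$d)
```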
Generates data from a partially linear IV regression model used in Chernozhukov, Hansen and Spindler (2015). The data generating process is defined as

$$z_i = \Pi x_i + \zeta_i,$$
$$d_i = x_i' \gamma + z_i' \delta + u_i,$$
$$y_i = \alpha d_i + x_i' \beta + \varepsilon_i,$$

with

$$(\varepsilon_i, u_i) \sim \mathcal{N}\left(0, \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix}\right), \qquad \zeta_i \sim \mathcal{N}(0, 0.25 I_{\text{dim\_z}}),$$

where $x_i \sim \mathcal{N}(0, \Sigma)$, $\Sigma$ is a dim_x $\times$ dim_x matrix with entries $\Sigma_{kj} = 0.5^{|j-k|}$ and $I_{\text{dim\_z}}$ is the dim_z $\times$ dim_z identity matrix. $\beta = \gamma$ is a dim_x-vector with entries $\beta_j = \frac{1}{j^2}$, $\delta$ is a dim_z-vector with entries $\delta_j = \frac{1}{j^2}$ and $\Pi = (I_{\text{dim\_z}}, 0_{\text{dim\_z} \times (\text{dim\_x} - \text{dim\_z})})$.
make_pliv_CHS2015(
  n_obs,
  alpha = 1,
  dim_x = 200,
  dim_z = 150,
  return_type = "DoubleMLData"
)
n_obs | (integer(1)) The number of observations to simulate.
alpha | (numeric(1)) The value of the causal parameter.
dim_x | (integer(1)) The number of covariates.
dim_z | (integer(1)) The number of instruments.
return_type | (character(1)) If "DoubleMLData", returns a DoubleMLData object. If "data.frame" returns a data.frame(). If "data.table" returns a data.table(). If "matrix" a named list() with entries X, y, d and z is returned. Default is "DoubleMLData".
A data object according to the choice of return_type.
Chernozhukov, V., Hansen, C. and Spindler, M. (2015), Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review: Papers and Proceedings, 105 (5): 486-90.
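A minimal usage sketch (not part of the original manual). Note that n_obs has no default and must always be supplied; the reduced dim_x and dim_z here are arbitrary choices for a quick run:

```r
library(DoubleML)

set.seed(3141)
# n_obs must be supplied explicitly; shrink dim_x/dim_z relative to the defaults.
dml_data = make_pliv_CHS2015(n_obs = 100, alpha = 1, dim_x = 20, dim_z = 15,
                             return_type = "DoubleMLData")
dml_data$n_obs  # number of observations stored on the data backend
```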
Generates data from a partially linear IV regression model with multiway cluster sample used in Chiang et al. (2021). The data generating process is defined as

$$Z_{ij} = X_{ij}' \xi_0 + V_{ij},$$
$$D_{ij} = Z_{ij} \pi_{10} + X_{ij}' \pi_{20} + v_{ij},$$
$$Y_{ij} = D_{ij} \theta + X_{ij}' \zeta_0 + \varepsilon_{ij},$$

with

$$X_{ij} = (1 - \omega_1^X - \omega_2^X) \alpha_{ij}^X + \omega_1^X \alpha_i^X + \omega_2^X \alpha_j^X,$$
$$\varepsilon_{ij} = (1 - \omega_1^\varepsilon - \omega_2^\varepsilon) \alpha_{ij}^\varepsilon + \omega_1^\varepsilon \alpha_i^\varepsilon + \omega_2^\varepsilon \alpha_j^\varepsilon,$$
$$v_{ij} = (1 - \omega_1^v - \omega_2^v) \alpha_{ij}^v + \omega_1^v \alpha_i^v + \omega_2^v \alpha_j^v,$$
$$V_{ij} = (1 - \omega_1^V - \omega_2^V) \alpha_{ij}^V + \omega_1^V \alpha_i^V + \omega_2^V \alpha_j^V,$$

and $\alpha_{ij}^X, \alpha_i^X, \alpha_j^X \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a dim_X $\times$ dim_X matrix with entries $\Sigma_{kj} = s_X^{|j-k|}$. Further

$$(\alpha_{ij}^\varepsilon, \alpha_{ij}^v), (\alpha_i^\varepsilon, \alpha_i^v), (\alpha_j^\varepsilon, \alpha_j^v) \sim \mathcal{N}(0, \Sigma),$$

where $\Sigma$ is a $2 \times 2$ matrix with entries $\Sigma_{kj} = s_{\varepsilon v}^{|j-k|}$, and $\alpha_{ij}^V, \alpha_i^V, \alpha_j^V \sim \mathcal{N}(0, 1)$.
make_pliv_multiway_cluster_CKMS2021(
  N = 25,
  M = 25,
  dim_X = 100,
  theta = 1,
  return_type = "DoubleMLClusterData",
  ...
)
N | (integer(1)) The number of observations (first dimension).
M | (integer(1)) The number of observations (second dimension).
dim_X | (integer(1)) The number of covariates.
theta | (numeric(1)) The value of the causal parameter.
return_type | (character(1)) If "DoubleMLClusterData", returns a DoubleMLClusterData object. If "data.frame" returns a data.frame(). If "data.table" returns a data.table(). If "matrix" a named list() with entries X, y, d, z and cluster_vars is returned. Default is "DoubleMLClusterData".
... | Additional keyword arguments to set non-default values for the parameters of the data generating process.
A data object according to the choice of return_type.
Chiang, H. D., Kato K., Ma, Y. and Sasaki, Y. (2021), Multiway Cluster Robust Double/Debiased Machine Learning, Journal of Business & Economic Statistics, doi:10.1080/07350015.2021.1895815, https://arxiv.org/abs/1909.03489.
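A minimal usage sketch (not part of the original manual; the reduced N, M and dim_X are arbitrary choices for a quick run). The two-way cluster sample has N x M observations and two cluster variables recorded on the resulting DoubleMLClusterData object:

```r
library(DoubleML)

set.seed(3141)
# Simulate a small two-way cluster sample: 10 x 10 = 100 observations.
dml_cluster_data = make_pliv_multiway_cluster_CKMS2021(N = 10, M = 10, dim_X = 20)
dml_cluster_data$cluster_cols  # the two cluster variables
dml_cluster_data$n_obs
```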
Generates data from a partially linear regression model used in Chernozhukov et al. (2018) for Figure 1. The data generating process is defined as

$$d_i = m_0(x_i) + s_1 v_i,$$
$$y_i = \alpha d_i + g_0(x_i) + s_2 \zeta_i,$$

with $v_i \sim \mathcal{N}(0,1)$ and $\zeta_i \sim \mathcal{N}(0,1)$. The covariates are distributed as $x_i \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a matrix with entries $\Sigma_{kj} = 0.7^{|j-k|}$. The nuisance functions are given by

$$m_0(x_i) = a_0 x_{i,1} + a_1 \frac{\exp(x_{i,3})}{1 + \exp(x_{i,3})},$$
$$g_0(x_i) = b_0 \frac{\exp(x_{i,1})}{1 + \exp(x_{i,1})} + b_1 x_{i,3},$$

with $a_0 = 1$, $a_1 = 0.25$, $s_1 = 1$, $b_0 = 1$, $b_1 = 0.25$, $s_2 = 1$.
make_plr_CCDDHNR2018(
  n_obs = 500,
  dim_x = 20,
  alpha = 0.5,
  return_type = "DoubleMLData"
)
n_obs | (integer(1)) The number of observations to simulate.
dim_x | (integer(1)) The number of covariates.
alpha | (numeric(1)) The value of the causal parameter.
return_type | (character(1)) If "DoubleMLData", returns a DoubleMLData object. If "data.frame" returns a data.frame(). If "data.table" returns a data.table(). If "matrix" a named list() with entries X, y and d is returned. Default is "DoubleMLData".
A data object according to the choice of return_type.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018), Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68. doi:10.1111/ectj.12097.
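A minimal usage sketch (not part of the original manual; the seed and sizes are arbitrary). With the default return type, the generator directly produces a DoubleMLData object that is ready to pass to a DoubleML model:

```r
library(DoubleML)

set.seed(3141)
# Default return type yields a DoubleMLData object with outcome y and treatment d.
dml_data = make_plr_CCDDHNR2018(n_obs = 250, dim_x = 20, alpha = 0.5)
dml_data$d_cols  # treatment column registered on the data backend
```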
Generates data from a partially linear regression model used in a blog article by Turrell (2018). The data generating process is defined as

$$d_i = m_0(x_i' b) + v_i,$$
$$y_i = \theta d_i + g_0(x_i' b) + u_i,$$

with $v_i \sim \mathcal{N}(0,1)$, $u_i \sim \mathcal{N}(0,1)$, and covariates $x_i \sim \mathcal{N}(0, \Sigma)$, where $\Sigma$ is a random symmetric, positive-definite matrix generated with clusterGeneration::genPositiveDefMat(). $b$ is a vector with entries $b_j = \frac{1}{j}$ and the nuisance functions are given by

$$m_0(z) = \frac{1}{2\pi} \frac{\sinh(\gamma)}{\cosh(\gamma) - \cos(z - \nu)},$$
$$g_0(z) = \sin(z)^2.$$
make_plr_turrell2018(
  n_obs = 100,
  dim_x = 20,
  theta = 0.5,
  return_type = "DoubleMLData",
  nu = 0,
  gamma = 1
)
n_obs | (integer(1)) The number of observations to simulate.
dim_x | (integer(1)) The number of covariates.
theta | (numeric(1)) The value of the causal parameter.
return_type | (character(1)) If "DoubleMLData", returns a DoubleMLData object. If "data.frame" returns a data.frame(). If "data.table" returns a data.table(). If "matrix" a named list() with entries X, y and d is returned. Default is "DoubleMLData".
nu | (numeric(1)) The value of the parameter $\nu$. Default is 0.
gamma | (numeric(1)) The value of the parameter $\gamma$. Default is 1.
A data object according to the choice of return_type.
Turrell, A. (2018), Econometrics in Python part I - Double machine learning, Markov Wanderer: A blog on economics, science, coding and data. https://aeturrell.com/blog/posts/econometrics-in-python-parti-ml/.
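A minimal usage sketch (not part of the original manual; the seed and sizes are arbitrary). Note that this generator relies on the suggested clusterGeneration package for the random covariance matrix, so that package must be installed:

```r
library(DoubleML)

set.seed(3141)
# nu and gamma shift and scale the wrapped-Cauchy-style nuisance function m_0.
df = make_plr_turrell2018(n_obs = 150, dim_x = 10, theta = 0.5,
                          return_type = "data.frame", nu = 0, gamma = 1)
dim(df)
```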