CausalDiscoveryToolbox: Causal Modeling and Causal Graphs in Code

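While analyzing causal relationships in observational data recently, I came across a very handy package, CausalDiscoveryToolbox (Cdt for short). It is feature-complete and makes it easy to get started with causal discovery.
Below is a brief summary of how the package works and how to use it.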

Introduction to CausalDiscoveryToolbox

[GitHub] [Paper] [Documentation]

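Learns causal graphs and the associated causal mechanisms from samples of the joint probability distribution of the data.
Implements end-to-end causal discovery, recovering from observational data both the direct dependencies (the skeleton of the causal graph) and the causal relationships between variables.
Implements many algorithms for graph structure recovery (including algorithms from the bnlearn¹ and pcalg² packages).
Built on Numpy, Scikit-learn, Pytorch, and R.
Supports GPU hardware acceleration.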

The causal modeling process

The Cdt package summarizes the general causal modeling workflow as follows:

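Observational data -> [graph recovery algorithm] -> undirected graph -> [causal discovery algorithm] -> directed causal graph
Observational data -> [causal discovery algorithm] -> directed causal graph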

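Cdt can run causal discovery directly on observational data (producing a directed causal graph), or it can first recover the graph structure (producing an undirected dependency graph) and then run causal discovery on that structure (producing a directed causal graph).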

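Graph recovery algorithms

Cdt implements two kinds of methods for recovering an undirected dependency graph from raw data:

Bivariate dependencies
Used to determine the (undirected) edges of the causal graph.
These rely on statistical tests such as Pearson's correlation or mutual information scores³. Bivariate dependencies can be used in a first stage to build the skeleton of the causal graph.
Multivariate methods
The goal is to recover the full undirected graph, that is, to select the parents, children, and spouses (the other parents of its children) of every variable.
The output is a networkx.Graph object.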
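
To make the bivariate idea concrete, here is a minimal sketch in plain Python that builds a skeleton by connecting every pair of variables whose absolute Pearson correlation exceeds a threshold. It is not Cdt's implementation: the function names, the toy data, and the fixed threshold of 0.5 (used here in place of a proper statistical test) are all assumptions made for illustration, and the result is a plain set of edges rather than a networkx.Graph.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_skeleton(columns, threshold=0.5):
    """Return undirected edges (u, v) with |corr(u, v)| >= threshold.

    `columns` maps a variable name to its list of observations.
    The threshold is an illustrative assumption, not a significance test.
    """
    edges = set()
    for u, v in combinations(sorted(columns), 2):
        if abs(pearson(columns[u], columns[v])) >= threshold:
            edges.add((u, v))
    return edges


# Toy data (hypothetical): b is roughly 2 * a, c is unrelated to both.
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "c": [1, -1, -1, 1, 1, -1],
}
skeleton_edges = correlation_skeleton(toy)
print(skeleton_edges)  # {('a', 'b')}
```

Only the a-b pair survives, because the correlations of c with a and with b are both close to -0.1. A skeleton like this carries no direction; orienting the edges is the job of the causal discovery step.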

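Causal discovery algorithms

The main focus of the Cdt package is discovering causal relationships from observational data, from the pairwise setting up to full graph modeling.

The pairwise setting
Each pair of variables is treated as a cause-effect pair problem (CEP⁴) in order to determine the causal relationship between the two variables.
For example, the task is to decide between X→Y, Y→X, or no relationship between X and Y. See the cause-effect pair tutorial for reference.
Two classic pairwise methods are the additive noise model (ANM)⁵ and IGCI⁶ (Information-Geometric Causal Inference). Both are described in detail in a survey on cause-effect pairs, which I plan to summarize when I have time.
The pairwise setting assumes that the two variables are already conditioned on the other covariates, and that the remaining latent covariates have little influence and can be treated as "noise".
The graph setting
These are Bayesian or score-based methods in the broad sense; they output a directed acyclic graph (DAG) or a partially directed acyclic graph and fall into three families:
(1) Constraint-based methods, which rely on conditional independence tests, such as PC⁷ or FCI⁸.
(2) Score-based methods, which use graph search heuristics (such as GES⁹ or CAM¹⁰) to find the graph that maximizes a likelihood score.
(3) Methods built on generative neural networks, such as CGNN¹¹ or SAM¹².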
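
The pairwise setting described above can be sketched with an additive noise model. The idea: regress Y on X, check whether the residuals are independent of X, do the same in the reverse direction, and prefer the direction whose residuals look more independent. The sketch below is not Cdt's ANM class. It assumes NumPy is available, swaps in a cubic polynomial fit for the regression step and a biased HSIC estimate with a Gaussian kernel of width 1 as the dependence measure, and all function names and the synthetic data are made up for illustration.

```python
import numpy as np


def _centered_gram(v, sigma=1.0):
    """Centered Gaussian-kernel Gram matrix of a standardized 1-D sample."""
    v = (v - v.mean()) / v.std()
    sq_dist = (v[:, None] - v[None, :]) ** 2
    gram = np.exp(-sq_dist / (2.0 * sigma ** 2))
    n = len(v)
    centering = np.eye(n) - 1.0 / n  # I - (1/n) * ones
    return centering @ gram @ centering


def hsic(a, b):
    """Biased HSIC estimate; larger values indicate stronger dependence."""
    n = len(a)
    return float(np.sum(_centered_gram(a) * _centered_gram(b)) / n ** 2)


def anm_score(cause, effect, degree=3):
    """Dependence between the candidate cause and the regression residuals."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def anm_direction(x, y):
    """Positive value suggests x -> y, negative suggests y -> x."""
    return anm_score(y, x) - anm_score(x, y)


# Synthetic cause-effect pair (hypothetical): y = x^3 + uniform noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 300)
y = x ** 3 + rng.uniform(-1.0, 1.0, 300)

score = anm_direction(x, y)
print("x -> y" if score > 0 else "y -> x", score)
```

The score is only a heuristic ranking of the two directions, not a significance test, and it inherits the pairwise assumption that other covariates act as noise. A polynomial fit is a crude stand-in for a nonparametric regressor and will mislead the score when the true mechanism is far from polynomial.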

Installing Cdt

Install it directly with pip:

pip install cdt

Usage example

import cdt
from cdt import SETTINGS
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd  # needed for read_csv; missing in the original listing

SETTINGS.verbose = True
# SETTINGS.NJOBS = 16  # number of parallel jobs
# SETTINGS.GPU = 1     # number of GPUs to use

# Load data
data = pd.read_csv("lucas0_train.csv")
print(data.head())

# Recover the undirected skeleton of the graph
glasso = cdt.independence.graph.Glasso()
skeleton = glasso.predict(data)

# Pairwise setting: orient each skeleton edge with ANM
model = cdt.causality.pairwise.ANM()
output_graph = model.predict(data, skeleton)

# Visualize the causal graph
options = {
    "node_color": "#A0CBE2",
    "width": 1,
    "node_size": 400,
    "edge_cmap": plt.cm.Blues,
    "with_labels": True,
}
nx.draw_networkx(output_graph, **options)
plt.axis("off")  # hide the axes around the drawn graph
plt.show()

[Figure: causal graph output by the code above]
Sample data:
https://download.csdn.net/download/Bit_Coders/16241408

For drawing the network structure, see the networkx documentation:
https://networkx.org/documentation/stable/index.html

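Package modules

The package is divided into 5 modules:

Causality: cdt.causality implements causal discovery algorithms in the pairwise setting and the graph setting.
Independence: cdt.independence contains methods for recovering the dependency graph of the data.
Data: cdt.data provides tools for generating data and loading benchmark datasets.
Utilities: cdt.utils provides tools for model construction, graph utilities, and settings.
Metrics: cdt.metrics contains scoring metrics for graphs, which take a networkx.DiGraph as input.
All computational methods follow a Scikit-learn-like interface: .predict() runs the algorithm on the given data, and .fit() trains learning algorithms. Most of the algorithms are classes, and their parameters can be customized in .__init__().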

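Structure of the package and its algorithms (the bracketed numbers appear to follow the reference numbering of the Cdt README, not the footnotes of this post):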

   cdt package
   |
   |- independence
   |  |- graph (Inferring the skeleton from data)
   |  |  |- Lasso variants (Randomized Lasso[1], Glasso[2], HSICLasso[3])
   |  |  |- FSGNN (CGNN[12] variant for feature selection)
   |  |  |- Skeleton recovery using feature selection algorithms (RFECV[5], LinearSVR[6], RRelief[7], ARD[8,9], DecisionTree)
   |  |
   |  |- stats (pairwise methods for dependency)
   |     |- Correlation (Pearson, Spearman, KendallTau)
   |     |- Kernel based (NormalizedHSIC[10])
   |     |- Mutual information based (MIRegression, Adjusted Mutual Information[11], Normalized mutual information[11])
   |
   |- data
   |  |- CausalPairGenerator (Generate causal pairs)
   |  |- AcyclicGraphGenerator (Generate FCM-based graphs)
   |  |- load_dataset (load standard benchmark datasets)
   |
   |- causality
   |  |- graph (methods for graph inference)
   |  |  |- CGNN[12]
   |  |  |- PC[13]
   |  |  |- GES[13]
   |  |  |- GIES[13]
   |  |  |- LiNGAM[13]
   |  |  |- CAM[13]
   |  |  |- GS[23]
   |  |  |- IAMB[24]
   |  |  |- MMPC[25]
   |  |  |- SAM[26]
   |  |  |- CCDr[27]
   |  |
   |  |- pairwise (methods for pairwise inference)
   |     |- ANM[14] (Additive Noise Model)
   |     |- IGCI[15] (Information Geometric Causal Inference)
   |     |- RCC[16] (Randomized Causation Coefficient)
   |     |- NCC[17] (Neural Causation Coefficient)
   |     |- GNN[12] (Generative Neural Network -- Part of CGNN )
   |     |- Bivariate fit (Baseline method of regression)
   |     |- Jarfo[20]
   |     |- CDS[20]
   |     |- RECI[28]
   |
   |- metrics (Implements the metrics for graph scoring)
   |  |- Precision Recall
   |  |- SHD
   |  |- SID [29]
   |
   |- utils
      |- Settings -> SETTINGS class (hardware settings)
      |- loss -> MMD loss [21, 22] & various other loss functions
      |- io -> for importing data formats
      |- graph -> graph utilities

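Impressions

The interface is simple, the documentation is clear, and it is easy to get started.
However, it does not yet support interventions.
Cdt is a package for causal discovery in the observational setting, so it still sits at the first level of causal science, "association". From a practical standpoint, though, "intervention" and "counterfactuals" are hard to carry out, and causal discovery and inference at the "association" level can already be useful.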

1. Marco Scutari. Package 'bnlearn', 2018.
2. Markus Kalisch, Alain Hauser, et al. Package 'pcalg', 2018.
3. Nguyen Xuan Vinh, Julien Epps, and James Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct):2837–2854, 2010.
4. Isabelle Guyon. Chalearn cause effect pairs challenge, 2013. URL http://www.causality.inf.ethz.ch/cause-effect.php.
5. Patrik O Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Neural Information Processing Systems (NIPS), pages 689–696, 2009.
6. Janzing, D., Mooij, J., Zhang, K., Lemeire, et al. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182, 1–31, 2012.
7. Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. MIT Press, 2000.
8. Eric V Strobl, Kun Zhang, and Shyam Visweswaran. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. 2017.
9. David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov):507–554, 2002.
10. Peter Bühlmann, Jonas Peters, Jan Ernest, et al. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 2014.
11. Olivier Goudet, Diviyan Kalainathan, et al. Learning functional causal models with generative neural networks. 2017.
12. Diviyan Kalainathan, Olivier Goudet, et al. SAM: Structural agnostic model, causal discovery and penalized adversarial learning. 2018.

Published by: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/170417.html Original link: https://javaforall.net
