Creating and Inferencing Bayesian Networks Using pgmpy, with an Example

In this quick notebook, we will discuss Bayesian statistics, Bayesian networks, and how to run inference on them using the pgmpy Python library. Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event, which can change as new information is gathered, rather than a fixed value based upon frequency or propensity. Bayesian statistical methods use Bayes’ theorem to compute and update probabilities after obtaining new data. Bayes’ theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event.

Bayes’ theorem

Bayes’ theorem is a fundamental theorem in Bayesian statistics, as it is used by Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events A and B, the conditional probability of A given that B is true is expressed as follows:

P(A|B) = P(B|A) * P(A) / P(B)
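
As a quick hypothetical example (the numbers are made up purely for illustration): if P(A) = 0.01, P(B|A) = 0.9 and P(B) = 0.1, then P(A|B) = 0.9 * 0.01 / 0.1 = 0.09.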

The probability of the evidence P(B) can be calculated using the law of total probability. If A1, A2, ..., An is a partition of the sample space, which is the set of all outcomes of an experiment, then:

P(B) = P(B|A1) * P(A1) + P(B|A2) * P(A2) + ... + P(B|An) * P(An)
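
To make both formulas concrete, here is a minimal sketch in plain Python; the numbers are hypothetical and chosen only for illustration (they are unrelated to the example network below).

prior_A = [0.3, 0.7]        # hypothetical P(A1), P(A2) for a two-event partition
likelihood_B = [0.9, 0.2]   # hypothetical P(B|A1), P(B|A2)

# Law of total probability: P(B) = sum_i P(B|Ai) * P(Ai)
p_B = sum(l * p for l, p in zip(likelihood_B, prior_A))

# Bayes' theorem: P(A1|B) = P(B|A1) * P(A1) / P(B)
p_A1_given_B = likelihood_B[0] * prior_A[0] / p_B
print(p_B, p_A1_given_B)    # 0.41 and roughly 0.6585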

Bayesian network

A Bayesian network is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor.
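
Concretely, a Bayesian network over variables X1, ..., Xn encodes a factorization of the joint distribution into one local conditional distribution per node:

P(X1, ..., Xn) = P(X1 | parents(X1)) * P(X2 | parents(X2)) * ... * P(Xn | parents(Xn))

For the network we build below, this means P(M, U, B, R, S) = P(M) * P(U) * P(B) * P(R|M, U, B) * P(S|R, B).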

Example

A statistical moderator for a social platform, given information such as user history, an ML model’s prediction, other users flagging the content, etc.

We can use Bayes’ rule and the total probability theorem to infer probabilities in a Bayesian network.

Let’s consider an example where a social media website wishes to moderate content on the site and suspend bad user accounts. For this, they would like us to create a statistical moderator that can take preemptive action based on the information given. Let’s assume we have the following information: whether our ML model flags the content (M), whether another user flags the content (U), and whether the account was suspended before (B). From these we want to infer whether the content should be removed (R) and whether the account should be suspended (S).

Let’s assume the probabilities for the network are given to us as follows: the priors are P(M=1) = 0.05, P(U=1) = 0.15 and P(B=1) = 0.10; R is conditioned on M, B and U, and S is conditioned on R and B, with the full conditional probability tables encoded in the CPDs below.

Let’s create this Bayesian network in Python using the pgmpy library (https://github.com/pgmpy/pgmpy).

!pip install pgmpy
Requirement already satisfied: pgmpy in /anaconda3/lib/python3.7/site-packages (0.1.7)
Requirement already satisfied: scipy>=1.0.0 in /anaconda3/lib/python3.7/site-packages (from pgmpy) (1.2.1)
Requirement already satisfied: networkx<1.12,>=1.11 in /anaconda3/lib/python3.7/site-packages (from pgmpy) (1.11)
Requirement already satisfied: numpy>=1.14.0 in /anaconda3/lib/python3.7/site-packages (from pgmpy) (1.16.2)
Requirement already satisfied: decorator>=3.4.0 in /anaconda3/lib/python3.7/site-packages (from networkx<1.12,>=1.11->pgmpy) (4.4.0)
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
import numpy as np
bayesNet = BayesianModel()
bayesNet.add_node("M")
bayesNet.add_node("U")
bayesNet.add_node("R")
bayesNet.add_node("B")
bayesNet.add_node("S")

bayesNet.add_edge("M", "R")
bayesNet.add_edge("U", "R")
bayesNet.add_edge("B", "R")
bayesNet.add_edge("B", "S")
bayesNet.add_edge("R", "S")

Adding the CPDs (conditional probability distributions) for each node. A quick note: while adding the probabilities, we have to give the FALSE (state 0) values first.

# Prior probabilities; state 0 (False) comes first, then state 1 (True)
cpd_M = TabularCPD('M', 2, values=[[.95], [.05]])  # ML model flags the content
cpd_U = TabularCPD('U', 2, values=[[.85], [.15]])  # another user flags the content
cpd_B = TabularCPD('B', 2, values=[[.90], [.10]])  # account was suspended before

# P(S | R, B): one column per combination of the evidence states
cpd_S = TabularCPD('S', 2, values=[[0.98, .88, .95, .6], [.02, .12, .05, .40]],
                   evidence=['R', 'B'], evidence_card=[2, 2])

# P(R | M, B, U)
cpd_R = TabularCPD('R', 2,
                   values=[[0.96, .86, .94, .82, .24, .15, .10, .05], [.04, .14, .06, .18, .76, .85, .90, .95]],
                   evidence=['M', 'B', 'U'], evidence_card=[2, 2, 2])
bayesNet.add_cpds(cpd_M, cpd_U, cpd_B, cpd_S, cpd_R)
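
Individual CPDs can be printed to double-check how pgmpy laid out the evidence columns; the exact ordering of evidence-state combinations has varied across pgmpy versions, so it is worth verifying against your installed version (output omitted here):

# Optional: print a CPD to inspect its table layout
print(cpd_R)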

Checking that the model is correctly specified.

bayesNet.check_model()
print("Model is correct.")
Model is correct.

Creating a solver that uses variable elimination internally for inference.

solver = VariableElimination(bayesNet)

Let’s take some examples. For cross-verification, we will also do the inference manually using Bayes’ theorem and the total probability theorem.

1. Let’s find the probability of “Content should be removed from the platform”

P(R)
= P(R|M,B,U)*P(M)*P(B)*P(U) + P(R|M,B,!U)*P(M)*P(B)*P(!U) + P(R|M,!B,U)*P(M)*P(!B)*P(U)
+ P(R|M,!B,!U)*P(M)*P(!B)*P(!U) + P(R|!M,B,U)*P(!M)*P(B)*P(U) + P(R|!M,B,!U)*P(!M)*P(B)*P(!U)
+ P(R|!M,!B,U)*P(!M)*P(!B)*P(U) + P(R|!M,!B,!U)*P(!M)*P(!B)*P(!U) --- [using the total probability theorem, since R depends on M, B and U]
= 0.95*0.05*0.1*0.15 + 0.9*0.05*0.1*0.85 + 0.85*0.05*0.9*0.15
+ 0.76*0.05*0.9*0.85 + 0.18*0.95*0.1*0.15 + 0.06*0.95*0.1*0.85
+ 0.14*0.95*0.9*0.15 + 0.04*0.95*0.9*0.85
= 0.09378
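
As a quick sanity check, the same sum can be written directly in Python, using the CPD values from above (the mapping of conditional values follows the manual derivation):

# Manual computation of P(R=1) by summing over all (M, B, U) combinations
p_M, p_B, p_U = 0.05, 0.10, 0.15
p_R_given = {  # P(R=1 | M, B, U), as used in the derivation above
    (1, 1, 1): 0.95, (1, 1, 0): 0.90, (1, 0, 1): 0.85, (1, 0, 0): 0.76,
    (0, 1, 1): 0.18, (0, 1, 0): 0.06, (0, 0, 1): 0.14, (0, 0, 0): 0.04,
}
p_R = sum(p_R_given[(m, b, u)]
          * (p_M if m else 1 - p_M)
          * (p_B if b else 1 - p_B)
          * (p_U if u else 1 - p_U)
          for m in (0, 1) for b in (0, 1) for u in (0, 1))
print(p_R)  # 0.09378, up to floating point noise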

Using the pgmpy library:

result = solver.query(variables=['R'])
print("R", result['R'].values[1])
R 0.09378000000000002

2. Let’s find the probability of “Content should be removed from the platform given our ML model flags it”

P(R|M)
= P(R|M,B,U)*P(B)*P(U) + P(R|M,B,!U)*P(B)*P(!U)
+ P(R|M,!B,U)*P(!B)*P(U) + P(R|M,!B,!U)*P(!B)*P(!U) -------- [using the total probability theorem; B and U are independent of M]
= 0.95*0.1*0.15 + 0.9*0.1*0.85 + 0.85*0.9*0.15 + 0.76*0.9*0.85
= 0.7869
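
And the same check in Python:

# Manual computation of P(R=1 | M=1), summing over B and U
p_B, p_U = 0.10, 0.15
p_R_given_M = (0.95 * p_B * p_U + 0.90 * p_B * (1 - p_U)
               + 0.85 * (1 - p_B) * p_U + 0.76 * (1 - p_B) * (1 - p_U))
print(p_R_given_M)  # 0.7869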

Now, using the pgmpy library:

result = solver.query(variables=['R'], evidence={'M': 1})
print("R| M", result['R'].values[1])

R| M 0.7869

pgmpy can also answer more complex probability queries, marginalizing over the remaining variables when some evidence is given.

For example, we can find the probability of “Account should be suspended given it was suspended before”:

result = solver.query(variables=['S'], evidence={'B': 1})
print("S| B", result['S'].values[1])
S| B 0.15345299999999998
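
The solver accepts several evidence variables at once. For instance, the following query (a hypothetical combination not analyzed above, so its output is not shown) asks for the probability of suspension when both the ML model and another user flag the content:

# Hypothetical query: P(S=1 | M=1, U=1)
result = solver.query(variables=['S'], evidence={'M': 1, 'U': 1})
print("S| M, U", result['S'].values[1])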

The model has other features as well; for example, it can list the conditional independencies implied by the network structure:

bayesNet.get_independencies()
(M _|_ U, B)
(M _|_ B | U)
(M _|_ U | B)
(M _|_ S | R, B)
(M _|_ S | R, U, B)
(U _|_ M, B)
(U _|_ B | M)
(U _|_ M | B)
(U _|_ S | R, B)
(U _|_ S | M, R, B)
(B _|_ M, U)
(B _|_ U | M)
(B _|_ M | U)
(S _|_ M, U | R, B)
(S _|_ U | M, R, B)
(S _|_ M | R, U, B)
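
pgmpy also exposes a local_independencies method for a single node; a minimal sketch, assuming this method is available in your installed version (output omitted):

# Independencies implied by the graph for node 'S' alone
print(bayesNet.local_independencies('S'))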
print("Completed.")

Completed.

About me

Engineering Manager @Flipkart (India), pursuing a Master’s in CS [ML & AI] @Georgia Tech (Atlanta)

7+ years of professional experience in software design and development for large-scale applications. Alongside work, pursuing a Master’s in Computer Science from Georgia Tech in Machine Learning and Artificial Intelligence. Expertise in leading agile teams with Scrum and project management.