readwrite module pgmpy

pgmpy is a Python library for creating, manipulating and implementing probabilistic graphical models (PGMs). There are various standard file formats for representing PGM data, which basically consists of a graph, a table corresponding to each node, and a few other attributes of the graph.
pgmpy can read networks from, and write networks to, these standard file formats. Currently pgmpy supports five file formats: ProbModelXML, PomdpX, XMLBIF, XMLBeliefNetwork and UAI. Using these modules, models can be specified in a uniform file format and readily converted to Bayesian or Markov model objects.
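Stripped of any file syntax, the content these formats share can be pictured with plain Python containers. The sketch below is only an illustration and not pgmpy's internal representation: the dictionary layout and the helper `check_rows` are hypothetical, and the numbers are borrowed from the Cancer network that appears later in this post. It also shows one sanity check a reader might apply (every probability row sums to 1).

```python
import math

# Hypothetical, format-independent view of PGM data:
# a graph, one table per node, and a few extra attributes.
network = {
    "name": "Cancer",                      # network-level attribute
    "states": {                            # node -> its discrete states
        "a": ["Present", "Absent"],
        "b": ["Present", "Absent"],
        "c": ["Present", "Absent"],
    },
    "edges": [("a", "b"), ("a", "c")],     # (parent, child) pairs
    "tables": {                            # node -> one probability row per parent configuration
        "a": [[0.2, 0.8]],
        "b": [[0.8, 0.2], [0.2, 0.8]],
        "c": [[0.2, 0.8], [0.05, 0.95]],
    },
}


def check_rows(net):
    """Return the nodes whose table rows do not each sum to 1."""
    bad = []
    for node, rows in net["tables"].items():
        if not all(math.isclose(sum(row), 1.0) for row in rows):
            bad.append(node)
    return bad


print(check_rows(network))  # → []
```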
Now, let's read a ProbModelXML file and get the corresponding model instance.

In [1]:

from pgmpy.readwrite import ProbModelXMLReader

In [2]:

reader_string = ProbModelXMLReader('example.pgmx')

To get the corresponding model instance, we need the get_model() method.

In [3]:

model = reader_string.get_model()

Now we can query this model according to our requirements. It is an instance of BayesianModel or MarkovModel, depending on the type of model described in the file.
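The choice between BayesianModel and MarkovModel comes from the network description itself. As a rough, hypothetical illustration (not pgmpy's actual logic), the helper below decides from edge attribute dictionaries shaped like the `edges` dictionary defined later in this post, where each edge carries `{'directed': 'true'}`.

```python
# Hypothetical helper, NOT pgmpy's logic: pick a model class name from edge
# attributes shaped like {source: {target: {'directed': 'true'}}}.
def infer_model_type(edge_attrs):
    """Return 'BayesianModel' if every edge is directed, 'MarkovModel' if none is."""
    directed = [attrs.get('directed') == 'true'
                for targets in edge_attrs.values()
                for attrs in targets.values()]
    if directed and all(directed):
        return 'BayesianModel'
    if not any(directed):
        # Also reached for a graph with no edges (an arbitrary choice here).
        return 'MarkovModel'
    raise ValueError('network mixes directed and undirected edges')


example = {'Smoker': {'LungCancer': {'directed': 'true'}}, 'LungCancer': {}}
print(infer_model_type(example))  # → BayesianModel
```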
Suppose we want to know all the nodes in the given model; we can use:

In [4]:

print(model.nodes())

['Smoker', 'X-ray', 'VisitToAsia', 'Tuberculosis', 'TuberculosisOrCancer', 'LungCancer', 'Dyspnea', 'Bronchitis']

To get all the edges, use the model.edges() method.

In [5]:

model.edges()

Out[5]:

[('Smoker', 'LungCancer'),
 ('Smoker', 'Bronchitis'),
 ('VisitToAsia', 'Tuberculosis'),
 ('Tuberculosis', 'TuberculosisOrCancer'),
 ('TuberculosisOrCancer', 'Dyspnea'),
 ('TuberculosisOrCancer', 'X-ray'),
 ('LungCancer', 'TuberculosisOrCancer'),
 ('Bronchitis', 'Dyspnea')]
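Each pair returned by `model.edges()` is a (parent, child) tuple, so the parents of every node, which are the variables its CPD is conditioned on, can be recovered from the list alone. A small stdlib sketch; `parents_from_edges` is a hypothetical helper, not a pgmpy method:

```python
from collections import defaultdict

# Edge list as printed by model.edges() above; each pair is (parent, child).
edges = [('Smoker', 'LungCancer'),
         ('Smoker', 'Bronchitis'),
         ('VisitToAsia', 'Tuberculosis'),
         ('Tuberculosis', 'TuberculosisOrCancer'),
         ('TuberculosisOrCancer', 'Dyspnea'),
         ('TuberculosisOrCancer', 'X-ray'),
         ('LungCancer', 'TuberculosisOrCancer'),
         ('Bronchitis', 'Dyspnea')]


def parents_from_edges(edge_list):
    """Map every node to the list of its parents, in edge order."""
    parents = defaultdict(list)
    for parent, child in edge_list:
        parents[child].append(parent)
        parents.setdefault(parent, [])  # root nodes still get an (empty) entry
    return dict(parents)


parents = parents_from_edges(edges)
print(parents['TuberculosisOrCancer'])                   # → ['Tuberculosis', 'LungCancer']
print(sorted(n for n, p in parents.items() if not p))    # → ['Smoker', 'VisitToAsia']
```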

To get all the CPDs of the model, we can use model.get_cpds(); to get the corresponding values, we iterate over the CPDs and call each one's get_cpd() method.

In [6]:

cpds = model.get_cpds()
for cpd in cpds:
    print(cpd.get_cpd())

[[ 0.95  0.05]
 [ 0.02  0.98]]
[[ 0.7  0.3]
 [ 0.4  0.6]]
[[ 0.9  0.1  0.3  0.7]
 [ 0.2  0.8  0.1  0.9]]
[[ 0.99]
 [ 0.01]]
[[ 0.5]
 [ 0.5]]
[[ 0.99  0.01]
 [ 0.9   0.1 ]]
[[ 0.99  0.01]
 [ 0.95  0.05]]
[[ 1.  0.  0.  1.]
 [ 0.  1.  0.  1.]]

pgmpy not only reads models from these file formats but also writes a given model into them. Let's write a sample model into a ProbModelXML file.
First, define the data for the model.

In [7]:

import numpy as np

edges_list = [('VisitToAsia', 'Tuberculosis'),
              ('LungCancer', 'TuberculosisOrCancer'),
              ('Smoker', 'LungCancer'),
              ('Smoker', 'Bronchitis'),
              ('Tuberculosis', 'TuberculosisOrCancer'),
              ('Bronchitis', 'Dyspnea'),
              ('TuberculosisOrCancer', 'Dyspnea'),
              ('TuberculosisOrCancer', 'X-ray')]
nodes = {'Smoker': {'States': {'no': {}, 'yes': {}},
                    'role': 'chance',
                    'type': 'finiteStates',
                    'Coordinates': {'y': '52', 'x': '568'},
                    'AdditionalProperties': {'Title': 'S', 'Relevance': '7.0'}},
         'Bronchitis': {'States': {'no': {}, 'yes': {}},
                        'role': 'chance',
                        'type': 'finiteStates',
                        'Coordinates': {'y': '181', 'x': '698'},
                        'AdditionalProperties': {'Title': 'B', 'Relevance': '7.0'}},
         'VisitToAsia': {'States': {'no': {}, 'yes': {}},
                         'role': 'chance',
                         'type': 'finiteStates',
                         'Coordinates': {'y': '58', 'x': '290'},
                         'AdditionalProperties': {'Title': 'A', 'Relevance': '7.0'}},
         'Tuberculosis': {'States': {'no': {}, 'yes': {}},
                          'role': 'chance',
                          'type': 'finiteStates',
                          'Coordinates': {'y': '150', 'x': '201'},
                          'AdditionalProperties': {'Title': 'T', 'Relevance': '7.0'}},
         'X-ray': {'States': {'no': {}, 'yes': {}},
                   'role': 'chance',
                   'AdditionalProperties': {'Title': 'X', 'Relevance': '7.0'},
                   'Coordinates': {'y': '322', 'x': '252'},
                   'Comment': 'Indica si el test de rayos X ha sido positivo',
                   'type': 'finiteStates'},
         'Dyspnea': {'States': {'no': {}, 'yes': {}},
                     'role': 'chance',
                     'type': 'finiteStates',
                     'Coordinates': {'y': '321', 'x': '533'},
                     'AdditionalProperties': {'Title': 'D', 'Relevance': '7.0'}},
         'TuberculosisOrCancer': {'States': {'no': {}, 'yes': {}},
                                  'role': 'chance',
                                  'type': 'finiteStates',
                                  'Coordinates': {'y': '238', 'x': '336'},
                                  'AdditionalProperties': {'Title': 'E', 'Relevance': '7.0'}},
         'LungCancer': {'States': {'no': {}, 'yes': {}},
                        'role': 'chance',
                        'type': 'finiteStates',
                        'Coordinates': {'y': '152', 'x': '421'},
                        'AdditionalProperties': {'Title': 'L', 'Relevance': '7.0'}}}
edges = {'LungCancer': {'TuberculosisOrCancer': {'directed': 'true'}},
         'Smoker': {'LungCancer': {'directed': 'true'},
                    'Bronchitis': {'directed': 'true'}},
         'Dyspnea': {},
         'X-ray': {},
         'VisitToAsia': {'Tuberculosis': {'directed': 'true'}},
         'TuberculosisOrCancer': {'X-ray': {'directed': 'true'},
                                  'Dyspnea': {'directed': 'true'}},
         'Bronchitis': {'Dyspnea': {'directed': 'true'}},
         'Tuberculosis': {'TuberculosisOrCancer': {'directed': 'true'}}}

cpds = [{'Values': np.array([[0.95, 0.05], [0.02, 0.98]]),
         'Variables': {'X-ray': ['TuberculosisOrCancer']}},
        {'Values': np.array([[0.7, 0.3], [0.4,  0.6]]),
         'Variables': {'Bronchitis': ['Smoker']}},
        {'Values':  np.array([[0.9, 0.1,  0.3,  0.7], [0.2,  0.8,  0.1,  0.9]]),
         'Variables': {'Dyspnea': ['TuberculosisOrCancer', 'Bronchitis']}},
        {'Values': np.array([[0.99], [0.01]]),
         'Variables': {'VisitToAsia': []}},
        {'Values': np.array([[0.5], [0.5]]),
         'Variables': {'Smoker': []}},
        {'Values': np.array([[0.99, 0.01], [0.9, 0.1]]),
         'Variables': {'LungCancer': ['Smoker']}},
        {'Values': np.array([[0.99, 0.01], [0.95, 0.05]]),
         'Variables': {'Tuberculosis': ['VisitToAsia']}},
        {'Values': np.array([[1, 0, 0, 1], [0, 1, 0, 1]]),
         'Variables': {'TuberculosisOrCancer': ['LungCancer', 'Tuberculosis']}}]

Now let's create a model from the given data.

In [8]:

from pgmpy.models import BayesianModel
from pgmpy.factors import TabularCPD

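# Build the network structure from the list of (parent, child) edges
model = BayesianModel(edges_list)

# Attach the ProbModelXML attributes to every node and edge
# (model.node / model.edge are the attribute dictionaries of networkx 1.x)
for node in nodes:
    model.node[node] = nodes[node]
for edge in edges:
    model.edge[edge] = edges[edge]

# Convert each raw CPD description into a TabularCPD
tabular_cpds = []
for cpd in cpds:
    var = list(cpd['Variables'].keys())[0]   # the variable this CPD describes
    evidence = cpd['Variables'][var]          # its parents (may be empty)
    values = cpd['Values']
    states = len(nodes[var]['States'])        # cardinality of the variable
    evidence_card = [len(nodes[evidence_var]['States'])
                     for evidence_var in evidence]
    tabular_cpds.append(
        TabularCPD(var, states, values, evidence, evidence_card))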

model.add_cpds(*tabular_cpds)
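`TabularCPD` receives the variable's cardinality and its evidence cardinalities. As we understand pgmpy's convention, the values table should have one row per state of the variable and one column per combination of parent states, so its shape can be checked before the CPDs are built. A minimal sketch using the `cpds` data above rewritten as plain lists; `expected_shape` and `actual_shape` are hypothetical helpers:

```python
from math import prod

# Every node in this example has two states ('no', 'yes').
cardinality = {name: 2 for name in ['Smoker', 'Bronchitis', 'VisitToAsia', 'Tuberculosis',
                                    'X-ray', 'Dyspnea', 'TuberculosisOrCancer', 'LungCancer']}

# (variable, evidence, values) copied from the `cpds` list above.
cpd_specs = [
    ('X-ray', ['TuberculosisOrCancer'], [[0.95, 0.05], [0.02, 0.98]]),
    ('Bronchitis', ['Smoker'], [[0.7, 0.3], [0.4, 0.6]]),
    ('Dyspnea', ['TuberculosisOrCancer', 'Bronchitis'],
     [[0.9, 0.1, 0.3, 0.7], [0.2, 0.8, 0.1, 0.9]]),
    ('VisitToAsia', [], [[0.99], [0.01]]),
    ('Smoker', [], [[0.5], [0.5]]),
    ('LungCancer', ['Smoker'], [[0.99, 0.01], [0.9, 0.1]]),
    ('Tuberculosis', ['VisitToAsia'], [[0.99, 0.01], [0.95, 0.05]]),
    ('TuberculosisOrCancer', ['LungCancer', 'Tuberculosis'],
     [[1, 0, 0, 1], [0, 1, 0, 1]]),
]


def expected_shape(variable, evidence):
    """(rows, columns): variable states x product of parent cardinalities."""
    return cardinality[variable], prod(cardinality[e] for e in evidence)


def actual_shape(values):
    return len(values), len(values[0])


for variable, evidence, values in cpd_specs:
    assert actual_shape(values) == expected_shape(variable, evidence), variable
```

An empty evidence list gives a product of 1, which is why root nodes such as Smoker have a single column.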

In [9]:

from pgmpy.readwrite import ProbModelXMLWriter, get_probmodel_data

To get the data that ProbModelXMLWriter needs, use the get_probmodel_data function. This step is specific to ProbModelXML; for the other file formats, the model is passed directly to the corresponding writer class.

In [10]:

model_data = get_probmodel_data(model)
writer = ProbModelXMLWriter(model_data=model_data)
print(writer)

To write the XML data into a file, use the write_file method of the writer class.

In [ ]:

writer.write_file('probmodelxml.pgmx')

General Workflow of the readwrite module

pgmpy.readwrite.[fileformat]reader is the base class for reading a given file format; replace [fileformat] with the format you want to read from. This class defines several methods for parsing the file. For example, the XMLBeliefNetwork reader (XBNReader) defines the following methods.

In [4]:

from pgmpy.readwrite.XMLBeliefNetwork import XBNReader
reader = XBNReader('xmlbelief.xml')

get_analysisnotebook_values: It returns a dictionary of the attributes of the ANALYSISNOTEBOOK tag.

In [5]:

reader.get_analysisnotebook_values()

Out[5]:

{'NAME': 'Notebook.Cancer Example From Neapolitan', 'ROOT': 'Cancer'}

get_bnmodel_name: It returns the name of the BNMODEL.

In [6]:

reader.get_bnmodel_name()

Out[6]:

'Cancer'

get_static_properties: It returns a dictionary of the static properties (the STATICPROPERTIES tag).

In [7]:

reader.get_static_properties()

Out[7]:

{'CREATOR': 'Microsoft Research DTAS',
 'FORMAT': 'MSR DTAS XML',
 'VERSION': '0.2'}
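The three methods above read attributes from the top of the XBN document. A stdlib sketch of the same idea with `xml.etree.ElementTree`: the inline XML is a trimmed document assumed from the outputs shown above (in particular, the `VALUE` attribute on the static-property tags is an assumption), and the function names are hypothetical, not XBNReader's internals.

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed XBN-style document reproducing the outputs above.
XBN = """
<ANALYSISNOTEBOOK NAME="Notebook.Cancer Example From Neapolitan" ROOT="Cancer">
  <BNMODEL NAME="Cancer">
    <STATICPROPERTIES>
      <FORMAT VALUE="MSR DTAS XML"/>
      <VERSION VALUE="0.2"/>
      <CREATOR VALUE="Microsoft Research DTAS"/>
    </STATICPROPERTIES>
  </BNMODEL>
</ANALYSISNOTEBOOK>
"""

root = ET.fromstring(XBN.strip())


def analysisnotebook_values(root):
    # Attributes of the root ANALYSISNOTEBOOK tag, e.g. NAME and ROOT.
    return dict(root.attrib)


def bnmodel_name(root):
    return root.find("BNMODEL").get("NAME")


def static_properties(root):
    # Each child of STATICPROPERTIES becomes TAG -> VALUE attribute.
    props = root.find("BNMODEL/STATICPROPERTIES")
    return {child.tag: child.get("VALUE") for child in props}


print(bnmodel_name(root))  # → Cancer
```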

get_variables: It returns a dictionary mapping each variable name to its attributes (description, states, type and position).

In [8]:

reader.get_variables()

Out[8]:

{'a': {'DESCRIPTION': '(a) Metastatic Cancer',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '13495',
  'YPOS': '10465'},
 'b': {'DESCRIPTION': '(b) Serum Calcium Increase',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '11290',
  'YPOS': '11965'},
 'c': {'DESCRIPTION': '(c) Brain Tumor',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '15250',
  'YPOS': '11935'},
 'd': {'DESCRIPTION': '(d) Coma',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '13960',
  'YPOS': '12985'},
 'e': {'DESCRIPTION': '(e) Papilledema',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '17305',
  'YPOS': '13240'}}

get_edges: It returns a list of tuples. Each tuple contains two elements, (parent, child), for each edge.

In [9]:

reader.get_edges()

Out[9]:

[('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e')]
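Variables and edges can be pulled out the same way. In the assumed layout below, each `VAR` element carries its attributes plus `DESCRIPTION` and `STATENAME` children, and each `ARC` element in a `STRUCTURE` block names a `PARENT` and a `CHILD`; these tag names are assumptions chosen to reproduce the outputs above, and the helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed XBN-style BNMODEL fragment with two variables and one arc.
BNMODEL = """<BNMODEL NAME="Cancer">
  <VARIABLES>
    <VAR NAME="a" TYPE="discrete" XPOS="13495" YPOS="10465">
      <DESCRIPTION>(a) Metastatic Cancer</DESCRIPTION>
      <STATENAME>Present</STATENAME>
      <STATENAME>Absent</STATENAME>
    </VAR>
    <VAR NAME="b" TYPE="discrete" XPOS="11290" YPOS="11965">
      <DESCRIPTION>(b) Serum Calcium Increase</DESCRIPTION>
      <STATENAME>Present</STATENAME>
      <STATENAME>Absent</STATENAME>
    </VAR>
  </VARIABLES>
  <STRUCTURE>
    <ARC PARENT="a" CHILD="b"/>
  </STRUCTURE>
</BNMODEL>"""


def parse_variables(bnmodel):
    """name -> {TYPE, XPOS, YPOS, DESCRIPTION, STATES}"""
    variables = {}
    for var in bnmodel.find("VARIABLES"):
        attrs = dict(var.attrib)
        name = attrs.pop("NAME")
        attrs["DESCRIPTION"] = var.findtext("DESCRIPTION")
        attrs["STATES"] = [s.text for s in var.findall("STATENAME")]
        variables[name] = attrs
    return variables


def parse_edges(bnmodel):
    """List of (parent, child) tuples, one per ARC."""
    return [(arc.get("PARENT"), arc.get("CHILD")) for arc in bnmodel.find("STRUCTURE")]


bnmodel = ET.fromstring(BNMODEL)
print(parse_edges(bnmodel))  # → [('a', 'b')]
```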

get_distributions: It returns a dictionary mapping each variable name to its distribution.

In [10]:

reader.get_distributions()

Out[10]:

{'a': {'DPIS': array([[ 0.2,  0.8]]), 'TYPE': 'discrete'},
 'b': {'CARDINALITY': array([2]),
  'CONDSET': ['a'],
  'DPIS': array([[ 0.8,  0.2],
         [ 0.2,  0.8]]),
  'TYPE': 'discrete'},
 'c': {'CARDINALITY': array([2]),
  'CONDSET': ['a'],
  'DPIS': array([[ 0.2 ,  0.8 ],
         [ 0.05,  0.95]]),
  'TYPE': 'discrete'},
 'd': {'CARDINALITY': array([2, 2]),
  'CONDSET': ['b', 'c'],
  'DPIS': array([[ 0.8 ,  0.2 ],
         [ 0.9 ,  0.1 ],
         [ 0.7 ,  0.3 ],
         [ 0.05,  0.95]]),
  'TYPE': 'discrete'},
 'e': {'CARDINALITY': array([2]),
  'CONDSET': ['c'],
  'DPIS': array([[ 0.8,  0.2],
         [ 0.6,  0.4]]),
  'TYPE': 'discrete'}}
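A distribution entry pairs a variable with its conditioning set and one probability row per parent configuration. The sketch below parses an assumed `DIST` layout (`PRIVATE`, `CONDSET`/`CONDELEM`, `DPIS`/`DPI`) into plain lists rather than NumPy arrays. It derives `CARDINALITY` from the parents' state counts, which is consistent with the output above but is our assumption about where that value comes from; `parse_distributions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed DISTRIBUTIONS fragment for variables 'a' (root) and 'b' (child of 'a').
DISTS = """<DISTRIBUTIONS>
  <DIST TYPE="discrete">
    <PRIVATE NAME="a"/>
    <DPIS><DPI>0.2 0.8</DPI></DPIS>
  </DIST>
  <DIST TYPE="discrete">
    <CONDSET><CONDELEM NAME="a"/></CONDSET>
    <PRIVATE NAME="b"/>
    <DPIS>
      <DPI>0.8 0.2</DPI>
      <DPI>0.2 0.8</DPI>
    </DPIS>
  </DIST>
</DISTRIBUTIONS>"""

# Number of states per variable (both are binary here).
STATE_COUNT = {"a": 2, "b": 2}


def parse_distributions(dists, state_count):
    result = {}
    for dist in dists.findall("DIST"):
        name = dist.find("PRIVATE").get("NAME")
        entry = {"TYPE": dist.get("TYPE"),
                 # one list of floats per DPI row
                 "DPIS": [[float(x) for x in dpi.text.split()]
                          for dpi in dist.find("DPIS")]}
        condset = [c.get("NAME") for c in dist.findall("CONDSET/CONDELEM")]
        if condset:  # root variables have no CONDSET / CARDINALITY keys
            entry["CONDSET"] = condset
            entry["CARDINALITY"] = [state_count[c] for c in condset]
        result[name] = entry
    return result


distributions = parse_distributions(ET.fromstring(DISTS), STATE_COUNT)
print(distributions["a"])  # → {'TYPE': 'discrete', 'DPIS': [[0.2, 0.8]]}
```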

get_model: It returns an instance of the corresponding model class, for example a BayesianModel in the case of the XMLBeliefNetwork format.

In [11]:

model = reader.get_model()
print(model.nodes())
print(model.edges())

['c', 'b', 'e', 'a', 'd']
[('c', 'e'), ('c', 'd'), ('b', 'd'), ('a', 'c'), ('a', 'b')]
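To build the model, the edges and the distributions have to agree: each variable's CONDSET should match its parents in the graph. A hypothetical check (not XBNReader's code) using the values printed in Out[9] and Out[10]:

```python
# Edges and conditioning sets copied from Out[9] and Out[10] above.
edges = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e')]
condsets = {'a': [], 'b': ['a'], 'c': ['a'], 'd': ['b', 'c'], 'e': ['c']}


def mismatched_nodes(edges, condsets):
    """Nodes whose CONDSET differs (as a set) from their parents in the graph."""
    parents = {node: set() for node in condsets}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    return sorted(n for n in parents if parents[n] != set(condsets.get(n, [])))


print(mismatched_nodes(edges, condsets))  # → []
```

Comparing sets rather than lists ignores the order of parents, which may differ between the edge list and a CONDSET.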

pgmpy.readwrite.[fileformat]writer is the base class for writing a model into a given file format. It takes a model, which can be an instance of BayesianModel or MarkovModel, as an argument; replace [fileformat] with the format you want to write to. This class defines several methods to set the contents of the new file from the given model. For example, the XMLBeliefNetwork writer (XBNWriter) defines the following methods.

In [7]:

from pgmpy.models import BayesianModel
from pgmpy.factors import TabularCPD
import numpy as np
nodes = {'c': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(c) Brain Tumor',
               'YPOS': '11935',
               'XPOS': '15250',
               'TYPE': 'discrete'},
         'a': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(a) Metastatic Cancer',
               'YPOS': '10465',
               'XPOS': '13495',
               'TYPE': 'discrete'},
         'b': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(b) Serum Calcium Increase',
               'YPOS': '11965',
               'XPOS': '11290',
               'TYPE': 'discrete'},
         'e': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(e) Papilledema',
               'YPOS': '13240',
               'XPOS': '17305',
               'TYPE': 'discrete'},
         'd': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(d) Coma',
               'YPOS': '12985',
               'XPOS': '13960',
               'TYPE': 'discrete'}}
model = BayesianModel([('b', 'd'), ('a', 'b'), ('a', 'c'), ('c', 'd'), ('c', 'e')])
cpd_distribution = {'a': {'TYPE': 'discrete', 'DPIS': np.array([[0.2, 0.8]])},
                    'e': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2],
                                                                [0.6, 0.4]]), 'CONDSET': ['c'], 'CARDINALITY': [2]},
                    'b': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2],
                                                                [0.2, 0.8]]), 'CONDSET': ['a'], 'CARDINALITY': [2]},
                    'c': {'TYPE': 'discrete', 'DPIS': np.array([[0.2, 0.8],
                                                                [0.05, 0.95]]), 'CONDSET': ['a'], 'CARDINALITY': [2]},
                    'd': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2],
                                                                [0.9, 0.1],
                                                                [0.7, 0.3],
                                                                [0.05, 0.95]]), 'CONDSET': ['b', 'c'], 'CARDINALITY': [2, 2]}}

tabular_cpds = []
for var, values in cpd_distribution.items():
    evidence = values['CONDSET'] if 'CONDSET' in values else []
    cpd = values['DPIS']
    evidence_card = values['CARDINALITY'] if 'CARDINALITY' in values else []
    states = nodes[var]['STATES']
    cpd = TabularCPD(var, len(states), cpd,
                     evidence=evidence,
                     evidence_card=evidence_card)
    tabular_cpds.append(cpd)
model.add_cpds(*tabular_cpds)

for var, properties in nodes.items():
    model.node[var] = properties

In [8]:

from pgmpy.readwrite.XMLBeliefNetwork import XBNWriter
writer = XBNWriter(model=model)

set_analysisnotebook: It sets the attributes of the ANALYSISNOTEBOOK tag.
set_bnmodel_name: It sets the name of the BNMODEL.
set_static_properties: It sets the STATICPROPERTIES tag for the network.
set_variables: It sets the VARIABLES tag for the network.
set_edges: It sets the edges (arcs) of the network.
set_distributions: It sets the distributions of the network.
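The setter methods map naturally onto building an XML tree element by element. The sketch below is hypothetical and not XBNWriter's implementation: `build_xbn` mirrors the set_* steps with `xml.etree.ElementTree`, uses an assumed XBN-style layout (ANALYSISNOTEBOOK, BNMODEL, STATICPROPERTIES and VARIABLES come from the reader output in this post; `VALUE`, `STATENAME`, `STRUCTURE` and `ARC` are assumptions), and leaves out distributions for brevity.

```python
import xml.etree.ElementTree as ET


def build_xbn(notebook_attrs, model_name, static_props, variables, edges):
    """Hypothetical XBN-style writer; returns the root Element."""
    notebook = ET.Element("ANALYSISNOTEBOOK", notebook_attrs)           # set_analysisnotebook
    bnmodel = ET.SubElement(notebook, "BNMODEL", {"NAME": model_name})   # set_bnmodel_name
    props = ET.SubElement(bnmodel, "STATICPROPERTIES")                   # set_static_properties
    for tag, value in static_props.items():
        ET.SubElement(props, tag, {"VALUE": value})
    var_root = ET.SubElement(bnmodel, "VARIABLES")                       # set_variables
    for name, attrs in variables.items():
        var = ET.SubElement(var_root, "VAR", {"NAME": name, "TYPE": attrs["TYPE"]})
        ET.SubElement(var, "DESCRIPTION").text = attrs["DESCRIPTION"]
        for state in attrs["STATES"]:
            ET.SubElement(var, "STATENAME").text = state
    structure = ET.SubElement(bnmodel, "STRUCTURE")                      # set_edges
    for parent, child in edges:
        ET.SubElement(structure, "ARC", {"PARENT": parent, "CHILD": child})
    return notebook


variables = {
    "a": {"TYPE": "discrete", "DESCRIPTION": "(a) Metastatic Cancer",
          "STATES": ["Present", "Absent"]},
    "c": {"TYPE": "discrete", "DESCRIPTION": "(c) Brain Tumor",
          "STATES": ["Present", "Absent"]},
}
notebook = build_xbn({"NAME": "Notebook.Cancer Example From Neapolitan", "ROOT": "Cancer"},
                     "Cancer", {"FORMAT": "MSR DTAS XML", "VERSION": "0.2"},
                     variables, [("a", "c")])
xml_text = ET.tostring(notebook, encoding="unicode")
```

Calling `ET.ElementTree(notebook).write(path, encoding="utf-8", xml_declaration=True)` on the result would then play the role of write_file.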


Thursday 20 August 2015

Final Notebook

readwrite module pgmpy

General Workflow of the readwrite module

Wednesday 12 August 2015

UAI Reader And Writer

UAI Format Brief Description