Thursday, 20 August 2015

Final Notebook

readwrite module pgmpy

pgmpy is a Python library for the creation, manipulation and implementation of probabilistic graphical models (PGMs). There are various standard file formats for representing PGM data. PGM data basically consists of a graph, a table corresponding to each node and a few other attributes of the graph.
pgmpy can read networks from and write networks to these standard file formats. Currently pgmpy supports five file formats: ProbModelXML, PomDPX, XMLBIF, XMLBeliefNetwork and UAI. Using these modules, models can be specified in a uniform file format and readily converted to Bayesian or Markov model objects.
Now, let's read a ProbModelXML file and get the corresponding model instance.

In [1]:

from pgmpy.readwrite import ProbModelXMLReader

In [2]:

reader_string = ProbModelXMLReader('example.pgmx')

To get the corresponding model instance, call get_model().

In [3]:

model = reader_string.get_model()

Now we can query this model according to our requirements. It is an instance of BayesianModel or MarkovModel, depending on the type of model stored in the file.
Suppose we want to know all the nodes in the given model; we can use

In [4]:

print(model.nodes())

['Smoker', 'X-ray', 'VisitToAsia', 'Tuberculosis', 'TuberculosisOrCancer', 'LungCancer', 'Dyspnea', 'Bronchitis']

To get all the edges, use the model.edges() method.

In [5]:

model.edges()

Out[5]:

[('Smoker', 'LungCancer'),
 ('Smoker', 'Bronchitis'),
 ('VisitToAsia', 'Tuberculosis'),
 ('Tuberculosis', 'TuberculosisOrCancer'),
 ('TuberculosisOrCancer', 'Dyspnea'),
 ('TuberculosisOrCancer', 'X-ray'),
 ('LungCancer', 'TuberculosisOrCancer'),
 ('Bronchitis', 'Dyspnea')]

To get all the CPDs of the given model we can use model.get_cpds(); to get the corresponding values we can iterate over the CPDs and call get_cpd() on each one.

In [6]:

cpds = model.get_cpds()
for cpd in cpds:
    print(cpd.get_cpd())

[[ 0.95  0.05]
 [ 0.02  0.98]]
[[ 0.7  0.3]
 [ 0.4  0.6]]
[[ 0.9  0.1  0.3  0.7]
 [ 0.2  0.8  0.1  0.9]]
[[ 0.99]
 [ 0.01]]
[[ 0.5]
 [ 0.5]]
[[ 0.99  0.01]
 [ 0.9   0.1 ]]
[[ 0.99  0.01]
 [ 0.95  0.05]]
[[ 1.  0.  0.  1.]
 [ 0.  1.  0.  1.]]

pgmpy not only lets us read a model from a specific file format but also write a given model to that format. Let's write a sample model to a ProbModelXML file.
First, define the data for the model.

In [7]:

import numpy as np

edges_list = [('VisitToAsia', 'Tuberculosis'),
              ('LungCancer', 'TuberculosisOrCancer'),
              ('Smoker', 'LungCancer'),
              ('Smoker', 'Bronchitis'),
              ('Tuberculosis', 'TuberculosisOrCancer'),
              ('Bronchitis', 'Dyspnea'),
              ('TuberculosisOrCancer', 'Dyspnea'),
              ('TuberculosisOrCancer', 'X-ray')]
nodes = {'Smoker': {'States': {'no': {}, 'yes': {}},
                    'role': 'chance',
                    'type': 'finiteStates',
                    'Coordinates': {'y': '52', 'x': '568'},
                    'AdditionalProperties': {'Title': 'S', 'Relevance': '7.0'}},
         'Bronchitis': {'States': {'no': {}, 'yes': {}},
                        'role': 'chance',
                        'type': 'finiteStates',
                        'Coordinates': {'y': '181', 'x': '698'},
                        'AdditionalProperties': {'Title': 'B', 'Relevance': '7.0'}},
         'VisitToAsia': {'States': {'no': {}, 'yes': {}},
                         'role': 'chance',
                         'type': 'finiteStates',
                         'Coordinates': {'y': '58', 'x': '290'},
                         'AdditionalProperties': {'Title': 'A', 'Relevance': '7.0'}},
         'Tuberculosis': {'States': {'no': {}, 'yes': {}},
                          'role': 'chance',
                          'type': 'finiteStates',
                          'Coordinates': {'y': '150', 'x': '201'},
                          'AdditionalProperties': {'Title': 'T', 'Relevance': '7.0'}},
         'X-ray': {'States': {'no': {}, 'yes': {}},
                   'role': 'chance',
                   'AdditionalProperties': {'Title': 'X', 'Relevance': '7.0'},
                   'Coordinates': {'y': '322', 'x': '252'},
                   'Comment': 'Indica si el test de rayos X ha sido positivo',
                   'type': 'finiteStates'},
         'Dyspnea': {'States': {'no': {}, 'yes': {}},
                     'role': 'chance',
                     'type': 'finiteStates',
                     'Coordinates': {'y': '321', 'x': '533'},
                     'AdditionalProperties': {'Title': 'D', 'Relevance': '7.0'}},
         'TuberculosisOrCancer': {'States': {'no': {}, 'yes': {}},
                                  'role': 'chance',
                                  'type': 'finiteStates',
                                  'Coordinates': {'y': '238', 'x': '336'},
                                  'AdditionalProperties': {'Title': 'E', 'Relevance': '7.0'}},
         'LungCancer': {'States': {'no': {}, 'yes': {}},
                        'role': 'chance',
                        'type': 'finiteStates',
                        'Coordinates': {'y': '152', 'x': '421'},
                        'AdditionalProperties': {'Title': 'L', 'Relevance': '7.0'}}}
edges = {'LungCancer': {'TuberculosisOrCancer': {'directed': 'true'}},
         'Smoker': {'LungCancer': {'directed': 'true'},
                    'Bronchitis': {'directed': 'true'}},
         'Dyspnea': {},
         'X-ray': {},
         'VisitToAsia': {'Tuberculosis': {'directed': 'true'}},
         'TuberculosisOrCancer': {'X-ray': {'directed': 'true'},
                                  'Dyspnea': {'directed': 'true'}},
         'Bronchitis': {'Dyspnea': {'directed': 'true'}},
         'Tuberculosis': {'TuberculosisOrCancer': {'directed': 'true'}}}

cpds = [{'Values': np.array([[0.95, 0.05], [0.02, 0.98]]),
         'Variables': {'X-ray': ['TuberculosisOrCancer']}},
        {'Values': np.array([[0.7, 0.3], [0.4,  0.6]]),
         'Variables': {'Bronchitis': ['Smoker']}},
        {'Values':  np.array([[0.9, 0.1,  0.3,  0.7], [0.2,  0.8,  0.1,  0.9]]),
         'Variables': {'Dyspnea': ['TuberculosisOrCancer', 'Bronchitis']}},
        {'Values': np.array([[0.99], [0.01]]),
         'Variables': {'VisitToAsia': []}},
        {'Values': np.array([[0.5], [0.5]]),
         'Variables': {'Smoker': []}},
        {'Values': np.array([[0.99, 0.01], [0.9, 0.1]]),
         'Variables': {'LungCancer': ['Smoker']}},
        {'Values': np.array([[0.99, 0.01], [0.95, 0.05]]),
         'Variables': {'Tuberculosis': ['VisitToAsia']}},
        {'Values': np.array([[1, 0, 0, 1], [0, 1, 0, 1]]),
         'Variables': {'TuberculosisOrCancer': ['LungCancer', 'Tuberculosis']}}]
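
The nested edges dictionary above carries the same information as edges_list, so it does not have to be typed by hand. A minimal sketch using only the standard library; the helper name edges_to_dict is hypothetical and not part of pgmpy, and only three of the edges are used to keep it short:

```python
def edges_to_dict(edge_pairs):
    """Turn (parent, child) pairs into the nested {parent: {child: attrs}} form."""
    nested = {}
    for parent, child in edge_pairs:
        # Register both endpoints so that nodes without children
        # (such as 'Dyspnea') still appear with an empty dictionary.
        nested.setdefault(parent, {})
        nested.setdefault(child, {})
        nested[parent][child] = {'directed': 'true'}
    return nested


sample_edges = [('Smoker', 'LungCancer'),
                ('Smoker', 'Bronchitis'),
                ('Bronchitis', 'Dyspnea')]
sample_dict = edges_to_dict(sample_edges)
print(sample_dict['Smoker'])
print(sample_dict['Dyspnea'])
```

Calling edges_to_dict(edges_list) on the full list should give a dictionary with the same shape as the hand-written edges.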

Now let's create a model from the given data.

In [8]:

from pgmpy.models import BayesianModel
from pgmpy.factors import TabularCPD

model = BayesianModel(edges_list)

for node in nodes:
    model.node[node] = nodes[node]
for edge in edges:
    model.edge[edge] = edges[edge]

tabular_cpds = []
for cpd in cpds:
    var = list(cpd['Variables'].keys())[0]
    evidence = cpd['Variables'][var]
    values = cpd['Values']
    states = len(nodes[var]['States'])
    evidence_card = [len(nodes[evidence_var]['States'])
                     for evidence_var in evidence]
    tabular_cpds.append(
        TabularCPD(var, states, values, evidence, evidence_card))

model.add_cpds(*tabular_cpds)

In [9]:

from pgmpy.readwrite import ProbModelXMLWriter, get_probmodel_data

To get the data that ProbModelXMLWriter needs, we use the function get_probmodel_data. This step is specific to the ProbModelXML format; for the other file formats we pass the model directly to the corresponding writer class.

In [10]:

model_data = get_probmodel_data(model)
writer = ProbModelXMLWriter(model_data=model_data)
print(writer)

To write the XML data to a file, use the write_file method of the writer class.

In [ ]:

writer.write_file('probmodelxml.pgmx')

General Workflow of the readwrite module

pgmpy.readwrite.[fileformat]reader is the base class for reading the given file format. Replace [fileformat] with the format you want to read. This class defines different methods to parse the given file. For example, for XMLBeliefNetwork the methods are as follows.

In [4]:

from pgmpy.readwrite.XMLBeliefNetwork import XBNReader
reader = XBNReader('xmlbelief.xml')

get_analysisnotebook_values: It returns a dictionary of the attributes of the ANALYSISNOTEBOOK tag.

In [5]:

reader.get_analysisnotebook_values()

Out[5]:

{'NAME': 'Notebook.Cancer Example From Neapolitan', 'ROOT': 'Cancer'}

get_bnmodel_name: It returns the name of the bnmodel.

In [6]:

reader.get_bnmodel_name()

Out[6]:

'Cancer'

get_static_properties: It returns a dictionary of the static properties.

In [7]:

reader.get_static_properties()

Out[7]:

{'CREATOR': 'Microsoft Research DTAS',
 'FORMAT': 'MSR DTAS XML',
 'VERSION': '0.2'}

get_variables: It returns a dictionary of the variables and their properties.

In [8]:

reader.get_variables()

Out[8]:

{'a': {'DESCRIPTION': '(a) Metastatic Cancer',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '13495',
  'YPOS': '10465'},
 'b': {'DESCRIPTION': '(b) Serum Calcium Increase',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '11290',
  'YPOS': '11965'},
 'c': {'DESCRIPTION': '(c) Brain Tumor',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '15250',
  'YPOS': '11935'},
 'd': {'DESCRIPTION': '(d) Coma',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '13960',
  'YPOS': '12985'},
 'e': {'DESCRIPTION': '(e) Papilledema',
  'STATES': ['Present', 'Absent'],
  'TYPE': 'discrete',
  'XPOS': '17305',
  'YPOS': '13240'}}

get_edges: It returns a list of tuples. Each tuple contains two elements, (parent, child), for one edge.

In [9]:

reader.get_edges()

Out[9]:

[('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e')]
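
Since every tuple is (parent, child), the edge list can be regrouped into a child-to-parents mapping, which is the shape a conditional distribution needs. A small standard-library sketch; the variable names are illustrative only:

```python
from collections import defaultdict

# Edge list as printed by reader.get_edges() above.
edge_list = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e')]

# Collect the parents of every child in the order the edges appear.
parents = defaultdict(list)
for parent, child in edge_list:
    parents[child].append(parent)

parents = dict(parents)
print(parents)
```

Root nodes such as 'a' have no incoming edge, so they do not appear as keys in this mapping.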

get_distributions: It returns a dictionary mapping each variable name to its distribution.

In [10]:

reader.get_distributions()

Out[10]:

{'a': {'DPIS': array([[ 0.2,  0.8]]), 'TYPE': 'discrete'},
 'b': {'CARDINALITY': array([2]),
  'CONDSET': ['a'],
  'DPIS': array([[ 0.8,  0.2],
         [ 0.2,  0.8]]),
  'TYPE': 'discrete'},
 'c': {'CARDINALITY': array([2]),
  'CONDSET': ['a'],
  'DPIS': array([[ 0.2 ,  0.8 ],
         [ 0.05,  0.95]]),
  'TYPE': 'discrete'},
 'd': {'CARDINALITY': array([2, 2]),
  'CONDSET': ['b', 'c'],
  'DPIS': array([[ 0.8 ,  0.2 ],
         [ 0.9 ,  0.1 ],
         [ 0.7 ,  0.3 ],
         [ 0.05,  0.95]]),
  'TYPE': 'discrete'},
 'e': {'CARDINALITY': array([2]),
  'CONDSET': ['c'],
  'DPIS': array([[ 0.8,  0.2],
         [ 0.6,  0.4]]),
  'TYPE': 'discrete'}}
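
In the output above, each DPIS array has one row per combination of parent states (the product of CARDINALITY) and each row sums to 1. A minimal sketch of that consistency check, assuming plain Python lists instead of the NumPy arrays the reader returns; check_distribution is a hypothetical helper, not a pgmpy function:

```python
import math


def check_distribution(dist, n_states):
    """Return the number of DPIS rows after validating their shape and sums."""
    rows = dist['DPIS']
    # A variable without parents has no CARDINALITY entry; math.prod([]) is 1.
    expected_rows = math.prod(dist.get('CARDINALITY', []))
    if len(rows) != expected_rows:
        raise ValueError('expected %d rows, got %d' % (expected_rows, len(rows)))
    for row in rows:
        if len(row) != n_states or not math.isclose(sum(row), 1.0):
            raise ValueError('row %r is not a distribution' % (row,))
    return expected_rows


dist_a = {'TYPE': 'discrete', 'DPIS': [[0.2, 0.8]]}
dist_d = {'TYPE': 'discrete', 'CONDSET': ['b', 'c'], 'CARDINALITY': [2, 2],
          'DPIS': [[0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]}

rows_a = check_distribution(dist_a, n_states=2)
rows_d = check_distribution(dist_d, n_states=2)
print(rows_a, rows_d)
```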

get_model: It returns an instance of the given model, e.g. BayesianModel in the case of the XMLBeliefNetwork format.

In [11]:

model = reader.get_model()
print(model.nodes())
print(model.edges())

['c', 'b', 'e', 'a', 'd']
[('c', 'e'), ('c', 'd'), ('b', 'd'), ('a', 'c'), ('a', 'b')]

pgmpy.readwrite.[fileformat]writer is the base class for writing a model to the given file format. It takes a model as an argument, which can be an instance of BayesianModel or MarkovModel. Replace [fileformat] with the format you want to write. This class defines different methods to set the contents of the new file created from the given model. For example, for XMLBeliefNetwork the methods are as follows.

In [7]:

from pgmpy.models import BayesianModel
from pgmpy.factors import TabularCPD
import numpy as np
nodes = {'c': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(c) Brain Tumor',
               'YPOS': '11935',
               'XPOS': '15250',
               'TYPE': 'discrete'},
         'a': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(a) Metastatic Cancer',
               'YPOS': '10465',
               'XPOS': '13495',
               'TYPE': 'discrete'},
         'b': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(b) Serum Calcium Increase',
               'YPOS': '11965',
               'XPOS': '11290',
               'TYPE': 'discrete'},
         'e': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(e) Papilledema',
               'YPOS': '13240',
               'XPOS': '17305',
               'TYPE': 'discrete'},
         'd': {'STATES': ['Present', 'Absent'],
               'DESCRIPTION': '(d) Coma',
               'YPOS': '12985',
               'XPOS': '13960',
               'TYPE': 'discrete'}}
model = BayesianModel([('b', 'd'), ('a', 'b'), ('a', 'c'), ('c', 'd'), ('c', 'e')])
cpd_distribution = {'a': {'TYPE': 'discrete', 'DPIS': np.array([[0.2, 0.8]])},
                    'e': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2],
                                                                [0.6, 0.4]]), 'CONDSET': ['c'], 'CARDINALITY': [2]},
                    'b': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2],
                                                                [0.2, 0.8]]), 'CONDSET': ['a'], 'CARDINALITY': [2]},
                    'c': {'TYPE': 'discrete', 'DPIS': np.array([[0.2, 0.8],
                                                                [0.05, 0.95]]), 'CONDSET': ['a'], 'CARDINALITY': [2]},
                    'd': {'TYPE': 'discrete', 'DPIS': np.array([[0.8, 0.2],
                                                                [0.9, 0.1],
                                                                [0.7, 0.3],
                                                                [0.05, 0.95]]), 'CONDSET': ['b', 'c'], 'CARDINALITY': [2, 2]}}

tabular_cpds = []
for var, values in cpd_distribution.items():
    evidence = values['CONDSET'] if 'CONDSET' in values else []
    cpd = values['DPIS']
    evidence_card = values['CARDINALITY'] if 'CARDINALITY' in values else []
    states = nodes[var]['STATES']
    cpd = TabularCPD(var, len(states), cpd,
                     evidence=evidence,
                     evidence_card=evidence_card)
    tabular_cpds.append(cpd)
model.add_cpds(*tabular_cpds)

for var, properties in nodes.items():
    model.node[var] = properties

In [8]:

from pgmpy.readwrite.XMLBeliefNetwork import XBNWriter
writer = XBNWriter(model = model)

set_analysisnotebook: It sets the attributes for ANALYSISNOTEBOOK tag.
set_bnmodel_name: It sets the name of the BNMODEL.
set_static_properties: It sets the STATICPROPERTIES tag for the network.
set_variables: It sets the VARIABLES tag for the network.
set_edges: It sets edges/arcs in the network.
set_distributions: It sets distributions in the network.

Wednesday, 12 August 2015

UAI Reader And Writer

After the midterm, I worked on the UAI reader and writer module. It has now been successfully merged into the main repository.

UAI Format Brief Description

It uses the simple text file format specified below to describe problem instances

Link to the format : UAI

A file in the UAI format consists of the following two parts, in that order:

Preamble
Function

Preamble: It starts with text denoting the type of the network. This is followed by a line containing the number of variables. The next line specifies each variable's domain size, one at a time, separated by whitespace. The fourth line contains only one integer, denoting the number of functions in the problem (conditional probability tables for Bayesian networks, general factors for Markov networks). Then, one function per line, the scope of each function is given as follows: the first integer in each line specifies the size of the function's scope, followed by the actual indexes of the variables in the scope. The order of this list is not restricted, except when specifying a conditional probability table (CPT) in a Bayesian network, where the child variable has to come last. Also note that variables are indexed starting with 0.

Example of Preamble

MARKOV
3
2 2 3
2
2 0 1
3 0 1 2

In the above example the model is MARKOV, the number of variables is 3, and the domain sizes of the variables are 2, 2 and 3 respectively.

To read the preamble we use the pyparsing module. To get the variables and their domain sizes we have the methods get_variables and get_domain, which return the list of variables and a dictionary with the variable names as keys and their domain sizes as values.

For example, for the above preamble the method get_variables will return [var_0, var_1, var_2]

and the method get_domain will return

{var_0: 2, var_1: 2, var_2: 3}
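
To make the mapping from preamble to variables and domains concrete, here is a minimal standard-library sketch. It is not the pyparsing-based implementation in pgmpy; the function name parse_uai_header and the string keys 'var_0', 'var_1', ... are assumptions made for illustration:

```python
def parse_uai_header(text):
    """Read the network type, variable names and domain sizes from a preamble."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    network_type = lines[0][0]
    n_variables = int(lines[1][0])
    sizes = [int(token) for token in lines[2]]
    if len(sizes) != n_variables:
        raise ValueError('expected %d domain sizes, got %d'
                         % (n_variables, len(sizes)))
    # Variables are indexed from 0, as the format requires.
    variables = ['var_%d' % index for index in range(n_variables)]
    domain = dict(zip(variables, sizes))
    return network_type, variables, domain


preamble = """MARKOV
3
2 2 3
2
2 0 1
3 0 1 2
"""
network_type, variables, domain = parse_uai_header(preamble)
print(network_type, variables, domain)
```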

Function: In this section each function is specified by giving its full table (i.e., specifying the function value for each tuple). The order of the functions is identical to the one in which they were introduced in the preamble.

For each function table, first the number of entries is given (this should be equal to the product of the domain sizes of the variables in the scope). Then, one by one, separated by whitespace, the values for each assignment to the variables in the function's scope are enumerated. Tuples are implicitly assumed in ascending order, with the last variable in the scope as the 'least significant'.

Example of Function

2
0.436 0.564

4
0.128 0.872
0.920 0.080

6
0.210 0.333 0.457
0.811 0.000 0.189
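
The ordering rule ("last variable in the scope is the least significant") can be illustrated with itertools.product, which also varies its last argument fastest. The sketch below takes the six-entry table from the example and assumes, for illustration only, that its scope is two variables with domain sizes 2 and 3:

```python
from itertools import product

# Assumed scope: two variables with domain sizes 2 and 3 (2 * 3 = 6 entries).
domain_sizes = [2, 3]
values = [0.210, 0.333, 0.457, 0.811, 0.000, 0.189]

# Enumerate assignments in ascending order, last variable changing fastest.
assignments = list(product(*(range(size) for size in domain_sizes)))
if len(assignments) != len(values):
    raise ValueError('table size does not match the product of domain sizes')

table = dict(zip(assignments, values))
for assignment, value in table.items():
    print(assignment, value)
```

The first three values belong to the first variable being 0, and the last three to it being 1.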

Tuesday, 30 June 2015

ProbModelXML Reader And Writer

I worked on the ProbModelXML reader and writer module for this project. My project involved fixing various bugs that were present in the module and resolving the outstanding TODOs. Some of the TODOs are:
Decision Criteria :

The DecisionCriteria tag is used in multicriteria decision making, as follows:

<AdditionalProperties />0..1

</Criterion>2..n

</DecisionCriteria>

Potential :

The Potential tag is structured as follows:

</Variables>
<Values></Values>

</Potential>

My project involved parsing the above type of XML for the reader module.

For the writer class, my project involved creating a ProbModelXML file from a given instance of a Bayesian model.

Sunday, 31 May 2015

GSoC Week1

The first week of the coding period is now almost over.

This week I worked on improving the XMLBIF module. The reader class of the XMLBIF module was working fine but the writer class was not implemented.
Also, the reader class didn't have any method that would return the model instance (e.g. a Bayesian or Markov model instance). Since I was not very familiar with Bayesian and Markov models, my mentors helped me understand them so that I can easily implement them for the next set of modules at a later stage.

Also this week I worked on the writer class of the module, which is now complete. I have sent a PR and hopefully it will be merged by next week.

Details about the Writer class

The Writer class takes model_data as input.

An example of model_data is

self.model_data =

{'variables': ['light-on', 'bowel-problem', 'dog-out', 'hear-bark', 'family-out'],

'states': {'bowel-problem': ['true', 'false'],
                'dog-out': ['true', 'false'],
                'family-out': ['true', 'false'],
                'hear-bark': ['true', 'false'],
                'light-on': ['true', 'false']},
'property': {'bowel-problem': ['position = (190, 69)'],
                      'dog-out': ['position = (155, 165)'],
                      'family-out': ['position = (112, 69)'],
                      'hear-bark': ['position = (154, 241)'],
                      'light-on': ['position = (73, 165)']},

'parents': {'bowel-problem': [],

                   'dog-out': ['family-out', 'bowel-problem'],
                   'family-out': [],
                   'hear-bark': ['dog-out'],
                   'light-on': ['family-out']},

'cpds': {'bowel-problem': np.array([[0.01],[0.99]]),
              'dog-out': np.array([[0.99, 0.01, 0.97, 0.03],[0.9, 0.1, 0.3, 0.7]]),
              'family-out': np.array([[0.15],[0.85]]),
              'hear-bark': np.array([[0.7, 0.3],[0.01, 0.99]]),
              'light-on': np.array([[0.6, 0.4],[0.05, 0.95]])}}

The writer class has the following methods:

add_variables

This method adds variable tags to the file.

add_definition

This method adds definition tags to the file.

add_cpd

This method adds table tags to the file.

Finally, the file returned by the Writer class is as follows:
<BIF version="0.3">
<NETWORK>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (190, 69)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (155, 165)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (112, 69)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (154, 241)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
    <DEFINITION>
      <FOR>bowel-problem</FOR>
      <TABLE>0.01 0.99 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>dog-out</FOR>
      <GIVEN>bowel-problem</GIVEN>
      <GIVEN>family-out</GIVEN>
      <TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>family-out</FOR>
      <TABLE>0.15 0.85 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>hear-bark</FOR>
      <GIVEN>dog-out</GIVEN>
      <TABLE>0.7 0.3 0.01 0.99 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>light-on</FOR>
      <GIVEN>family-out</GIVEN>
      <TABLE>0.6 0.4 0.05 0.95 </TABLE>
    </DEFINITION>
</NETWORK>
</BIF>
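
Output of this shape can be approximated with the standard library's xml.etree.ElementTree. The sketch below is not the pgmpy Writer class; build_xmlbif is a hypothetical helper, the CPDs are plain nested lists instead of NumPy arrays, and only two of the five variables are included to keep it short:

```python
import xml.etree.ElementTree as ET

model_data = {
    'variables': ['family-out', 'light-on'],
    'states': {'family-out': ['true', 'false'],
               'light-on': ['true', 'false']},
    'property': {'family-out': ['position = (112, 69)'],
                 'light-on': ['position = (73, 165)']},
    'parents': {'family-out': [], 'light-on': ['family-out']},
    'cpds': {'family-out': [[0.15], [0.85]],
             'light-on': [[0.6, 0.4], [0.05, 0.95]]},
}


def build_xmlbif(data):
    """Build a BIF element tree mirroring the layout of the output above."""
    root = ET.Element('BIF', version='0.3')
    network = ET.SubElement(root, 'NETWORK')
    # One VARIABLE element per variable, with its outcomes and properties.
    for var in data['variables']:
        variable = ET.SubElement(network, 'VARIABLE', TYPE='nature')
        for state in data['states'][var]:
            ET.SubElement(variable, 'OUTCOME').text = state
        for prop in data['property'][var]:
            ET.SubElement(variable, 'PROPERTY').text = prop
    # One DEFINITION element per variable, with parents and a flattened table.
    for var in data['variables']:
        definition = ET.SubElement(network, 'DEFINITION')
        ET.SubElement(definition, 'FOR').text = var
        for parent in data['parents'][var]:
            ET.SubElement(definition, 'GIVEN').text = parent
        flat = [str(value) for row in data['cpds'][var] for value in row]
        ET.SubElement(definition, 'TABLE').text = ' '.join(flat)
    return root


root = build_xmlbif(model_data)
xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```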

Sunday, 24 May 2015

Community Bonding Period

Now that the coding period for this year’s Summer of Code is about to start, I am extremely happy that things have been working pretty well with me and my mentors over this community bonding period. We had a group meeting on IRC and all of us are excited to have a more than successful Summer of Code.

In the community bonding period, I reviewed my proposal again and discussed with my mentors which features are necessary and how things should be implemented, and cleared my doubts. I read the documentation and the code to understand the flow of execution and how things have been implemented.

I read the documentation of the pyparsing module, which will be used for parsing the UAI file format. Here are some notes I made from the documentation so that I can easily find the functions that will be needed at a later stage.

Import the pyparsing module with import pyparsing as pp.
p.parseString(s): the input is "s" and the parser is "p". If the syntax of s matches the syntax described by p, this expression returns an object that represents the parts that matched. This object is an instance of class pp.ParseResults.
The pp.Word() class produces a parser that matches a word made up of the characters given in its first argument.
Use pp.Group(phrase) to group things. For example, to differentiate the model from the variable numbers, use pp.Group().
Use setResultsName() to give a name to the string that is returned, e.g. model_name = pp.Word(pp.alphas).setResultsName('modelName')
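
A short sketch tying these notes together. It assumes pyparsing is installed; the input string and the results name 'domains' are made up for illustration and are not taken from the pgmpy source:

```python
import pyparsing as pp

# A word made of letters, stored under the results name 'modelName'.
model_name = pp.Word(pp.alphas).setResultsName('modelName')

# One or more integers, grouped so they come back as a single nested list.
domain_sizes = pp.Group(pp.OneOrMore(pp.Word(pp.nums))).setResultsName('domains')

parser = model_name + domain_sizes
result = parser.parseString("MARKOV 2 2 3")  # returns a pp.ParseResults

model = result['modelName']
domains = list(result['domains'])  # tokens are strings, not integers
print(model, domains)
```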

I also made the grammar for the UAI module.

Grammar for UAI Preamble:

Preamble --> model_name \n no_variables

model_name --> MARKOV | BAYES

no_variables --> IntegerNumber \n domain_variables

domain_variables --> IntegerNumber* \n no_functions

no_functions --> IntegerNumber \n function_definition*

function_definition* --> function_definition | function_definition function_definition*

function_definition --> size_function " " IntegerNumber*
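
The productions above can be followed almost line by line in code. The sketch below uses plain string splitting instead of pyparsing, so it is an illustration of the grammar rather than the parser used in pgmpy; the function name parse_preamble and the keys of the returned dictionary are assumptions:

```python
def parse_preamble(text):
    """Parse a UAI preamble by walking the grammar productions in order."""
    token_lines = [line.split() for line in text.strip().splitlines()
                   if line.strip()]
    lines = iter(token_lines)

    # model_name --> MARKOV | BAYES
    (model_name,) = next(lines)
    if model_name not in ('MARKOV', 'BAYES'):
        raise ValueError('unknown network type: %s' % model_name)

    # no_variables --> IntegerNumber
    no_variables = int(next(lines)[0])

    # domain_variables --> IntegerNumber*
    domains = [int(token) for token in next(lines)]
    if len(domains) != no_variables:
        raise ValueError('domain sizes do not match the number of variables')

    # no_functions --> IntegerNumber
    no_functions = int(next(lines)[0])

    # function_definition --> size_function " " IntegerNumber*
    scopes = []
    for _ in range(no_functions):
        size, *scope = (int(token) for token in next(lines))
        if size != len(scope):
            raise ValueError('scope size %d does not match %r' % (size, scope))
        scopes.append(scope)

    return {'model_name': model_name, 'domains': domains, 'scopes': scopes}


parsed = parse_preamble("MARKOV\n3\n2 2 3\n2\n2 0 1\n3 0 1 2\n")
print(parsed)
```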

Monday, 11 May 2015

GSoC Selection

Got selected for GSoC'15. Feeling awesome. It will be an awesome and challenging summer, and a great learning experience. Thanks to all the members for selecting me and having confidence in me.

Been busy with exams and travelling back home, so I couldn't post about it earlier. :)

My project for GSoC'15 is Parsing from and writing to standard PGM file formats.

pgmpy is a Python library for the creation, manipulation and implementation of probabilistic graphical models (PGMs). There are various standard file formats for representing PGM data. PGM data basically consists of a graph, a table corresponding to each node and a few other attributes of the graph.

pgmpy needs functionality to read networks from and write networks to these standard file formats. Currently pgmpy supports four file formats: ProbModelXML, PomDPX, XMLBIF and XMLBeliefNetwork. The project aims to improve the existing implementation of these file formats and to implement the UAI file format during the course of GSoC. This way models can be specified in a uniform file format and readily converted to Bayesian or Markov model objects.

We recently had a meeting with our mentors to discuss the plan ahead. As part of the community bonding period I am reading about the pyparsing module, which will be used to parse the UAI file format.

I am also planning to prepare an abstract grammar for the UAI format, which will help me later during the implementation.