Sunday 31 May 2015

GSoC Week1

The first week of the coding period is now almost over.

This week I worked on improving the XMLBIF module. The reader class of the XMLBIF module was working fine, but the writer class was not implemented.
Also, the reader class didn't have any method that would return the model instance (for example, a Bayesian or Markov model instance). Since I was not very familiar with Bayesian and Markov models, my mentors helped me understand them so that I can easily implement them for the next set of modules at a later stage.

This week I also worked on the writer class of the module, which is now complete. I have sent a PR, and hopefully it will be merged by next week.


Details about the Writer class

The Writer class takes a model_data dictionary as input.

An example of model_data is:

# assumes: import numpy as np
self.model_data = {
    'variables': ['light-on', 'bowel-problem', 'dog-out', 'hear-bark', 'family-out'],
    'states': {'bowel-problem': ['true', 'false'],
               'dog-out': ['true', 'false'],
               'family-out': ['true', 'false'],
               'hear-bark': ['true', 'false'],
               'light-on': ['true', 'false']},
    'property': {'bowel-problem': ['position = (190, 69)'],
                 'dog-out': ['position = (155, 165)'],
                 'family-out': ['position = (112, 69)'],
                 'hear-bark': ['position = (154, 241)'],
                 'light-on': ['position = (73, 165)']},
    'parents': {'bowel-problem': [],
                'dog-out': ['family-out', 'bowel-problem'],
                'family-out': [],
                'hear-bark': ['dog-out'],
                'light-on': ['family-out']},
    'cpds': {'bowel-problem': np.array([[0.01], [0.99]]),
             'dog-out': np.array([[0.99, 0.01, 0.97, 0.03], [0.9, 0.1, 0.3, 0.7]]),
             'family-out': np.array([[0.15], [0.85]]),
             'hear-bark': np.array([[0.7, 0.3], [0.01, 0.99]]),
             'light-on': np.array([[0.6, 0.4], [0.05, 0.95]])}}
 
 
The Writer class has the following methods:
  1.  add_variables
    1. This method adds the variable tags to the file.
  2.  add_definition
    1. This method adds the definition tags to the file.
  3.  add_cpd
    1. This method adds the table tags to the file.

Finally, the file returned by the Writer class is as follows:
<BIF version="0.3">
  <NETWORK>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (190, 69)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (155, 165)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (112, 69)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (154, 241)</PROPERTY>
    </VARIABLE>
    <VARIABLE TYPE="nature">
      <OUTCOME>true</OUTCOME>
      <OUTCOME>false</OUTCOME>
      <PROPERTY>position = (73, 165)</PROPERTY>
    </VARIABLE>
    <DEFINITION>
      <FOR>bowel-problem</FOR>
      <TABLE>0.01 0.99 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>dog-out</FOR>
      <GIVEN>bowel-problem</GIVEN>
      <GIVEN>family-out</GIVEN>
      <TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>family-out</FOR>
      <TABLE>0.15 0.85 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>hear-bark</FOR>
      <GIVEN>dog-out</GIVEN>
      <TABLE>0.7 0.3 0.01 0.99 </TABLE>
    </DEFINITION>
    <DEFINITION>
      <FOR>light-on</FOR>
      <GIVEN>family-out</GIVEN>
      <TABLE>0.6 0.4 0.05 0.95 </TABLE>
    </DEFINITION>
  </NETWORK>
</BIF>
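
To make the mapping from model_data to the tags above more concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree. It is not the pgmpy Writer class: the function name build_xmlbif is hypothetical, the dictionary is trimmed down to two variables, and plain lists stand in for the NumPy arrays.

```python
import xml.etree.ElementTree as ET

# Trimmed-down model_data with two variables; plain lists stand in for the
# NumPy arrays used in the post.
model_data = {
    'variables': ['light-on', 'family-out'],
    'states': {'family-out': ['true', 'false'],
               'light-on': ['true', 'false']},
    'property': {'family-out': ['position = (112, 69)'],
                 'light-on': ['position = (73, 165)']},
    'parents': {'family-out': [], 'light-on': ['family-out']},
    'cpds': {'family-out': [[0.15], [0.85]],
             'light-on': [[0.6, 0.4], [0.05, 0.95]]},
}


def build_xmlbif(data):
    """Build a BIF element tree from a model_data-style dictionary.

    Hypothetical helper for illustration, not part of pgmpy.
    """
    root = ET.Element('BIF', {'version': '0.3'})
    network = ET.SubElement(root, 'NETWORK')

    # One VARIABLE tag per variable, holding its outcomes and properties.
    for var in sorted(data['variables']):
        variable = ET.SubElement(network, 'VARIABLE', {'TYPE': 'nature'})
        for state in data['states'][var]:
            ET.SubElement(variable, 'OUTCOME').text = state
        for prop in data['property'][var]:
            ET.SubElement(variable, 'PROPERTY').text = prop

    # One DEFINITION tag per variable: FOR, then GIVEN (parents), then TABLE.
    for var in sorted(data['variables']):
        definition = ET.SubElement(network, 'DEFINITION')
        ET.SubElement(definition, 'FOR').text = var
        for parent in data['parents'][var]:
            ET.SubElement(definition, 'GIVEN').text = parent
        # Flatten the table row by row into a space-separated string.
        values = [str(v) for row in data['cpds'][var] for v in row]
        ET.SubElement(definition, 'TABLE').text = ' '.join(values) + ' '

    return root


xml_text = ET.tostring(build_xmlbif(model_data), encoding='unicode')
print(xml_text)
```

The variables are sorted here only so that the tag order resembles the output shown above; the parents are kept in the order given, since reordering them would also require reordering the table values.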

Sunday 24 May 2015

Community Bonding Period

Now that the coding period for this year’s Summer of Code is about to start, I am extremely happy that things have been working out pretty well between me and my mentors over the community bonding period. We had a group meeting on IRC, and all of us are excited to have a more than successful Summer of Code.



In the community bonding period, I reviewed my proposal again, discussed with my mentors which features are necessary and how things should be implemented, and cleared up my doubts. I read the documentation and the code to understand the flow of execution and how things have been implemented.

I read the documentation of the pyparsing module, which will be used for parsing the UAI file format. Here are some notes I made from the documentation so that I can easily find the functions I will need at a later stage.
  1. Import the pyparsing module with import pyparsing as pp.
  2. p.parseString(s) → the input is “s” and the parser is “p”. If the syntax of s matches the syntax described by p, this expression returns an object that represents the parts that matched. This object is an instance of the class pp.ParseResults.
  3. The pp.Word() class produces a parser that matches a word made up of the characters given in its first argument.
  4. Use pp.Group(phrase) to group things. For example, use pp.Group() to differentiate models with variable numbers.
  5. Use setResultsName() to give a name to the matched string, for example model_name = pp.Word(pp.alphas).setResultsName('modelName')
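
The notes above can be put together in a small sketch. It assumes pyparsing is installed; the input string and the result name 'domainSizes' are made up for illustration and are not taken from the UAI module.

```python
import pyparsing as pp

# A model name is a word made of letters; name the result for later lookup.
model_name = pp.Word(pp.alphas).setResultsName('modelName')

# An integer is a word made of digits; convert the matched text to int.
integer = pp.Word(pp.nums).setParseAction(lambda tokens: int(tokens[0]))

# Group the numbers so they come back together as one nested list.
domain_sizes = pp.Group(pp.OneOrMore(integer)).setResultsName('domainSizes')

# "+" chains parsers; whitespace between tokens is skipped by default.
parser = model_name + domain_sizes

result = parser.parseString('MARKOV 2 2 3')
print(result['modelName'])
print(result['domainSizes'].asList())
```

Here result is a pp.ParseResults object, so the named parts can be looked up like dictionary keys.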

I also wrote the grammar for the UAI module.

Grammar for UAI Preamble:
Preamble --> model_name \n no_variables
model_name --> MARKOV | BAYES
no_variables --> IntegerNumber \n domain_variables
domain_variables --> IntegerNumber* \n no_functions
no_functions --> IntegerNumber \n function_definition*
function_definition* --> function_definition | function_definition function_definition*
function_definition --> size_function " " IntegerNumber*
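
As a rough illustration of what this grammar describes, here is a minimal sketch of a preamble parser in plain Python (no pyparsing). The function name parse_uai_preamble, the returned dictionary keys and the sample input are all hypothetical and only serve to show the order of the pieces: model name, number of variables, domain sizes, number of functions, then one line per function definition.

```python
def parse_uai_preamble(text):
    """Parse a UAI-style preamble into a dictionary (illustrative sketch)."""
    # Split the text into non-empty lines, each a list of tokens.
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]

    # model_name --> MARKOV | BAYES
    model_name = lines[0][0]
    if model_name not in ('MARKOV', 'BAYES'):
        raise ValueError('unknown model type: %s' % model_name)

    # no_variables, followed by one domain size per variable
    no_variables = int(lines[1][0])
    domain_sizes = [int(token) for token in lines[2]]
    if len(domain_sizes) != no_variables:
        raise ValueError('expected %d domain sizes' % no_variables)

    # no_functions, followed by one function definition per line
    no_functions = int(lines[3][0])
    functions = []
    for tokens in lines[4:4 + no_functions]:
        # function_definition --> size_function " " IntegerNumber*
        size = int(tokens[0])
        scope = [int(token) for token in tokens[1:]]
        if len(scope) != size:
            raise ValueError('function scope does not match its size')
        functions.append(scope)
    if len(functions) != no_functions:
        raise ValueError('expected %d function definitions' % no_functions)

    return {'model_name': model_name,
            'domain_sizes': domain_sizes,
            'functions': functions}


# Made-up sample: three variables and two pairwise functions.
sample = """MARKOV
3
2 2 3
2
2 0 1
2 1 2
"""

preamble = parse_uai_preamble(sample)
print(preamble)
```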


Monday 11 May 2015

GSoC Selection




Got selected for GSoC'15. Feeling awesome. It will be an awesome and challenging summer, and a great learning experience. Thanks to all the members for selecting me and having confidence in me.

Been busy with exams and travelling back home… so couldn’t post about it earlier… :)

My project for GSoC'15 is Parsing from and writing to standard PGM file formats.
 
pgmpy is a Python library for the creation, manipulation and implementation of probabilistic graphical models. There are various standard file formats for representing PGM data. PGM data basically consists of a graph, a table corresponding to each node and a few other attributes of the graph.
pgmpy needs functionality to read networks from and write networks to these standard file formats. Currently pgmpy supports four file formats: ProbModelXML, PomDPX, XMLBIF and XMLBeliefNetwork. The project aims to improve the existing implementations of these file formats and to implement the UAI file format during the course of GSoC. This way, models can be specified in a uniform file format and readily converted to Bayesian or Markov model objects.

We recently had a meeting with our mentors to discuss the plan ahead. As part of the community bonding period, I am reading about the pyparsing module, which will be used to parse the UAI file format.

I am also planning to prepare an abstract grammar for the UAI format, which will help me later during the implementation.