Tuesday, 24 July 2012

Week 8: updated Project plan and other enhancements

This week:

1. Earlier, clicking on a result row in the "Interaction Comparison Results" table would just highlight the interaction in each pathway (pathway1 and pathway2) without focusing on it. Now it also focuses on the interaction and zooms out if the interaction is too big to fit into the view.

2. Highlighting all the interaction matches in "Interaction Comparison Results" is now possible. However, the view does not yet zoom out to fit all the highlighted interactions in the pathway.


3. Working (not yet finished) on many-to-many mapping of the Datanode matches from pathway1 to pathway2. Earlier this was a one-to-one mapping, with each match appearing as an individual row in the "Datanode Comparison Results" table.


Let me explain. Consider Gene A and Gene B, two identical Datanodes in pathway1 (i.e. they have equivalent Xrefs). Gene C and Gene D are identical Datanodes in pathway2. And the two Datanodes A, B in pathway1 match the two Datanodes C, D in pathway2.

Earlier with one-to-one mapping, the comparison results looked like
Gene A -> Gene C
Gene A -> Gene D
Gene B -> Gene C
Gene B -> Gene D
And clicking on any of the results highlighted a Datanode in Pathway1 and the corresponding matching Datanode in Pathway2. For instance, clicking on row1 (Gene A -> Gene C) highlights Gene A in pathway1 and Gene C in pathway2.


But with many-to-many mapping of the Datanode matches, the four individual results above could simply be represented as one single result, "Gene A, Gene B -> Gene C, Gene D". Clicking on it should highlight Datanodes Gene A and Gene B in pathway1 and Gene C and Gene D in pathway2. This is taking time because Interaction Comparison uses the results from Datanode Comparison, so the Interaction Comparison results will also have to be modified.


Also, if there are multiple instances of a Datanode with the same label, e.g. Gene A, Gene A in pathway1 (i.e. there are two instances of Gene A in pathway1) and Gene B and Gene C in pathway2, then it would be represented as Gene A -> Gene B, Gene C.
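A minimal sketch of the many-to-many grouping described above, written against plain Java collections rather than the PathVisio classes. The `Node` class, the `xrefKey` field and the `compare` method are hypothetical names used only for illustration, and the sketch assumes that two Datanodes have equivalent Xrefs exactly when their keys are equal, which is a simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not PathVisio API. */
class ManyToManySketch {

    /** Stand-in for a Datanode: a label plus a key that is equal for equivalent Xrefs (assumption). */
    static final class Node {
        final String label;
        final String xrefKey;

        Node(String label, String xrefKey) {
            this.label = label;
            this.xrefKey = xrefKey;
        }
    }

    /** Groups the distinct labels of a pathway's Datanodes by Xref key, keeping insertion order. */
    static Map<String, Set<String>> groupByXref(List<Node> nodes) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (Node n : nodes) {
            groups.computeIfAbsent(n.xrefKey, k -> new LinkedHashSet<>()).add(n.label);
        }
        return groups;
    }

    /** Builds one row per Xref key that occurs in both pathways. */
    static List<String> compare(List<Node> pathway1, List<Node> pathway2) {
        Map<String, Set<String>> g1 = groupByXref(pathway1);
        Map<String, Set<String>> g2 = groupByXref(pathway2);
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : g1.entrySet()) {
            Set<String> matches = g2.get(e.getKey());
            if (matches != null) {
                rows.add(String.join(", ", e.getValue()) + " -> " + String.join(", ", matches));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Node> p1 = new ArrayList<>();
        p1.add(new Node("Gene A", "x1"));
        p1.add(new Node("Gene B", "x1"));
        List<Node> p2 = new ArrayList<>();
        p2.add(new Node("Gene C", "x1"));
        p2.add(new Node("Gene D", "x1"));
        for (String row : compare(p1, p2)) {
            System.out.println(row); // Gene A, Gene B -> Gene C, Gene D
        }
    }
}
```

Because labels are collected into a set, two instances of Gene A in pathway1 collapse into a single "Gene A" on the left-hand side of the row.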

4. Storing the comparison results (Datanode Comparison and Interaction Comparison). For this, I was supposed to come up with a format (CSV, TSV, etc.) that would best represent the comparison results in a file. I think for Interaction Comparison results, we could just store the Datanodes' labels and GraphIds (not sure if the GraphId needs to be stored) for each interaction. I am not sure whether the lines in the interaction (the lines' GraphIds) should be stored as well. As lines don't have labels, storing their GraphIds wouldn't tell us much if we read the file ourselves.


Delimiter format for storing Interaction Comparison results in a file: 
<DN1 Label> <colon: between a DN's Label and its GraphId> <DN1 GraphId> <comma> <DN2 Label> <colon> <DN2 GraphId> <tab: between an Interaction in pathway1 and its matching counterpart in pathway2> <DN3 Label> <colon> <DN3 GraphId> <comma> <DN4 Label> <colon> <DN4 GraphId> <comma> <DN5 Label> <colon> <DN5 GraphId> <new-line: between each Interaction Comparison result>
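A rough sketch of how one result line in this delimiter format could be produced. `NodeRef`, `formatInteraction` and `formatResult` are hypothetical names, and the GraphIds in the example are made up. The sketch does no escaping, so it assumes labels never contain a colon, comma, tab or new-line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed file format, not PathVisio API. */
class InteractionResultFormatSketch {

    /** A Datanode reference as it would be stored: label plus GraphId. */
    static final class NodeRef {
        final String label;
        final String graphId;

        NodeRef(String label, String graphId) {
            this.label = label;
            this.graphId = graphId;
        }
    }

    /** Formats one interaction as "Label:GraphId,Label:GraphId,...". */
    static String formatInteraction(List<NodeRef> nodes) {
        List<String> parts = new ArrayList<>();
        for (NodeRef n : nodes) {
            parts.add(n.label + ":" + n.graphId);
        }
        return String.join(",", parts);
    }

    /** Formats one comparison result: pathway1 interaction, tab, pathway2 interaction, new-line. */
    static String formatResult(List<NodeRef> inPathway1, List<NodeRef> inPathway2) {
        return formatInteraction(inPathway1) + "\t" + formatInteraction(inPathway2) + "\n";
    }

    public static void main(String[] args) {
        List<NodeRef> left = Arrays.asList(new NodeRef("Gene A", "id1"), new NodeRef("Gene B", "id2"));
        List<NodeRef> right = Arrays.asList(new NodeRef("Gene C", "id3"), new NodeRef("Gene D", "id4"));
        System.out.print(formatResult(left, right));
    }
}
```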


For DataNode Comparison results, the format could be similar, but I will settle on one after the many-to-many mapping of DataNode Comparison results is finished.


Updated Project plan: 


1. Scoring system: Generate a score based on the comparison results which would indicate how similar the two pathways being compared are. Scoring would be based on the results of Datanode Comparison or Interaction Comparison. A simple scoring system such as the one in org.pathvisio.core.gpmlDiff.BasicSim.java could be used.
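To make the idea concrete, here is one possible score, shown only as an illustration: the percentage of Datanodes across both pathways that have at least one match. This formula is an assumption made for the sketch; it is not taken from BasicSim, and the class and method names are hypothetical.

```java
/** Hypothetical scoring sketch; the formula is an assumption, not the one used by BasicSim. */
class SimilarityScoreSketch {

    /**
     * Returns a value between 0 and 100: the share of Datanodes in both pathways
     * that have at least one match in the other pathway.
     */
    static double score(int matchedInP1, int totalInP1, int matchedInP2, int totalInP2) {
        int total = totalInP1 + totalInP2;
        if (total == 0) {
            return 0.0; // two empty pathways: nothing to compare
        }
        return 100.0 * (matchedInP1 + matchedInP2) / total;
    }

    public static void main(String[] args) {
        // 2 of 4 Datanodes matched in each pathway
        System.out.println(score(2, 4, 2, 4)); // 50.0
    }
}
```

The same shape would work for Interaction Comparison by counting matched and total interactions instead of Datanodes.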


2. Considering line arrow types in interactions: Right now, the types of the arrows at the line ends are ignored when comparing interactions in the pathways. But they might be taken into account for MIM line arrows.


3. Integrating Comparison pop-up window inside PathVisio's main-view: This would probably be done after finishing up 1 and 2 above.

Tuesday, 17 July 2012

Week 7 : Interaction Comparison reworked

Last week, I wrote about Interaction Comparison, but the part about "finding Interactions" in a Pathway was slightly wrong, as I had misunderstood what an interaction is.

Below is what I had written last week:
"When I say Interaction, I mean : A group of DataNodes, Lines (these have start and end points i.e <point> tags with graphRefs) & anchors on the lines interacting in such a way that they are all connected , like in a network, where each of the interacting partners are connected to all the others either directly or indirectly. Here, the interaction must comprise of at least 2 datanodes."


But the actual interaction we are looking for in the pathways is slightly different, as my mentor explained to me:
For an interaction to exist, there should be a line connecting directly to two Datanodes. We call this line the Root-Line. All the other lines can connect to the root-line either directly or indirectly through anchors. Each of these other lines can have a Datanode and an Anchor at its ends, Anchors at both ends, or Datanodes at both ends.
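The definition above can be sketched roughly as follows. This uses a hypothetical `Line` class instead of the real MLine, MPoint and MAnchor classes, and it only follows one direction of attachment: a line joins the interaction when one of its ends refers to an anchor sitting on a line that is already part of it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the root-line idea, not PathVisio API. */
class RootLineSketch {

    /** Stand-in for a line: the ids its two ends refer to, and the ids of anchors placed on it. */
    static final class Line {
        final String id;
        final String startRef;
        final String endRef;
        final Set<String> anchorIds;

        Line(String id, String startRef, String endRef, String... anchorIds) {
            this.id = id;
            this.startRef = startRef;
            this.endRef = endRef;
            this.anchorIds = new HashSet<>(Arrays.asList(anchorIds));
        }
    }

    private static boolean refersTo(Set<String> ids, String ref) {
        return ref != null && ids.contains(ref);
    }

    /** A root-line connects directly to two Datanodes. */
    static boolean isRootLine(Line line, Set<String> datanodeIds) {
        return refersTo(datanodeIds, line.startRef) && refersTo(datanodeIds, line.endRef);
    }

    /** Returns the ids of the root-line and of every line attached to it through anchors. */
    static Set<String> interactionFor(Line root, List<Line> allLines) {
        Set<String> memberIds = new LinkedHashSet<>();
        memberIds.add(root.id);
        Set<String> anchors = new HashSet<>(root.anchorIds);
        boolean grew = true;
        while (grew) { // repeat until no further line can be attached
            grew = false;
            for (Line l : allLines) {
                if (memberIds.contains(l.id)) {
                    continue;
                }
                if (refersTo(anchors, l.startRef) || refersTo(anchors, l.endRef)) {
                    memberIds.add(l.id);
                    anchors.addAll(l.anchorIds); // lines attached to this one are found in a later pass
                    grew = true;
                }
            }
        }
        return memberIds;
    }
}
```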


Examples of an interaction:


In the earlier version of Interaction Comparison, I wasn't aware of the classes MLine, MPoint and MAnchor, and I had written the interaction comparison logic without using them. But in the last meeting with my mentor Martina, she guided me through the classes, and this week I reworked the code. Now it's cleaner and shorter than before.


The algorithm for finding out the interactions (Root-Line and its connected lines and Datanodes) in a Pathway has changed and improved performance-wise: 
<the Algorithm to be updated later today>


The Assumptions in the algorithm have changed from the previous version:
1. Only the first and the last graphRefs of a line are used (i.e. the start and end points of a line) to look for the Anchors they refer to.


Screenshot: Comparing 2 pathways (Both DataNode-Comparison results and Interaction-Comparison results are present in the right panel inside their respective tables):


All the matching Datanodes are highlighted initially on hitting the compare button. The top-right table shows the Datanode Comparison results and the bottom-right table shows the Interaction Comparison results.

Clicking on a result from the Interaction-Comparison results table highlights the corresponding matching interactions in both the pathways. 


Note:
The "Highlight All" button currently isn't programmed. And I am yet to figure out a way to focus the scrollers onto the highlighted interaction in a pathway.

Friday, 6 July 2012

Week 4,5,6 - Interaction Comparison

Sorry for the delay on the blog report. I had exams and a trip to make, so I was away for 9 days.
Also, Interaction Comparison was a bit complex, as it first involved finding all the possible interactions in a pathway and then comparing the interactions in the two pathways. Right now, Interaction Comparison is not perfect, as it does not take into account the lines' connections (i.e. which lines connect to which others). As long as the DataNodes in the interactions being compared are the same (i.e. have the same Xref), the interactions are considered to be matching.

When I say Interaction, I mean : A group of DataNodes, Lines (these have start and end points i.e <point> tags with graphRefs) & anchors on the lines interacting in such a way that they are all connected , like in a network, where each of the interacting partners are connected to all the others either directly or indirectly. Here, the interaction must comprise of at least 2 datanodes.

Example of Interactions

Example 2: There is only one interaction in the example above

Algorithm for finding the interactions (groups of connected Datanodes) in a Pathway:
In this algorithm, we loop through the lines in a pathway instead of the DataNodes.
This algorithm requires the "DataNode-Comparison" results beforehand, because we use only those lines which connect to at least one of the Datanodes from the Datanode-Comparison result, or which don't connect to any Datanodes. All the other lines won't matter, because for any two interactions to match, all the Datanodes present in the interactions must match.


1. Get a list of those lines in a pathway which connect to at least one DataNode from the Datanode comparison result. This list also includes the lines which do not connect to any Datanodes and instead have end-points referring to the anchors positioned on other lines. 


2. We loop through each line in this list (outer 'for' loop; let us call this line the Root Line) and see if other lines in the list (inner 'for' loop) interact with the root line, i.e. whether they have something in common with it. This something in common could be a DataNode or an anchor (where this line connects to an anchor on the other line, or the other line refers to an anchor present on this line).


3. This "something common" represents the "interaction partners" present on a line. Whenever the root line and the line from the inner 'for' loop share at least one interaction partner, they are considered to be connected (forming part of an interaction) and their interaction partners are merged. The lines and their Datanodes are then part of the interaction.


4. Similarly, the other lines are checked to see if they have an interaction partner which could be present in this clubbed "interaction partners list". If so, the lines are considered to be connected to the root line  (directly or indirectly), and the line and its connecting DataNodes (if any) become part of the interaction.


5. At the end of each iteration of the outer 'for' loop, we get a list of the lines and their connecting DataNodes which are either directly or indirectly connected to the root line. In other words, we get an interaction (a list of Datanodes and Lines). Note: Not all the root lines would go on to form an interaction.

6. At the end of the outer-for-loop we get the list of all the Interactions in a pathway. Thus using this approach we find the list of interactions in the 2 pathways. For now, these interactions in the 2 pathways are compared using only the Datanodes present in the interactions, as I am yet to figure out a way where lines' flow/direction is also included in the comparison.

Assumptions in the algorithm:
1. The Graphref attributes in the <point> tag inside a <line> tag, when not referring to DataNodes , are assumed to be referring to Anchors. 
2. Only the first and the last graphRefs of a line are used (i.e the start and end-points of a line) to look for referring Datanodes or Anchors.
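The steps above can be sketched as a grouping of lines that share interaction partners. This is a simplified variant rather than the code in the project: it marks lines as visited, so each group is built only once instead of once per root line. The `partnersByLine` map (line id to the ids of the Datanodes and anchors that line touches) and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of grouping lines into interactions, not PathVisio API. */
class InteractionGroupingSketch {

    /**
     * Groups lines that share at least one interaction partner (Datanode or anchor),
     * directly or indirectly, and returns the Datanode ids of every group that
     * contains at least 2 Datanodes.
     */
    static List<Set<String>> findInteractions(Map<String, Set<String>> partnersByLine,
                                              Set<String> datanodeIds) {
        List<Set<String>> interactions = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String rootLine : partnersByLine.keySet()) { // outer loop: candidate root line
            if (visited.contains(rootLine)) {
                continue;
            }
            visited.add(rootLine);
            Set<String> partners = new LinkedHashSet<>(partnersByLine.get(rootLine));
            boolean grew = true;
            while (grew) { // repeat so that indirectly connected lines are picked up
                grew = false;
                for (Map.Entry<String, Set<String>> e : partnersByLine.entrySet()) { // inner loop
                    if (visited.contains(e.getKey())) {
                        continue;
                    }
                    if (!Collections.disjoint(partners, e.getValue())) {
                        partners.addAll(e.getValue()); // merge the interaction partners
                        visited.add(e.getKey());
                        grew = true;
                    }
                }
            }
            Set<String> nodes = new LinkedHashSet<>(partners);
            nodes.retainAll(datanodeIds); // only the Datanodes are used for comparison
            if (nodes.size() >= 2) {
                interactions.add(nodes);
            }
        }
        return interactions;
    }
}
```

Two interactions from different pathways could then be compared by checking whether their Datanode sets match, which mirrors the Datanode-only comparison described in step 6.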



Screenshot: Comparing Interactions in a pathway. Pathways in the 2 windows are the same.