Thursday, 2 August 2012

Week 9

Summary :
1. Finished the many-to-many mapping of Datanode Comparison results mentioned in last week's report. As the screenshot below shows (in the table "Datanode Comparison Results"), some rows now have multiple datanodes in column Pathway1 and/or column Pathway2.

In the results table, Bcl-Xs, BCL-XL in the column 'Pathway1' is mapped to BCL2L1 in Pathway2.

2. Now there's an option to save the comparison results (as seen in the screenshot above) in a file. As you can see in the screenshot below, the "Save Comparison Results" button in the compare tab allows for saving the comparison results.

Clicking on the "Save Comparison Results" button saves the results in a text file. Right now the file's name and location are hard-coded; I might change this so that the user can choose a preferred location and file name.
I am pasting below the text file's contents in which the results are saved.

<text file start>

Datanode Comparison Reuslts: to the left are Datanode-Labels in Pathway1 and to the right are their matching counterparts in Pathway2
----------------------------
TRADD, TRADD, TRADD <---> TRADD, TRADD
APAF1 <---> APAF1
C-IAP2 <---> BIRC3
p16-INK4 <---> CDKN2A
TNFR-2 <---> TNFRSF1B
AP-1, AP-1 <---> JUN
C-IAP1 <---> BIRC2
DFFB <---> DFFB
Bcl-Xs, BCL-XL <---> BCL2L1
CASP3 <---> CASP3
DR3, DR3 <---> TNFRSF25
Bim <---> BCL2L11
Noxa <---> PMAIP1
CASP9 <---> CASP9
CASP8, CASP8 <---> CASP8
IKB-alpha <---> NFKBIA, NFKBIA
CRADD <---> CRADD
TNFR-1, TNFR-1, TNFR-1 <---> TNFRSF1A, TNFRSF1A
TRAIL-R2 <---> TNFRSF10B
CASP6 <---> CASP6
Smac/DIABLO <---> DIABLO
c-FLIP, c-FLIP <---> CFLAR
DFFA <---> DFFA
BAK1 <---> BAK1
Cyto C <---> CYCS
Puma <---> BBC3
TRAF3 <---> TRAF3
MCL1 <---> MCL1
Nf-kB <---> NFKB1, NFKB1
FADD <---> FADD
BAX <---> BAX
FasL <---> FASLG
IKK <---> IKBKB
BAD <---> BAD, BAD
BCL2 <---> BCL2
BID <---> BID
CASP7 <---> CASP7
BOK <---> BOK
Survivin, survivin <---> BIRC5
FAS, FAS <---> FAS
CASP10 <---> CASP10
TRAIL <---> TNFSF10
Bcl-W <---> BCL2L2
HRK <---> HRK
CASP2 <---> CASP2

Interaction Comparison Reuslts: Datanodes are represented by their text-labels and MLine(<Line-style>) indicates that its a PathwayElement of type MLine
-------------------------------
FAS, MLine(0), FasL <---> FAS, MLine(0), FASLG
CASP8, MLine(0), BID <---> MLine(0), CASP8, BID
MLine(0), DFFB, DFFA <---> DFFB, MLine(0), DFFA

<text File end>

3. <Using Xref based lines in Interaction comparison>
4. Scoring: <to be updated>
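
The grouping in item 1 and the file output in item 2 can be sketched roughly as below. This is only a minimal illustration, not the plugin's actual code: the class name ComparisonResultWriter and its methods are hypothetical, and the sketch identifies datanodes by label alone, so repeated labels such as "TRADD, TRADD" would collapse into one (a real implementation would more likely key on GraphIds).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not a class from the plugin.
class ComparisonResultWriter {

    /** Collapses one-to-one {pathway1 label, pathway2 label} pairs into grouped result lines. */
    static List<String> groupAndFormat(List<String[]> pairs) {
        // Step 1: for every pathway1 label, collect the pathway2 labels it matches.
        Map<String, Set<String>> rightsByLeft = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            rightsByLeft.computeIfAbsent(pair[0], k -> new LinkedHashSet<>()).add(pair[1]);
        }
        // Step 2: merge pathway1 labels whose sets of matches are identical.
        Map<Set<String>, List<String>> leftsByRights = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : rightsByLeft.entrySet()) {
            leftsByRights.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Step 3: render each group in the "left <---> right" style of the saved file.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Set<String>, List<String>> e : leftsByRights.entrySet()) {
            lines.add(String.join(", ", e.getValue()) + " <---> " + String.join(", ", e.getKey()));
        }
        return lines;
    }

    /** Writes the lines to a file chosen by the caller instead of a hard-coded location. */
    static void save(List<String> lines, Path file) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"Gene A", "Gene C"}, new String[] {"Gene A", "Gene D"},
                new String[] {"Gene B", "Gene C"}, new String[] {"Gene B", "Gene D"});
        // Prints: Gene A, Gene B <---> Gene C, Gene D
        for (String line : groupAndFormat(pairs)) {
            System.out.println(line);
        }
    }
}
```

Passing the target Path into save() is one way to get rid of the hard-coded file name later; the path could come from a file chooser.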

Tuesday, 24 July 2012

Week 8: updated Project plan and other enhancements

This week :

1. Earlier, clicking on a result row in the "Interaction Comparison Results" table would just highlight the interactions in both pathways (pathway1 and pathway2) without focusing on them. Now it also focuses on the interaction and zooms out if the interaction is too big to fit into the view.

2. Highlighting all the interaction matches in "Interaction Comparison Results" is now possible, but zooming out to fit all the highlighted interactions into the view does not happen yet.


3. Working (not yet finished) on many-to-many mapping of the Datanode matches from pathway1 to pathway2. Earlier this was a one-to-one mapping, with each match appearing as an individual row in the "Datanode Comparison Results" table.


Let me explain: consider Gene A and Gene B, two identical Datanodes in pathway1 (i.e. they have equivalent Xrefs), and Gene C and Gene D, two identical Datanodes in pathway2. The two Datanodes A, B in pathway1 match the two Datanodes C, D in pathway2.

Earlier with one-to-one mapping, the comparison results looked like
Gene A -> Gene C
Gene A -> Gene D
Gene B -> Gene C
Gene B -> Gene D
And clicking on any of the results highlighted a Datanode in Pathway1 and the corresponding matching Datanode in Pathway2. For instance, clicking on row1 (Gene A -> Gene C) highlights Gene A in pathway1 and Gene C in pathway2.


But with many-to-many mapping of the Datanode matches, the four individual results above can be represented as a single result, "Gene A, Gene B -> Gene C, Gene D". Clicking on it should highlight Datanodes Gene A and Gene B in pathway1 and Gene C and Gene D in pathway2. This is taking time because Interaction Comparison uses the results from Datanode Comparison, so the Interaction Comparison results will also have to be modified.


Also, if there are multiple instances of a Datanode with the same label, e.g. two instances of Gene A in pathway1, matching Gene B and Gene C in pathway2, then the result would be represented as Gene A -> Gene B, Gene C.

4. Storing the comparison results (Datanode Comparison and Interaction Comparison). For this, I was supposed to come up with a format (CSV, TSV etc.) that would best represent the comparison results in a file. I think for Interaction Comparison results we could just store the Datanodes' labels and GraphIds (not sure if the GraphId needs to be stored) for each interaction. I am not sure whether the lines in the interaction (the lines' GraphIds) should be stored as well. As lines don't have labels, storing their GraphIds wouldn't tell us much when we read the file ourselves.


Delimiter format for storing Interaction Comparison results in a file: 
<DN1 Label> : <DN1 GraphId> , <DN2 Label> : <DN2 GraphId> <tab> <DN3 Label> : <DN3 GraphId> , <DN4 Label> : <DN4 GraphId> , <DN5 Label> : <DN5 GraphId> <new-line>
Here a colon separates a DN's Label from its GraphId, a comma separates the DNs within an interaction, a tab separates the interaction in pathway1 from its matching counterpart in pathway2, and a new-line separates the individual Interaction Comparison results.


For DataNode comparison results, the format could be something similar, but I could come up with a format after many-to-many mapping of DataNode Comparison Results is finished.
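
To make the proposed delimiter format concrete, here is a small sketch of how one Interaction Comparison result could be encoded. The class InteractionResultFormat, the Node type and the GraphId values in the example are made up for illustration, and the sketch does no escaping, so a label containing a colon, comma or tab would break the format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not a class from the plugin.
class InteractionResultFormat {

    /** A datanode reduced to the two fields the proposed format stores. */
    static final class Node {
        final String label;
        final String graphId;

        Node(String label, String graphId) {
            this.label = label;
            this.graphId = graphId;
        }
    }

    /** Encodes one interaction as "label:graphId,label:graphId,...". */
    static String encodeInteraction(List<Node> nodes) {
        List<String> parts = new ArrayList<>();
        for (Node n : nodes) {
            parts.add(n.label + ":" + n.graphId);
        }
        return String.join(",", parts);
    }

    /** Encodes one result: pathway1 interaction, a tab, pathway2 interaction, a new-line. */
    static String encodeResult(List<Node> interaction1, List<Node> interaction2) {
        return encodeInteraction(interaction1) + "\t" + encodeInteraction(interaction2) + "\n";
    }

    public static void main(String[] args) {
        // The GraphIds below are placeholders, not ids from a real pathway file.
        List<Node> left = Arrays.asList(new Node("FAS", "id1"), new Node("FasL", "id2"));
        List<Node> right = Arrays.asList(new Node("FAS", "id3"), new Node("FASLG", "id4"));
        System.out.print(encodeResult(left, right));
    }
}
```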


Updated Project plan: 


1. Scoring system: Generate a score based on the comparison results which would indicate how similar the two pathways being compared are. Scoring would be based on the results of Datanode Comparison or Interaction Comparison. A simple scoring system such as the one in org.pathvisio.core.gpmlDiff.BasicSim.java could be used.


2. Considering line arrow types in interactions: Right now the types of the arrows at the line ends are ignored when comparing interactions in the pathways. But they might be taken into account for MIM line arrows.


3. Integrating Comparison pop-up window inside PathVisio's main-view: This would probably be done after finishing up 1 and 2 above.
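
As a rough idea of what a simple score for item 1 could look like, the sketch below computes the fraction of datanodes (over both pathways) that take part in a match. This is my own assumption for illustration and is not the logic of BasicSim; the class and method names are hypothetical.

```java
// Hypothetical scoring sketch; NOT the algorithm used by gpmlDiff's BasicSim.
class PathwaySimilarityScore {

    /**
     * Returns a value between 0.0 (nothing matches) and 1.0 (every datanode matches).
     *
     * @param matched1 number of datanodes in pathway1 that have a match in pathway2
     * @param total1   number of datanodes in pathway1
     * @param matched2 number of datanodes in pathway2 that have a match in pathway1
     * @param total2   number of datanodes in pathway2
     */
    static double score(int matched1, int total1, int matched2, int total2) {
        if (matched1 < 0 || matched2 < 0 || matched1 > total1 || matched2 > total2) {
            throw new IllegalArgumentException("matched counts must lie between 0 and the totals");
        }
        int total = total1 + total2;
        if (total == 0) {
            return 0.0; // two empty pathways: treat as "no evidence of similarity"
        }
        return (double) (matched1 + matched2) / total;
    }

    public static void main(String[] args) {
        // 3 of 4 datanodes matched in pathway1, 3 of 6 in pathway2: (3 + 3) / (4 + 6)
        System.out.println(score(3, 4, 3, 6)); // prints 0.6
    }
}
```

The same formula could be applied to interactions instead of datanodes by counting matched and total interactions.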

Tuesday, 17 July 2012

Week 7 : Interaction Comparison reworked

Last week, I wrote about Interaction Comparison, but the part about "finding Interactions" in a Pathway was slightly wrong, as I had misunderstood what an interaction is.

Below is what I had written last week :
"When I say Interaction, I mean : A group of DataNodes, Lines (these have start and end points i.e <point> tags with graphRefs) & anchors on the lines interacting in such a way that they are all connected , like in a network, where each of the interacting partners are connected to all the others either directly or indirectly. Here, the interaction must comprise of at least 2 datanodes."


But the actual interaction which we are looking for in the pathways is slightly different as my mentor explained it to me: 
For an interaction to exist, there should be a line connecting directly to two Datanodes. We call this line the Root-Line. All the other lines can connect to the root-line either directly or indirectly through anchors. Each of these other lines can have a Datanode and an Anchor at its ends, Anchors at both ends, or Datanodes at both ends.
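
The definition above can be illustrated with a small, self-contained sketch. It does not use PathVisio's MLine, MPoint or MAnchor classes; the Line type below is a simplified stand-in that only keeps ids, and the traversal is an assumption about how the definition could be applied: a line joins the interaction when one of its end-points refers to an anchor on a line that is already part of it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in model for illustration; not PathVisio's MLine/MAnchor classes.
class RootLineFinder {

    static final class Line {
        final String id;
        final String startRef;       // id of the datanode or anchor at the start-point
        final String endRef;         // id of the datanode or anchor at the end-point
        final Set<String> anchorIds; // anchors sitting on this line

        Line(String id, String startRef, String endRef, String... anchorIds) {
            this.id = id;
            this.startRef = startRef;
            this.endRef = endRef;
            this.anchorIds = new HashSet<>(Arrays.asList(anchorIds));
        }
    }

    /** A root line connects directly to a datanode at both ends. */
    static boolean isRootLine(Line line, Set<String> dataNodeIds) {
        return dataNodeIds.contains(line.startRef) && dataNodeIds.contains(line.endRef);
    }

    /** Collects the root line plus all lines attached to it, directly or indirectly, via anchors. */
    static Set<String> collectInteraction(Line root, List<Line> lines) {
        Set<String> memberIds = new LinkedHashSet<>();
        Set<String> reachableAnchors = new HashSet<>(root.anchorIds);
        memberIds.add(root.id);
        boolean grew = true;
        while (grew) { // repeat until a full pass adds no new line
            grew = false;
            for (Line line : lines) {
                if (memberIds.contains(line.id)) {
                    continue;
                }
                if (reachableAnchors.contains(line.startRef) || reachableAnchors.contains(line.endRef)) {
                    memberIds.add(line.id);
                    reachableAnchors.addAll(line.anchorIds);
                    grew = true;
                }
            }
        }
        return memberIds;
    }

    public static void main(String[] args) {
        Set<String> dataNodes = new HashSet<>(Arrays.asList("dnA", "dnB", "dnC"));
        Line root = new Line("r1", "dnA", "dnB", "an1");
        List<Line> lines = Arrays.asList(root, new Line("l2", "dnC", "an1"));
        System.out.println(isRootLine(root, dataNodes) + " " + collectInteraction(root, lines));
    }
}
```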


Examples of an interaction:


In the earlier version of Interaction comparison, I wasn't aware of the classes MLine, MPoint and MAnchor, and I had written the interaction comparison logic without using them. But in the last meeting with my mentor Martina, she guided me through the classes, and this week I reworked the code; now it's cleaner and shorter than before.


The algorithm for finding out the interactions (Root-Line and its connected lines and Datanodes) in a Pathway has changed and improved performance-wise: 
<the Algorithm to be updated later today>


The Assumptions in the algorithm have changed from the previous version:
1. Only the first and the last graphRefs of a line are used (i.e. the start and end-points of a line) to look for the Anchors they refer to.


Screenshot: Comparing 2 pathways (Both DataNode-Comparison results and Interaction-Comparison results are present in the right panel inside their respective tables):


All the matching Datanodes are highlighted initially on hitting the compare button. The top-right table holds the Datanode comparison results and the bottom-right table holds the Interaction comparison results.

Clicking on a result from the Interaction-Comparison results table highlights the corresponding matching interactions in both the pathways. 


Note:
The "Highlight All" button currently isn't programmed. And I am yet to figure out a way to focus the scrollers onto the highlighted interaction in a pathway.

Friday, 6 July 2012

Week 4,5,6 - Interaction Comparison

Sorry for the delay on the blog report. I had exams and a trip to make. So, I was away for 9 days.
Also, Interaction Comparison was a bit complex, as it involved first finding all the possible interactions in a pathway and then comparing the interactions in the two pathways. Right now, Interaction Comparison is not perfect, as it does not take into account the lines' connections (i.e. which lines connect to which others): as long as the DataNodes in the interactions being compared are the same (i.e. have the same Xref), the interactions are considered to be matching.

When I say Interaction, I mean : A group of DataNodes, Lines (these have start and end points i.e <point> tags with graphRefs) & anchors on the lines interacting in such a way that they are all connected , like in a network, where each of the interacting partners are connected to all the others either directly or indirectly. Here, the interaction must comprise of at least 2 datanodes.

Example of Interactions

Example 2: There is only one interaction in the example above

Algorithm for finding out the interactions (group of connected datanodes) in a Pathway :
In this algorithm , we loop through the lines in a pathway instead of DataNodes.
This algorithm requires the "DataNode-Comparison" results beforehand, because we will be using only those lines which connect to at least one of the Datanodes from the Datanode-Comparison result, or those which don't connect to any datanodes. All the other lines won't matter, because for any two interactions to match, all the Datanodes present in the interactions must match.


1. Get a list of those lines in a pathway which connect to at least one DataNode from the Datanode comparison result. This list also includes the lines which do not connect to any Datanodes and instead have end-points referring to the anchors positioned on other lines. 


2. We loop through each line in this list (outer 'for' loop; let us call this line the Root Line) and see if other lines in the list (inner 'for' loop) interact with the root line, i.e. whether they have something in common with it. This "something common" could be a DataNode or an anchor (where this line connects to an anchor on the other line, or the other line refers to an anchor present on this line).


3. This "something common" represents the "interaction partners" present on a line. Whenever the root line and the line from the inner 'for' loop have a match in at least one of their interaction partners, they are considered to be connected (forming part of an interaction) and their interaction partners are clubbed together. The lines and their datanodes are then part of the interaction.


4. Similarly, the other lines are checked to see if they have an interaction partner which could be present in this clubbed "interaction partners list". If so, the lines are considered to be connected to the root line  (directly or indirectly), and the line and its connecting DataNodes (if any) become part of the interaction.


5. At the end of each loop of  the outer-for-loop, we get a list of the lines and their connecting DataNodes which are either directly or indirectly connected to the root line. In other words, we get an interaction (a list of Datanodes and Lines). Note : Not all the root lines would go on to form an interaction.

6. At the end of the outer-for-loop we get the list of all the Interactions in a pathway. Thus using this approach we find the list of interactions in the 2 pathways. For now, these interactions in the 2 pathways are compared using only the Datanodes present in the interactions, as I am yet to figure out a way where lines' flow/direction is also included in the comparison.

Assumptions in the algorithm:
1. The Graphref attributes in the <point> tag inside a <line> tag, when not referring to DataNodes , are assumed to be referring to Anchors. 
2. Only the first and the last graphRefs of a line are used (i.e the start and end-points of a line) to look for referring Datanodes or Anchors.
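
Steps 1 to 6 can be sketched as follows. This is a simplified reconstruction under my own assumptions, not the plugin's code: each line is reduced to the set of ids of its "interaction partners" (datanodes and anchors), the class name InteractionGrouper is hypothetical, and lines that were already clubbed into an interaction are skipped as root lines so that each interaction is reported only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the line-based grouping; ids stand in for real pathway elements.
class InteractionGrouper {

    /**
     * @param partnersByLine for each line id, the ids of the datanodes/anchors it touches
     * @param dataNodeIds    ids that belong to datanodes (everything else counts as an anchor)
     * @return one set of line ids per interaction that contains at least two datanodes
     */
    static List<Set<String>> findInteractions(Map<String, Set<String>> partnersByLine,
                                              Set<String> dataNodeIds) {
        List<Set<String>> interactions = new ArrayList<>();
        Set<String> alreadyGrouped = new HashSet<>();
        for (String rootId : partnersByLine.keySet()) {            // outer loop: root line
            if (alreadyGrouped.contains(rootId)) {
                continue;
            }
            Set<String> lines = new LinkedHashSet<>();
            Set<String> partners = new HashSet<>(partnersByLine.get(rootId));
            lines.add(rootId);
            boolean grew = true;
            while (grew) {                                          // repeat to catch indirect links
                grew = false;
                for (Map.Entry<String, Set<String>> e : partnersByLine.entrySet()) { // inner loop
                    if (lines.contains(e.getKey())) {
                        continue;
                    }
                    if (!Collections.disjoint(partners, e.getValue())) {
                        lines.add(e.getKey());                      // shares a partner: club it
                        partners.addAll(e.getValue());
                        grew = true;
                    }
                }
            }
            alreadyGrouped.addAll(lines);
            Set<String> nodes = new HashSet<>(partners);
            nodes.retainAll(dataNodeIds);
            if (nodes.size() >= 2) {                                // an interaction needs 2+ datanodes
                interactions.add(lines);
            }
        }
        return interactions;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> partners = new LinkedHashMap<>();
        partners.put("l1", new HashSet<>(Arrays.asList("A", "B", "an1")));
        partners.put("l2", new HashSet<>(Arrays.asList("C", "an1")));
        partners.put("l3", new HashSet<>(Arrays.asList("D", "E")));
        Set<String> dataNodes = new HashSet<>(Arrays.asList("A", "B", "C", "D", "E"));
        System.out.println(findInteractions(partners, dataNodes)); // prints [[l1, l2], [l3]]
    }
}
```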



Screenshot: Comparing Interactions in a pathway. Pathways in the 2 windows are the same. 



Tuesday, 12 June 2012

Week 3: June 4 -10 & 11 -12

Week 3 Report:

1. Finished Xref comparison of DataNodes using the BridgeDB IDMapperStack. This IDMapperStack is made available to the comparison plugin through SwingEngine.getGdbManager().getCurrentGdb(). The IDMapperStack stacks all the available IDMappers and provides a common handle for ID mapping. The comparison now requires the user to first select a biological database (Data->Select Gene/Metabolite Database), so that the IDMapperStack has at least one IDMapper to start with.

2. Yet to finish Line comparison.

3. UI layout for the "Results Pane" modified to accommodate DataNode-comparison results and Line-comparison results in their own separate JTables. For now the results under Line-comparison are the same as the DataNode-comparison results, as Line-comparison is still in progress.

4. The pathway loading and comparing now happen inside a separate background thread using the SwingWorker class. Hence a progress dialog would pop-up during the loading period. Followed the example from SwingEngine.openPathway().

5. Fixed bugs related to highlighting PathwayElements, window re-size events, KeyListener events for the "results" Panel's JTables' rows.

6. Documented most of the source code and refactored code. I will be doing this on a regular basis as mentioned in the previous blog post.

7. Need to discuss:
     a. what would be next after finishing Line comparison.
     b. Is "comparison pop-up window" the way forward or showing pathway comparison inside PathVisio's main panel better ?
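
For readers unfamiliar with SwingWorker (item 4), here is a minimal sketch of the pattern: the slow work runs in doInBackground() on a worker thread, progress is reported through setProgress(), and done() is called back on the Swing event thread. The CompareWorker class and its id-matching loop are placeholders of my own, not the plugin's loading and comparison code.

```java
import java.util.Arrays;
import javax.swing.SwingWorker;

// Placeholder worker for illustration; the real plugin loads and compares Pathway objects.
class CompareWorker extends SwingWorker<Integer, Void> {
    private final String[] ids1;
    private final String[] ids2;

    CompareWorker(String[] ids1, String[] ids2) {
        this.ids1 = ids1;
        this.ids2 = ids2;
    }

    /** Returns true if the id occurs among the other pathway's ids. */
    static boolean hasMatch(String id, String[] others) {
        return Arrays.asList(others).contains(id);
    }

    @Override
    protected Integer doInBackground() {
        // Runs off the event dispatch thread, so the UI stays responsive.
        int matches = 0;
        for (int i = 0; i < ids1.length; i++) {
            if (hasMatch(ids1[i], ids2)) {
                matches++;
            }
            setProgress(100 * (i + 1) / ids1.length); // a progress dialog could listen to this
        }
        return matches;
    }

    @Override
    protected void done() {
        // Runs on the event dispatch thread: the place to close the progress
        // dialog and fill the results tables.
    }

    public static void main(String[] args) throws Exception {
        CompareWorker worker = new CompareWorker(
                new String[] {"a", "b", "c"}, new String[] {"b", "c", "d"});
        worker.execute();                                    // start the background thread
        System.out.println("matches: " + worker.get());      // get() blocks until finished
    }
}
```

In a real UI one would not call get() right after execute(), since that blocks the caller; the result would be picked up inside done() instead.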

Please feel free to put down any comments regarding the plugin or its code.


Below are the screenshots of Pathway-comparison plug-in:

Progress Dialog pops up when "Compare" is clicked


Comparing two different pathways and no match is found. Note: Line Comparison is not yet ready


Comparing the same pathway for lack of two similar pathways. Hence all the DataNodes with Xrefs are highlighted in both the pathways.


Highlighting a match ( PathwayElement i.e DataNode found in both of the pathways) by clicking on a row in the  results. Each row corresponds to a match


regards,
Praveen Kumar

Monday, 4 June 2012

Week 2: (May 28 - June 3)

Summary in brief:
Introduced the Results Pane inside the comparison pop-up window; learning to use BridgeDB; began Xref and Line comparison; "load two pathways" issue solved; modified the UI layout of the plugin's "Compare" tab; code refactoring and documentation


Summary :
This week I started out with creating a panel "Comparison Results" in the comparison pop-up window.
This panel shows the matching entities (similar Datanodes and similar interactions) in the two pathways being compared inside a JTable. Right now, it shows only the matching Datanodes' graphIds as a series of rows under two columns "Pathway1" and "Pathway2".  I will soon change this to show Datanode's label or its xref  instead of GraphId as per the requirement.
The result rows are clickable and clicking on a row highlights the respective matching entities in both the pathways. This is done using the GraphId of the matching entities.

Next, I learnt how to use the BridgeDB framework, tested the cases which could be useful in our plugin and got clarifications from the mentors on how I would use BridgeDB IDMappers in our plugin to compare the Datanodes based on Xrefs. After discussing it with the mentors, it was concluded that option A below would be better than option B.

option A : Creating an array of DataSources from the Xrefs (since Xref is composed of an Id and DataSource) defined in Pathway#2 and using this DataSource list to translate the Xrefs in Pathway#1 and then compare the translated Xrefs in Pathway#1 with the direct* Xrefs in Pathway#2.
*direct Xref = Xref node found in the pathway (without any mapping/transformation)

option B: Using one DataSource (a common DataSource), translate the Xrefs in both the pathways to this DataSource and then compare the Xrefs (in Pathway#1 and Pathway#2) based on these translated Xref mappings.
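
Option A can be sketched like this. To keep the example self-contained it does not call BridgeDB: the Mapper interface is a hypothetical stand-in for an ID mapper, and Xrefs are written as plain "DataSource:Id" strings, which is an assumption made only for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration of option A with a stand-in mapper; not BridgeDB's API.
class XrefMatcher {

    /** Hypothetical stand-in for an ID mapper: equivalents of an xref in the target data sources. */
    interface Mapper {
        Set<String> map(String xref, Set<String> targetDataSources);
    }

    /** Extracts the data source from a "DataSource:Id" string. */
    static String dataSourceOf(String xref) {
        return xref.substring(0, xref.indexOf(':'));
    }

    /** For each pathway1 xref, returns the direct pathway2 xrefs it translates to (if any). */
    static Map<String, Set<String>> match(List<String> xrefs1, List<String> xrefs2, Mapper mapper) {
        // Data sources actually used in pathway 2 become the translation targets.
        Set<String> targets = new LinkedHashSet<>();
        for (String x2 : xrefs2) {
            targets.add(dataSourceOf(x2));
        }
        Set<String> direct2 = new HashSet<>(xrefs2);
        Map<String, Set<String>> matches = new LinkedHashMap<>();
        for (String x1 : xrefs1) {
            Set<String> candidates = new LinkedHashSet<>(mapper.map(x1, targets));
            candidates.add(x1);            // an identical xref matches without any translation
            candidates.retainAll(direct2); // keep only xrefs really present in pathway 2
            if (!candidates.isEmpty()) {
                matches.put(x1, candidates);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Toy mapper that knows a single equivalence (TP53 in Entrez Gene and Ensembl).
        Mapper mapper = (xref, targets) -> xref.equals("EntrezGene:7157")
                ? Collections.singleton("Ensembl:ENSG00000141510")
                : Collections.<String>emptySet();
        System.out.println(match(
                Arrays.asList("EntrezGene:7157", "ChEBI:15377"),
                Arrays.asList("Ensembl:ENSG00000141510"), mapper));
    }
}
```

Compared with option B, only the Xrefs of Pathway#1 are translated, and only into the data sources that Pathway#2 actually uses.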

I have started on BridgeDB Xref comparison and I will try to finish it this week. I have also started on line comparison. And I need to figure out a way to compare the lines when there are anchors.

2. With help from Dr. Martijn, I am now able to load 2 pathways inside the comparison pop-up window. This required custom loading of Pathway and VPathway objects, since PathVisio allows for only one usable/active VPathway object at a time.

3. I modified the UI layout for the plugin's "Compare" tab based on the layout of PathVisio's "Search" tab. Here's how it looks now:



4. I also worked on optimizing/refactoring the code so that it's robust, clean and well documented from the start. I think I should be doing this regularly, e.g. by setting aside a day or half a day each week, or by doing it soon after a module is finished. This would definitely help in the long run, though it could be a little time-consuming now.

Monday, 28 May 2012

Week1 (May 21-27)

Hello Everyone,

This is the first blog entry on my GSOC project "Pathway Comparison Plugin" and also my very first blogging experience. I will write up an entry every week reporting about the project status.

The current status: 


1. During the project proposal I worked on a window (JFrame) which would display 2 Pathways outside PathVisio's main window. This initial prototype just loaded two pathways from 2 hard-coded locations inside the JFrame's internal frames, but there has been an issue loading and displaying 2 different pathways in the window simultaneously. The Pathway objects (corresponding to the 2 pathways) would load fine, but the VPathway objects required to draw the pathways inside the 2 internal frames have an issue. I am working on it. At the moment, it seems that PathVisio allows only one VPathway object (recently loaded one) to be used for drawing.

2. I have created a PathVisio plugin which allows the user to choose 2 pathways through two "Load Pathway" buttons and a "Compare" button to compare the Pathways. Hitting the compare button brings up the "Pathway-Comparison" window mentioned above. This window has 2 internal frames to display the 2 loaded pathways adjacent to each other, but due to the VPathway object issue, it displays only the last loaded Pathway in one of the internal frames. I also plan to include one more partition (another internal frame) which would display the comparison results (the matching pathway elements in the 2 pathways).

3. The compare button does the comparison of the two Pathways (compares the 2 'Pathway' objects) after loading and displaying the two pathways. The comparison includes comparing the DataNodes (<DataNode> tag) and interactions (<Line> tag) in the two pathways.
I have almost finished basic DataNode comparison which compares Datanodes in one Pathway to Datanodes in the other based on the DataNode types (Metabolite, Gene, Protein etc). Thus, Metabolites in one pathway are only compared against Metabolites in the other. This is the first line of comparison. This way we weed out unnecessary comparisons amongst Datanodes which differ in their types.
The second line of comparison involves comparing the <Xref> tag under <DataNode> using BridgeDB as the ID mapper. For testing purposes, I am currently comparing the GraphId attribute of the DataNodes in the 2 pathways. DataNodes of the same type and with matching GraphIds are highlighted in blue. I will change this to do BridgeDB Xref based comparison soon. I have already tested how ID mapping works in BridgeDB by checking the mapping of 2 Xref Ids corresponding to a particular gene using a BridgeDB tutorial.

4. I will be discussing "comparing the interactions in the two pathways" with my mentor tonight as there is clarification needed on the subject before I could start writing code.
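
The "first line of comparison" from item 3 (only compare Datanodes of the same type) could look roughly like the sketch below. The Node class is a simplified placeholder for a DataNode, and the id stands for whatever is compared in the second step (the GraphId for now, Xrefs later); none of these names come from the plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Placeholder model for illustration; not PathVisio's DataNode class.
class TypeFilteredComparison {

    static final class Node {
        final String type; // e.g. "Metabolite", "GeneProduct", "Protein"
        final String id;   // the value compared in the second step

        Node(String type, String id) {
            this.type = type;
            this.id = id;
        }
    }

    /** Returns the ids found in both pathways on Datanodes of the same type. */
    static List<String> matchingIds(List<Node> pathway1, List<Node> pathway2) {
        // First line of comparison: bucket pathway2 by type so that a Metabolite
        // is only ever compared against other Metabolites, and so on.
        Map<String, List<Node>> byType2 = new LinkedHashMap<>();
        for (Node n2 : pathway2) {
            byType2.computeIfAbsent(n2.type, k -> new ArrayList<>()).add(n2);
        }
        // Second line of comparison: compare ids within the matching bucket only.
        List<String> matches = new ArrayList<>();
        for (Node n1 : pathway1) {
            List<Node> sameType = byType2.getOrDefault(n1.type, Collections.<Node>emptyList());
            for (Node n2 : sameType) {
                if (n1.id.equals(n2.id)) {
                    matches.add(n1.id);
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Node> p1 = Arrays.asList(new Node("GeneProduct", "g1"), new Node("Metabolite", "m1"));
        List<Node> p2 = Arrays.asList(new Node("Metabolite", "g1"), new Node("Metabolite", "m1"));
        System.out.println(matchingIds(p1, p2)); // prints [m1]; g1 differs in type
    }
}
```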


Below is a screenshot of "Pathway Comparison Plugin" comparing 2 pathways:



As seen, pathway #1 is not loaded in the internal frame due to the VPathway object issue mentioned above.

The only difference  between the 2 pathways is the DataNode of type "GeneProduct" (the unhighlighted DataNode in the screenshot) which has different graphIds in the 2 pathways. All others have matching graphId and types in both the pathways.

The DataNodes highlighted using blue are similar in both Pathways. The one not highlighted has different graphIds in the 2 pathways.

Monday, 14 May 2012

GSOC 2012 - Pathway Comparison - Project Idea



My ideas for the project
The project's goal is to create a plugin for PathVisio which would be responsible for comparing pathways based on Data nodes and their interactions.
Proposed working model:
The plugin will have options (file selection fields) that allow users to load two pathway files into PathVisio and a button to compare them. Clicking on "Compare" will pop up a Difference Viewer window (this could probably be the workaround until it's possible to load and display two pathways in PathVisio's window) showing the 2 pathways sitting adjacent to each other. The 2 pathways could be drawn on two separate panels/windows inside the main Difference Viewer window (could use JSplitPane for the main window). There will also be another partition in the main window which would list the differences between the 2 pathways. I borrowed this idea of "including the Difference List* in the difference viewer" from Rianne Fitjen's (fellow GSOC applicant for this project) proposal. Earlier I thought to show the Difference List in the plugin's view itself, but including it in the Difference Viewer's window seems more natural and intuitive from a user's standpoint. Clicking on an item in the Difference List would highlight the respective difference in both the pathways.
*Difference List*: Although I am calling it so, it's actually a list of data nodes and interactions that are present in both pathways.
Currently, I am working on a prototype of the project, which as of now draws two pathways in two separate internal-windows (adjacent to each other) contained inside a main window. I could come up with an improved version of this prototype before the GSOC program starts.

Timeline for the project: (April 24 to August 13 ~ 16 weeks)
Week 1,2:
1. Load the two pathways from inside the plugin (drawing pathways is not required for this step) and get reference to the 2 Java Objects: VPathway and Pathway, for each of the 2 pathways. I have already looked into the PathVisio code for this and I should be able to do this in a day or two. 
VPathway Object:  SwingEngine.getEngine().getActiveVPathway() returns this Object, which represents the view (the Graphics) of the loaded pathway. This object could be used to draw pathways on the aforementioned Difference Viewer pop-up and also to highlight certain nodes/lines in the Pathway.
Pathway Object: SwingEngine.getEngine().getActivePathway() returns "Pathway" Object, which represents the GPML parsed Data Model that is used in PathVisio to represent pathway information. This object would be used when we do comparison of the pathways. 
2. Work on comparing the two pathways using the reference to the 2 "Pathway" objects, one from each of the pathways (outcome from step 1). The comparison would identify the DataNodes and interactions (lines connected to Datanodes) that are commonly present in both pathways. Comparing on DataNodes shouldn't be difficult whereas comparing the interactions in the two pathways might take a little extra time i.e it could extend into week 2.
3. One thing to keep in mind (as suggested by mentor Martina) is establishing the identifier mapping between the 2 pathways' Datanodes, i.e. if the same gene is present in both pathways with different IDs (e.g. one has an Entrez Gene identifier while the other uses an Ensembl id), then they should be recognized as the same. So we have to use BridgeDb to map the identifiers. Right now, I am not entirely sure how this could be done.
Here is what I have in mind: Even before we start the comparison, we should first identify those Datanodes which use different IDs in the two pathways but actually mean the same thing. For this, we could run a BridgeDB mapping from the genes/metabolites in pathway#1 to the genes/metabolites in pathway#2, and then filter out the genes which are mapped to the same ID. These filtered datanodes will not undergo (i.e. simply bypass) the comparison process and instead will be added directly to the Difference List*.
A little bit of this could spill over to week three, as I would need to learn how to work with BridgeDB.
Week 3,4:
1. Once we have computed the Difference List, we could go ahead and focus on drawing the two loaded pathways onto the Difference Viewer, a window which shows the two pathways next to each other, along with another partition which shows the list of differences.
I have a partial prototype ready, as mentioned above in the proposal, so this part shouldn't be as difficult as I had thought earlier. Therefore, during this time, I could also work on some additional features that would make the Difference Viewer's UI look better and more accessible to the user.
But these things are only important as long as they could be integrated into the PathVisio's main view, which would eventually be able to display 2 pathways in comparison-mode. So any improvements on the Difference Viewer should be made keeping this in mind.
Hence another option (instead of the option to work on Difference Viewer UI improvements) is to work on PathVisio's core to make it possible to load and display two pathways in PathVisio in comparison mode. I will require a lot of help from the mentor and the developers within the PathVisio community.
Week 5, 6:
1. Work on displaying the Difference List (comparison data) in a viewable-clickable format, such that they are displayed in a row-by-row alignment in the partition inside the Difference viewer. Also keep this flexible enough so that it could be easily shifted into plugin's tab view (JPanel) later on. This should help when PathVisio's main view is ready to display two pathways inside it in comparison mode.
2. Receiving click events from the Difference Viewer's partition that contains the Difference List (data). This means we will be extracting information from the item that was clicked in the Difference List and then propagate this information to the pathways drawn in the Difference Viewer to highlight the respective datanode/interaction in both the pathways.
Week 7,8,9:
Discuss among the PathVisio community about how to proceed with the coding on PathVisio's core source code, so as to provide PathVisio with the capability to draw two pathways inside PathVisio's main view. Currently the software allows loading and viewing of only 1 pathway at a time.
After this is done, the Difference List could then be displayed in the plugin's tab view itself. And the external Difference Viewer window (workaround until this point) will then no longer be necessary, although it can be used as a reference.
I have taken 3 weeks for this, as these changes will affect PathVisio's core. Hence it may require a lot of discussion beforehand, and I have also taken into account the time to deal with side effects on the stable running code that may arise from introducing this feature at the core level.
Week 10 to 14: 
Work on adding other advanced features to the Comparator.
Compensation for exams, other emergencies if any (1.5 weeks). 
Week 15,16:
Two weeks of Testing Time: one in the middle of the program and the other in the end so as to work on bug fixes, code improvements/optimization and Documentation.