Monday, 28 May 2012

Week 1 (May 21-27)

Hello Everyone,

This is the first blog entry on my GSoC project "Pathway Comparison Plugin" and also my very first blogging experience. I will write an entry every week reporting on the project status.

The current status: 


1. During the project proposal phase, I worked on a window (JFrame) that displays two pathways outside PathVisio's main window. This initial prototype simply loaded two pathways from two hard-coded locations into the JFrame's internal frames, but there is an issue with loading and displaying two different pathways in the window simultaneously. The Pathway objects (corresponding to the two pathways) load fine, but the VPathway objects required to draw the pathways inside the two internal frames have a problem. At the moment, it seems that PathVisio allows only one VPathway object (the most recently loaded one) to be used for drawing. I am working on it.

2. I have created a PathVisio plugin that lets the user choose two pathways through two "Load Pathway" buttons and compare them with a "Compare" button. Hitting the Compare button brings up the "Pathway-Comparison" window mentioned above. This window has two internal frames to display the two loaded pathways next to each other, but due to the VPathway object issue, it displays only the most recently loaded pathway, in just one of the internal frames. I also plan to include one more partition (another internal frame) that would display the comparison results (the matching pathway elements in the two pathways).

3. The Compare button loads and displays the two pathways and then compares them (i.e., compares the two 'Pathway' objects). The comparison covers the DataNodes (<DataNode> tag) and the interactions (<Line> tag) in the two pathways.
I have almost finished the basic DataNode comparison, which compares DataNodes in one pathway to DataNodes in the other based on their types (Metabolite, Gene, Protein, etc.). Thus, Metabolites in one pathway are compared only against Metabolites in the other. This is the first line of comparison, and it weeds out unnecessary comparisons between DataNodes of different types.
The second line of comparison involves comparing the <Xref> tag under <DataNode>, using BridgeDb as the ID mapper. For testing purposes, I am currently comparing the GraphId attribute of the DataNodes in the two pathways instead: DataNodes with the same type and matching GraphIds are highlighted in blue. I will change this to a BridgeDb Xref-based comparison soon. I have already tried out how ID mapping works in BridgeDb, using a BridgeDb tutorial to check the mapping between two Xref IDs corresponding to a particular gene.

4. I will be discussing "comparing the interactions in the two pathways" with my mentor tonight, as I need some clarification on the subject before I can start writing code.
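The two lines of DataNode comparison described in item 3 (bucket by type first, then match within a type) can be sketched roughly as follows. This is a minimal stand-alone sketch, not the plugin's actual code: `SimpleDataNode` and `Match` are hypothetical stand-ins for PathVisio's data model, and the GraphId check mirrors the current testing approach rather than the planned Xref comparison. It assumes GraphIds are unique within one pathway, so a map keyed by GraphId is enough inside each type bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataNodeComparator {
    /** Returns the pairs of DataNodes that have the same type and the same GraphId. */
    static List<Match> matchByTypeAndGraphId(List<SimpleDataNode> first, List<SimpleDataNode> second) {
        // First line of comparison: index pathway #2 by type, then by GraphId,
        // so a node is only ever compared against nodes of its own type.
        Map<String, Map<String, SimpleDataNode>> byType = new HashMap<>();
        for (SimpleDataNode node : second) {
            byType.computeIfAbsent(node.type(), t -> new HashMap<>()).put(node.graphId(), node);
        }
        List<Match> matches = new ArrayList<>();
        for (SimpleDataNode node : first) {
            Map<String, SimpleDataNode> sameType = byType.get(node.type());
            if (sameType == null) {
                continue; // pathway #2 has no DataNodes of this type
            }
            // Second line of comparison (testing version): GraphId equality.
            SimpleDataNode other = sameType.get(node.graphId());
            if (other != null) {
                matches.add(new Match(node, other));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<SimpleDataNode> pathway1 = List.of(
                new SimpleDataNode("Metabolite", "a1", "ATP"),
                new SimpleDataNode("GeneProduct", "g1", "TP53"));
        List<SimpleDataNode> pathway2 = List.of(
                new SimpleDataNode("Metabolite", "a1", "ATP"),
                new SimpleDataNode("GeneProduct", "g2", "TP53"));
        for (Match m : matchByTypeAndGraphId(pathway1, pathway2)) {
            System.out.println(m.first().label() + " matches " + m.second().label());
        }
    }
}

/** Hypothetical, simplified DataNode: type (Metabolite, GeneProduct, ...), GraphId and label. */
record SimpleDataNode(String type, String graphId, String label) {}

/** A matched pair: a node from pathway #1 and its counterpart in pathway #2. */
record Match(SimpleDataNode first, SimpleDataNode second) {}
```

The `main` method mirrors the screenshot scenario: the Metabolite matches, while the GeneProduct with different GraphIds in the two pathways does not. Swapping the GraphId lookup for an Xref lookup would give the planned BridgeDb-based second line of comparison.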


Below is a screenshot of the "Pathway Comparison Plugin" comparing two pathways:



As seen, pathway #1 is not loaded in its internal frame due to the VPathway object issue mentioned above.

The only difference between the two pathways is the DataNode of type "GeneProduct" (the unhighlighted DataNode in the screenshot), which has different GraphIds in the two pathways. All other DataNodes have matching GraphIds and types in both pathways, so they are highlighted in blue.

Monday, 14 May 2012

GSoC 2012 - Pathway Comparison - Project Idea



My ideas for the project
The project's goal is to create a plugin for PathVisio that compares pathways based on their DataNodes and interactions.
Proposed working model:
The plugin will have options (file selection fields) that allow users to load two pathway files into PathVisio, and a button to compare them. Clicking "Compare" will pop up a Difference Viewer window (probably a workaround until it is possible to load and display two pathways in PathVisio's window) showing the two pathways next to each other. The two pathways could be drawn on two separate panels inside the main Difference Viewer window (a JSplitPane could be used for the main window). The main window will also have another partition that lists the differences between the two pathways. I borrowed this idea of including the Difference List* in the Difference Viewer from the proposal of Rianne Fitjen (a fellow GSoC applicant for this project). Earlier, I had planned to show the Difference List in the plugin's own view, but including it in the Difference Viewer's window seems more natural and intuitive from a user's standpoint. Clicking an item in the Difference List would highlight the corresponding element in both pathways.
*Difference List*: Although I am calling it that, it is actually a list of the DataNodes and interactions that are present in both pathways.
Currently, I am working on a prototype of the project, which as of now draws two pathways in two separate internal windows (next to each other) contained inside a main window. I should be able to come up with an improved version of this prototype before the GSoC program starts.
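A rough sketch of the proposed Difference Viewer layout, using two nested JSplitPanes: the two pathway panels side by side on top, and the Difference List partition underneath. The pathway views are plain placeholder components here (an assumption; in the plugin they would be whatever component draws a VPathway), and `DifferenceViewerLayout` is a hypothetical name.

```java
import javax.swing.*;

class DifferenceViewerLayout {
    /**
     * Builds the viewer content: pathway #1 and pathway #2 side by side on top,
     * and the Difference List partition underneath.
     */
    static JSplitPane build(JComponent pathway1View, JComponent pathway2View, String[] commonItems) {
        // Horizontal split: pathway #1 on the left, pathway #2 on the right.
        JSplitPane pathways = new JSplitPane(JSplitPane.HORIZONTAL_SPLIT,
                new JScrollPane(pathway1View), new JScrollPane(pathway2View));
        pathways.setResizeWeight(0.5); // split extra space evenly between the two pathways

        // Vertical split: the two pathways on top, the Difference List below.
        JSplitPane viewer = new JSplitPane(JSplitPane.VERTICAL_SPLIT,
                pathways, new JScrollPane(new JList<>(commonItems)));
        viewer.setResizeWeight(0.8); // give most extra space to the pathway drawings
        return viewer;
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Difference Viewer");
            // Empty JPanels stand in for the components that would draw the two pathways.
            frame.setContentPane(build(new JPanel(), new JPanel(), new String[] {"ATP", "TP53"}));
            frame.setSize(900, 600);
            frame.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}
```

Building the layout in a method that returns a component, rather than inside the JFrame itself, is meant to keep the pieces reusable if they are later moved into PathVisio's main window.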

Timeline for the project: (April 24 to August 13 ~ 16 weeks)
Week 1,2:
1. Load the two pathways from inside the plugin (drawing the pathways is not required for this step) and get references to the two Java objects, VPathway and Pathway, for each of the two pathways. I have already looked into the PathVisio code for this, and I should be able to do it in a day or two.
VPathway object: SwingEngine.getEngine().getActiveVPathway() returns this object, which represents the view (the graphics) of the loaded pathway. This object could be used to draw pathways on the aforementioned Difference Viewer pop-up and also to highlight certain nodes/lines in the pathway.
Pathway object: SwingEngine.getEngine().getActivePathway() returns the "Pathway" object, which represents the parsed GPML data model that PathVisio uses to represent pathway information. This object would be used when comparing the pathways.
2. Work on comparing the two pathways using the references to the two "Pathway" objects, one for each pathway (the outcome of step 1). The comparison would identify the DataNodes and interactions (lines connected to DataNodes) that are present in both pathways. Comparing DataNodes shouldn't be difficult, whereas comparing the interactions might take a little extra time, i.e., it could extend into week 2.
3. One thing that has to be kept in mind (as suggested by my mentor, Martina) is establishing an identifier mapping between the two pathways' DataNodes, i.e., if the same gene is present in both pathways with a different ID (e.g., one uses an Entrez Gene identifier while the other uses an Ensembl ID), then the two should be recognized as the same. So we have to use BridgeDb to map the identifiers. Right now, I am not entirely sure how this could be done.
Here is what I have in mind: even before we start the comparison, we should first identify those DataNodes which use different IDs in the two pathways but actually refer to the same entity. For this, we could run a BridgeDb mapping from the genes/metabolites in pathway #1 to the genes/metabolites in pathway #2, and then filter out the DataNodes that map to the same ID. These filtered DataNodes will not undergo (i.e., will simply bypass) the comparison process; instead, they will be added directly to the Difference List*.
A little of this could spill over into week 3, as I would need to learn how to work with BridgeDb.
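The identifier-mapping pre-step (item 3) and the interaction comparison (item 2) could be combined roughly as below. This is a sketch under several assumptions: `SimpleXref`, `SimpleNode`, `Interaction` and the `XrefMapper` interface are hypothetical stand-ins (`XrefMapper` plays the role of a BridgeDb ID mapper but is not BridgeDb's API), and an interaction is treated as a directed line between two DataNodes identified by GraphId. Two DataNodes match when they have the same type and their Xrefs are equal or the first maps to the second; an interaction is common when both endpoints matched and the other pathway has a line between the corresponding nodes.

```java
import java.util.*;

class MappedComparison {
    /** Stand-in for a BridgeDb ID mapper (hypothetical interface, not BridgeDb's API). */
    interface XrefMapper {
        Set<SimpleXref> mapId(SimpleXref ref);
    }

    /** Two Xrefs denote the same entity if they are equal or the mapper links the first to the second. */
    static boolean sameEntity(SimpleXref a, SimpleXref b, XrefMapper mapper) {
        return a.equals(b) || mapper.mapId(a).contains(b);
    }

    /** Pre-step: pair up same-type DataNodes denoting the same entity (GraphId in #1 -> GraphId in #2). */
    static Map<String, String> mapNodes(List<SimpleNode> p1, List<SimpleNode> p2, XrefMapper mapper) {
        Map<String, String> matched = new HashMap<>();
        for (SimpleNode a : p1) {
            for (SimpleNode b : p2) {
                if (a.type().equals(b.type()) && sameEntity(a.xref(), b.xref(), mapper)) {
                    matched.put(a.graphId(), b.graphId());
                    break; // keep the first match found in pathway #2
                }
            }
        }
        return matched;
    }

    /** An interaction is common if both endpoints were matched and pathway #2 has the corresponding line. */
    static List<Interaction> commonInteractions(List<Interaction> lines1, List<Interaction> lines2,
                                                Map<String, String> nodeMap) {
        Set<Interaction> lines2Set = new HashSet<>(lines2);
        List<Interaction> common = new ArrayList<>();
        for (Interaction line : lines1) {
            String source = nodeMap.get(line.sourceGraphId());
            String target = nodeMap.get(line.targetGraphId());
            if (source != null && target != null && lines2Set.contains(new Interaction(source, target))) {
                common.add(line);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Hypothetical single mapping: Entrez Gene 7157 -> Ensembl ENSG00000141510 (both TP53).
        XrefMapper mapper = ref -> ref.equals(new SimpleXref("Entrez Gene", "7157"))
                ? Set.of(new SimpleXref("Ensembl", "ENSG00000141510"))
                : Set.<SimpleXref>of();
        List<SimpleNode> p1 = List.of(
                new SimpleNode("n1", "GeneProduct", new SimpleXref("Entrez Gene", "7157")),
                new SimpleNode("n2", "Metabolite", new SimpleXref("ChEBI", "15422")));
        List<SimpleNode> p2 = List.of(
                new SimpleNode("m1", "GeneProduct", new SimpleXref("Ensembl", "ENSG00000141510")),
                new SimpleNode("m2", "Metabolite", new SimpleXref("ChEBI", "15422")));
        Map<String, String> nodeMap = mapNodes(p1, p2, mapper);
        System.out.println("Matched DataNodes: " + nodeMap);
        System.out.println("Common interactions: " + commonInteractions(
                List.of(new Interaction("n2", "n1")), List.of(new Interaction("m2", "m1")), nodeMap));
    }
}

/** Hypothetical Xref: database name plus identifier. */
record SimpleXref(String dataSource, String id) {}

/** Hypothetical DataNode: GraphId, type (GeneProduct, Metabolite, ...) and Xref. */
record SimpleNode(String graphId, String type, SimpleXref xref) {}

/** Hypothetical interaction: a directed GPML Line between two DataNodes, by GraphId. */
record Interaction(String sourceGraphId, String targetGraphId) {}
```

The nested loop in `mapNodes` compares every pair of nodes, which is simple but quadratic; indexing pathway #2 by type first would cut down the work. Whether lines should be treated as directed is one of the open questions about comparing interactions.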
Week 3,4:
1. Once we have computed the Difference List, we can focus on drawing the two loaded pathways onto the Difference Viewer, a window that shows the two pathways next to each other, along with another partition that shows the list of differences.
I already have a partial prototype, as mentioned above in the proposal, so this part shouldn't be as difficult as I had thought earlier. Therefore, during this time I could also work on some additional features that would make the Difference Viewer's UI look better and more accessible to the user.
But these things matter only as long as they can be integrated into PathVisio's main view, which would eventually be able to display two pathways in comparison mode. So any improvements to the Difference Viewer should be made with this in mind.
Hence, another option (instead of working on Difference Viewer UI improvements) is to work on PathVisio's core to make it possible to load and display two pathways in PathVisio in comparison mode. For this, I will require a lot of help from my mentor and the developers in the PathVisio community.
Week 5, 6:
1. Work on displaying the Difference List (comparison data) in a viewable, clickable format, so that the items are displayed row by row in the partition inside the Difference Viewer. Also keep this flexible enough that it can easily be moved into the plugin's tab view (JPanel) later on. This should help when PathVisio's main view is ready to display two pathways in comparison mode.
2. Receive click events from the Difference Viewer's partition that contains the Difference List. This means extracting information from the item that was clicked in the Difference List and then propagating it to the pathways drawn in the Difference Viewer, to highlight the respective DataNode/interaction in both pathways.
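The two Week 5-6 tasks could be sketched as a self-contained JPanel holding the Difference List, where a selection in the list is forwarded to both pathway views. `PathwayHighlighter`, `CommonElement` and `DifferenceListPanel` are hypothetical names; in the plugin, the highlighter would presumably wrap whatever highlights an element in a drawn VPathway, which is not shown here. Because the panel is an ordinary JPanel, it could sit in the Difference Viewer now and be moved into the plugin's tab view later.

```java
import java.awt.BorderLayout;
import java.util.List;
import javax.swing.JList;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.ListSelectionModel;

/** Hypothetical callback: highlights one element (by GraphId) in a drawn pathway. */
interface PathwayHighlighter {
    void highlight(String graphId);
}

/** One Difference List row: an element present in both pathways, with its GraphId in each. */
record CommonElement(String label, String graphIdInPathway1, String graphIdInPathway2) {
    @Override
    public String toString() {
        return label; // JList's default renderer displays toString()
    }
}

/** Difference List partition; a plain JPanel, so it can later be moved into the plugin's tab. */
class DifferenceListPanel extends JPanel {
    final JList<CommonElement> list;

    DifferenceListPanel(List<CommonElement> rows, PathwayHighlighter view1, PathwayHighlighter view2) {
        super(new BorderLayout());
        JList<CommonElement> rowList = new JList<>(rows.toArray(new CommonElement[0]));
        rowList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
        rowList.addListSelectionListener(event -> {
            if (event.getValueIsAdjusting()) {
                return; // wait until the selection has settled (e.g. mouse released)
            }
            CommonElement selected = rowList.getSelectedValue();
            if (selected != null) {
                // Propagate the click to both pathway views.
                view1.highlight(selected.graphIdInPathway1());
                view2.highlight(selected.graphIdInPathway2());
            }
        });
        add(new JScrollPane(rowList), BorderLayout.CENTER);
        this.list = rowList;
    }
}
```

Keeping the GraphId of the element in each pathway on the row itself means the click handler does not need to look anything up again; it only forwards the two IDs.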
Week 7,8,9:
Discuss with the PathVisio community how to proceed with changes to PathVisio's core source code, so as to give PathVisio the capability to draw two pathways inside its main view. Currently, the software allows loading and viewing only one pathway at a time.
After this is done, the Difference List could be displayed in the plugin's tab view itself, and the external Difference Viewer window (the workaround until this point) will no longer be necessary, although it can be used as a reference.
I have allotted three weeks for this, as these changes will affect PathVisio's core. Hence, they may require a lot of discussion beforehand, and I have also taken into account the time needed to deal with side effects on the stable, running code that may arise from introducing this feature at the core level.
Week 10 to 14: 
Work on adding other advanced features to the Comparator.
Buffer for exams and other emergencies, if any (1.5 weeks).
Week 15,16:
Two weeks of testing time: one in the middle of the program and the other at the end, to work on bug fixes, code improvements/optimization and documentation.