In this notebook, we test for enrichment of overlap between detected mutual exclusivities and function interactions according to STRING.
import sys
sys.path.append("../lib")
import numpy
import sqlite3
import networkx
import nbsupport.io
import nbsupport.stringdb
import nbsupport.util
DATA_DIR = "../data"
The code below downloads STRING data files and creates a database containing the functional interaction network.
import urllib
protein_links_url = "http://string.embl.de/newstring_download/protein.links.detailed.v10/9606.protein.links.detailed.v10.txt.gz"
protein_aliases_url = "http://string.embl.de/newstring_download/protein.aliases.v10/9606.protein.aliases.v10.txt.gz"
filename, respose = urllib.urlretrieve(protein_links_url, "../data/downloads/9606.protein.links.detailed.v10.txt.gz")
filename, respose = urllib.urlretrieve(protein_aliases_url, "../data/downloads/9606.protein.aliases.v10.txt.gz")
!mkdir {DATA_DIR}/string
nbsupport.stringdb.create_string_db(
"../data/string/string.sqlite",
"../data/downloads/9606.protein.links.detailed.v10.txt.gz",
"../data/downloads/9606.protein.aliases.v10.txt.gz")
dataFile = "../data/tcga/tcga-pancan12.h5"
result_mutex = nbsupport.io.load_pairwise_result(dataFile, "/results/mutex")
sigPairs = result_mutex.significant_pairs(0.01)
genes = numpy.char.partition(numpy.asarray(result_mutex.qvalues.index).astype(str), "_")[:, 0]
The mutexes
graph contains nodes for all genes used in the mutual exclusivity analysis, and edges connecting genes if their alterations are mutually exclusive.
mutexes = networkx.Graph([
(pair.gene1.split("_")[0], pair.gene2.split("_")[0])
for i, pair in sigPairs.iterrows()])
mutexes.add_nodes_from(numpy.unique(genes))
The string
graph contains the STRING functional interaction network based on interactions with a combined score greater than 800.
db = sqlite3.connect("../data/string/string.sqlite")
cursor = db.execute(
"""
SELECT n1.protein_name, n2.protein_name
FROM protein_names AS n1,
protein_names AS n2,
protein_links AS ppi
WHERE n1.protein_id = ppi.protein_id_a
AND n2.protein_id = protein_id_b
AND n1.source = "BioMart_HUGO"
AND n2.source = "BioMart_HUGO"
AND ppi.combined_score > 800
""")
string = networkx.Graph()
string.add_edges_from(cursor)
The stringDeg2
graph connects genes if they have a direct functional interaction, or if they share a common interactor.
stringDeg2 = string.copy()
for node in string.nodes_iter():
for neighbour in string.neighbors_iter(node):
for neighbour2 in string.neighbors_iter(neighbour):
stringDeg2.add_edge(node, neighbour2)
We can now determine the overlap of the mutually exclusive gene pairs with the graphs string
and stringDeg2
.
for gene1, gene2 in mutexes.edges_iter():
if string.has_edge(gene1, gene2):
print gene1, gene2
numOverlap = sum(string.has_edge(gene1, gene2) for gene1, gene2 in mutexes.edges_iter())
numOverlapDeg2 = sum(stringDeg2.has_edge(gene1, gene2) for gene1, gene2 in mutexes.edges_iter())
mutexes.number_of_edges(), numOverlap, numOverlapDeg2
def permutedGraph(g):
mapping = dict(zip(g, numpy.random.permutation(g)))
return networkx.relabel_nodes(g, mapping)
nbsupport.util.set_random_seed()
We estimate a null distribution for the observed overlap of mutual exclusivities and functional interactions. To do so, we permute the labels of the mutex
graph, and compute the overlap of this permuted graph.
nullDist = numpy.array([
sum(string.has_edge(gene1, gene2) for gene1, gene2 in permutedGraph(mutexes).edges_iter())
for i in xrange(10000)])
numOverlap, nullDist.mean()
The proportion of permutations with an overlap at least as high as the observed overlap gives us a $P$-value for the enrichment.
numpy.mean(nullDist >= numOverlap)
numpy.mean(numpy.append(nullDist, numOverlap) >= numOverlap)
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(nullDist, 20)
plt.axvline(numOverlap)
We estimate a null distribution in the same way as for the direct interactions, but this time using stringDeg2
to compute the overlap.
nullDist2 = numpy.array([
sum(stringDeg2.has_edge(gene1, gene2) for gene1, gene2 in permutedGraph(mutexes).edges_iter())
for i in xrange(10000)])
numOverlapDeg2, nullDist2.mean()
numpy.mean(numpy.append(nullDist2, numOverlapDeg2) >= numOverlapDeg2)
plt.hist(nullDist2, 20)
plt.axvline(numOverlapDeg2)