DISCOVER is a novel statistical test for detecting co-occurrence and mutual exclusivity in cancer genomics data. Unlike traditional approaches used for these tasks such as Fisher’s exact test, DISCOVER is based on a null model that takes into account the overall tumour-specific alteration rates when deciding whether alterations co-occur more or less often than expected by chance. This improved null model prevents spurious associations in co-occurrence detection, and increases the statistical power to detect mutual exclusivities.
This vignette introduces the DISCOVER R package for mutual exclusivity and co-occurrence analysis. Two variants of the DISCOVER test are implemented by this package. The first variant tests pairs of genes for either co-occurrence or mutual exclusivity. The second is a mutual exclusivity test for gene sets larger than two. We will illustrate the use of both tests by performing a mutual exclusivity and co-occurrence analysis of somatic mutations in the TCGA breast cancer samples.
First, we load the package and the breast cancer mutation data.
A small fragment of the mutation matrix is shown below. It takes the form of a binary matrix, i.e. only 0 and 1 are allowed as values.
## TCGA-A1-A0SB-01A-11D-A142-09 TCGA-A1-A0SD-01A-11D-A10Y-09
## A1CF 0 0
## A2M 0 0
## A2ML1 0 0
## A4GALT 0 0
## A4GNT 0 0
## TCGA-A1-A0SE-01A-11D-A099-09 TCGA-A1-A0SF-01A-11D-A142-09
## A1CF 0 0
## A2M 0 0
## A2ML1 0 0
## A4GALT 0 0
## A4GNT 0 0
## TCGA-A1-A0SG-01A-11D-A142-09
## A1CF 0
## A2M 0
## A2ML1 0
## A4GALT 0
## A4GNT 0
An important ingredient of the DISCOVER test is the estimation of a background matrix. This is how tumour-specific alteration rates can be used by the test. To estimate this background matrix, a discover.matrix
object is created and passed the mutation matrix. Depending on the size of the matrix this might take some time. This estimation step only needs to be performed once.
Note that for estimating this background matrix, it is important that the full mutation matrix is provided. Even if only a subset of genes will subsequently be used in the analysis, a whole-genome view of the mutations is required for this first step. Indeed, for this example we will only look for mutual exclusivity between genes that are mutated in at least 25 tumours.
We can now test for mutual exclusivity between all pairs of genes using the pairwise.discover.test
function. Its only required argument is an instance of the discover.matrix
class like we created above. In order to only analyse frequently mutated genes, we select the corresponding rows before passing the matrix to the function. Again, depending on the size of the matrix, this function might take some time.
We can get a quick overview of the results of this analysis simply by printing the result object.
## Pairwise DISCOVER mutual exclusivity test
## alternative hypothesis: observed overlap is less than expected by chance
## FDR estimation method: discrete Benjamini-Hochberg
##
## number of pairs tested: 3403
## proportion of true null hypotheses: 0.9289582
## number of significant pairs at a maximum FDR of 0.01 : 24
The print
method for the pairwise.discover.out
class optionally takes an argument with which the false discovery rate (FDR) threshold is selected. By default, a maximum FDR of 1% is used. Below, we get a summary of the test results using a somewhat more liberal threshold of 5%.
## Pairwise DISCOVER mutual exclusivity test
## alternative hypothesis: observed overlap is less than expected by chance
## FDR estimation method: discrete Benjamini-Hochberg
##
## number of pairs tested: 3403
## proportion of true null hypotheses: 0.9289582
## number of significant pairs at a maximum FDR of 0.05 : 80
To get the pairs of genes which are significantly mutually exclusive, we can use the as.data.frame
method. This method, too, takes an optional FDR threshold as argument. Below, we use the default of 1%.
## gene1 gene2 p.value q.value
## 5 TP53 CDH1 6.352400e-17 5.111111e-16
## 14 TP53 GATA3 5.470907e-15 3.138212e-14
## 21 TP53 MAP3K1 1.206967e-10 8.509531e-10
## 24 TP53 PIK3CA 1.224563e-10 8.509531e-10
## 12 PIK3CA GATA3 1.149762e-06 2.738010e-05
## 3 GATA3 CDH1 1.013659e-05 2.693347e-04
## 19 NCOR1 MAP3K1 1.527660e-05 3.835712e-04
## 22 TTN MAP3K1 4.145602e-05 1.111640e-03
## 17 TP53 MAP2K4 4.315501e-05 1.111640e-03
## 11 MUC4 GATA3 6.633657e-05 1.711987e-03
## 13 SYNE1 GATA3 9.833334e-05 2.462258e-03
## 9 MAP3K1 GATA3 1.278237e-04 3.089072e-03
## 7 MAP3K1 DMD 1.392087e-04 3.208025e-03
## 1 TP53 ARID1A 1.707227e-04 3.720349e-03
## 15 TTN GATA3 1.943877e-04 4.005469e-03
## 4 MAP3K1 CDH1 1.953800e-04 4.005469e-03
## 23 MUC5B MUC4 2.043568e-04 4.048251e-03
## 2 FAT3 CDH1 2.367769e-04 4.229152e-03
## 16 XIRP2 GATA3 2.429527e-04 4.229152e-03
## 20 NEB MAP3K1 2.431841e-04 4.229152e-03
## 8 PIK3CA FAT3 3.152131e-04 5.572576e-03
## 10 MUC16 GATA3 4.622582e-04 8.737315e-03
## 18 MUC16 MAP3K1 4.783099e-04 8.737315e-03
## 6 MAP3K1 CSMD1 5.207149e-04 9.568223e-03
In the above results, we can identify a set of four genes—TP53, CDH1, GATA3, and MAP3K1—for which all pairwise combinations are found mutually exclusive. We can use the groupwise.discover.test
to asses whether this set of genes is also mutually exclusive as a group. This we can do by simply passing the subset of the discover.matrix
that corresponds to the given genes.
## Groupwise DISCOVER mutual exclusivity test
##
## method: impurity
## P = 7.821035e-22
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] discover_0.9.4
##
## loaded via a namespace (and not attached):
## [1] compiler_4.1.0 magrittr_2.0.1 htmltools_0.5.1.1 tools_4.1.0
## [5] yaml_2.2.1 stringi_1.7.3 rmarkdown_2.9 highr_0.9
## [9] knitr_1.33 stringr_1.4.0 digest_0.6.27 xfun_0.24
## [13] rlang_0.4.11 evaluate_0.14