Wang Lab | Software

Software

Sigma-P Method for Rare-Variant Analysis

Sigma-P is a rare-variant method for detecting disease associations in case-control sequencing studies. The Sigma-P statistic aggregates the effects of multiple variant sites by computing a weighted sum of the per-site log p-values, where each site is weighted by the inverse of the expected standard deviation (denoted by σ) of its number of variants in controls. The method is robust to the noise introduced by large numbers of neutral variants and handles variants with opposite directions of effect.

Reference: Cheung YH, Wang G, Leal SM, Wang S (2012) "A Fast and Noise-Resilient Approach to Detect Rare-Variant Associations with Deep Sequencing Data for Complex Disorders" Genetic Epidemiology, 36:675-685

Download: R Scripts and Sample Data
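
The Python sketch below is for illustration only; the R scripts above are the reference implementation. Several choices are assumptions not taken from the description: per-site p-values come from a two-sided Fisher's exact test on carrier counts, σ is taken as the hypergeometric standard deviation of the control carrier count given the site's total carrier count, the sum uses -log p (the same log-p sum up to sign, so larger values mean stronger association), and significance is assessed by permuting case-control labels. The function names are hypothetical.

```python
import math
import random


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)
    probs = {x: math.comb(r1, x) * math.comb(r2, c1 - x) / denom
             for x in range(max(0, c1 - r2), min(r1, c1) + 1)}
    p_obs = probs[a]
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def sigma_p_statistic(genotypes, labels):
    """genotypes: one list of 0/1 carrier indicators per site; labels: 1 = case, 0 = control."""
    n, n_case = len(labels), sum(labels)
    n_ctrl = n - n_case
    stat = 0.0
    for site in genotypes:
        m = sum(site)  # total carriers at this site
        # Assumption: sigma = hypergeometric SD of the control carrier count given m.
        var = m * (n_case / n) * (n_ctrl / n) * (n - m) / (n - 1)
        if var <= 0:
            continue  # monomorphic site carries no information
        a = sum(g for g, y in zip(site, labels) if y == 1)  # case carriers
        c = m - a                                           # control carriers
        p = fisher_two_sided(a, n_case - a, c, n_ctrl - c)
        stat += -math.log(p) / math.sqrt(var)
    return stat


def sigma_p_test(genotypes, labels, n_perm=999, seed=0):
    """Observed statistic and a permutation p-value from shuffled case-control labels."""
    rng = random.Random(seed)
    observed = sigma_p_statistic(genotypes, labels)
    perm, exceed = list(labels), 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if sigma_p_statistic(genotypes, perm) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because σ shrinks for sites with few carriers, rare sites receive larger weights, while a neutral site whose carriers split evenly between cases and controls contributes a p-value near 1 and therefore almost nothing to the sum.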

Penalized Conditional/Unconditional Logistic Regression - pclogit R package

pclogit is an R package for penalized conditional/unconditional logistic regression with a network-based penalty, for matched/unmatched case-control data with grouped or graph-constrained variables. The algorithm efficiently fits the regularization path and provides selection probabilities for each predictor in the analysis of high-dimensional matched/unmatched case-control data. It uses cyclical coordinate descent in a pathwise fashion.

Reference: Sun H, Wang S (2012) "Penalized Logistic Regression for High-dimensional DNA Methylation Data with Case-Control Studies" Bioinformatics, 28:1368-1375

Reference: Sun H, Wang S (2013) "Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data" Statistics in Medicine, 32:2127–2139

Downloads: Manual, pclogit.tar.gz (for Linux/Unix only)
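
pclogit itself is an R package; the Python sketch below only illustrates pathwise cyclical coordinate descent for the unconditional (unmatched) case. Assumptions not taken from the description: the penalty is λ[α‖β‖₁ + (1-α)/2 βᵀLβ] with L a normalized graph Laplacian (a Li-and-Li-style network penalty; pclogit's exact penalty and its conditional-likelihood version may differ), each outer step uses an IRLS quadratic approximation of the log-likelihood, and selection probabilities (which need resampling) are omitted. All function names are hypothetical.

```python
import math


def normalized_laplacian(p, edges):
    """L = I - D^(-1/2) A D^(-1/2); rows of isolated predictors stay zero."""
    deg = [0] * p
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    L = [[0.0] * p for _ in range(p)]
    for u in range(p):
        L[u][u] = 1.0 if deg[u] else 0.0
    for u, v in edges:
        L[u][v] = L[v][u] = -1.0 / math.sqrt(deg[u] * deg[v])
    return L


def soft_threshold(z, g):
    return math.copysign(max(abs(z) - g, 0.0), z)


def fit_network_logistic(X, y, L, lam, alpha=0.5, b0=0.0, beta=None,
                         n_outer=50, n_inner=100, tol=1e-8):
    """Penalised fit: -loglik/n + lam*(alpha*|beta|_1 + (1-alpha)/2 * beta'L beta)."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    cols = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_outer):
        # IRLS: weighted least-squares approximation of the log-likelihood.
        eta = [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]
        prob = [1.0 / (1.0 + math.exp(-max(min(e, 35.0), -35.0))) for e in eta]
        w = [max(q * (1.0 - q), 1e-5) for q in prob]
        r = [(yi - q) / wi for yi, q, wi in zip(y, prob, w)]  # working residuals
        old = [b0] + beta
        for _ in range(n_inner):
            d0 = sum(wi * ri for wi, ri in zip(w, r)) / sum(w)  # unpenalised intercept
            b0 += d0
            r = [ri - d0 for ri in r]
            max_step = abs(d0)
            for j in range(p):
                xj = cols[j]
                # Weighted correlation with the partial residual (feature j added back).
                grad = sum(wi * x * (ri + x * beta[j]) for wi, x, ri in zip(w, xj, r)) / n
                # Network term: pull from neighbours via off-diagonal Laplacian entries.
                net = sum(L[j][k] * beta[k] for k in range(p) if k != j)
                denom = sum(wi * x * x for wi, x in zip(w, xj)) / n + lam * (1 - alpha) * L[j][j]
                new = soft_threshold(grad - lam * (1 - alpha) * net, lam * alpha) / denom
                step = new - beta[j]
                if step != 0.0:
                    r = [ri - x * step for ri, x in zip(r, xj)]
                    beta[j] = new
                    max_step = max(max_step, abs(step))
            if max_step < tol:
                break
        if max(abs(u - v) for u, v in zip([b0] + beta, old)) < tol:
            break
    return b0, beta


def network_logistic_path(X, y, L, lambdas, alpha=0.5):
    """Fit from the largest to the smallest lambda, warm-starting each fit."""
    b0, beta, path = 0.0, None, []
    for lam in sorted(lambdas, reverse=True):
        b0, beta = fit_network_logistic(X, y, L, lam, alpha, b0, beta)
        path.append((lam, b0, list(beta)))
    return path
```

Warm starts are what make the pathwise approach cheap: each fit on the decreasing λ grid starts from a solution that is already close, so only a few coordinate sweeps are needed per λ.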

Rare variants selection - rvsel R package

rvsel is an R package for selecting rare variants from sequencing data. Within a gene or genetic region, it selects the rare variants most related to the outcome, using a selection procedure based on the power set of the rare variants (all of their possible subsets).

Reference: Sun H, Wang S (2014) "A Power Set Based Statistical Selection Procedure to Locate Susceptible Rare Variants Associated with Complex Traits with Sequencing Data" Bioinformatics, 30:2317-2323

Downloads: Manual, rvsel_0.1.tar.gz (for Linux/Unix only)
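
To illustrate what a power-set search over rare variants involves, the Python sketch below scores every non-empty subset of the variants in a region by collapsing them into a single carrier indicator and computing a Pearson χ² statistic, then keeps the highest-scoring subset. The collapsing rule, the χ² score and the function names are assumptions for illustration, not rvsel's actual statistic. Because the search covers 2^k - 1 subsets, it is only feasible for a modest number of variants k, and the maximum statistic would need a permutation (or similar) adjustment before it could be read as a p-value.

```python
from itertools import combinations


def collapse(genotypes, subset):
    """Carrier indicator: 1 if a subject carries any variant in `subset`."""
    n = len(genotypes[0])
    return [1 if any(genotypes[v][i] for v in subset) else 0 for i in range(n)]


def chi_square_2x2(carrier, labels):
    """Pearson chi-square statistic for carrier status versus case status."""
    a = sum(1 for c, y in zip(carrier, labels) if c and y)          # case carriers
    b = sum(1 for c, y in zip(carrier, labels) if not c and y)      # case non-carriers
    c_ = sum(1 for c, y in zip(carrier, labels) if c and not y)     # control carriers
    d = sum(1 for c, y in zip(carrier, labels) if not c and not y)  # control non-carriers
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    return 0.0 if denom == 0 else (a + b + c_ + d) * (a * d - b * c_) ** 2 / denom


def best_subset(genotypes, labels, max_size=None):
    """Exhaustively score all non-empty variant subsets (the power set minus the empty set)."""
    k = len(genotypes)
    best_stat, best = float("-inf"), ()
    for size in range(1, (max_size or k) + 1):
        for subset in combinations(range(k), size):
            stat = chi_square_2x2(collapse(genotypes, subset), labels)
            if stat > best_stat:
                best_stat, best = stat, subset
    return best, best_stat
```

Collapsing all variants would dilute the signal with neutral ones, while testing them one at a time would miss their combined effect; scoring subsets sits between those two extremes.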

A Network-assisted algorithm for Epigenetic studies - NEpiC R package

We present a network-assisted algorithm, NEpiC, that combines both mean and variance signals in searching for differentially methylated sub-networks using the protein-protein interaction (PPI) network.

Reference: Ruan PF, Shen J, Santella RM, Zhou SG, Wang S (2016) "NEpiC: a Network-assisted algorithm for Epigenetic studies using mean and variance Combined signals" Nucleic Acids Research, 44(16):e134

Download: R Scripts and Sample Data
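
NEpiC's own search is implemented in the R scripts above. As a rough Python illustration of the two ingredients the description names, the sketch below (a) merges a gene's mean-signal and variance-signal p-values with Fisher's method and converts the result to a z-score, and (b) grows a sub-network greedily on a PPI graph from a seed gene, adding the neighbour that most increases the module score Σz/√k. Fisher's method, the Σz/√k score, the stopping rule and all function names are assumptions borrowed from generic dense-module searches, not necessarily NEpiC's choices.

```python
import math
from statistics import NormalDist


def combined_p(p_mean, p_var):
    """Fisher's method for two p-values: X = -2 ln(p1 p2) ~ chi-square with 4 df."""
    x = -2.0 * (math.log(p_mean) + math.log(p_var))
    return math.exp(-x / 2) * (1 + x / 2)  # exact chi-square(4) upper tail


def gene_z(p):
    """Convert a p-value to a z-score (large z = strong signal)."""
    p = min(max(p, 1e-300), 1 - 1e-12)
    return -NormalDist().inv_cdf(p)


def greedy_module(graph, z, seed):
    """Greedily add the neighbour that most increases sum(z) / sqrt(module size)."""
    module, total = {seed}, z[seed]
    score = total
    while True:
        # Sorted so that the search is deterministic when scores tie.
        frontier = sorted({nb for g in module for nb in graph.get(g, ()) if nb not in module})
        best_gene, best_score = None, score
        for cand in frontier:
            s = (total + z[cand]) / math.sqrt(len(module) + 1)
            if s > best_score:
                best_gene, best_score = cand, s
        if best_gene is None:  # no neighbour improves the module score
            return module, score
        module.add(best_gene)
        total += z[best_gene]
        score = best_score
```

Dividing by √k keeps the module from growing indefinitely: a neighbour is only added if its z-score is high enough to raise the average-weighted signal of the whole module.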

A penalized Exponential Tilt Model for Epigenetic studies - pETM R package

We present a penalized Exponential Tilt Model (pETM) using network-based regularization that captures both mean and variance signals in DNA methylation data and takes into account the correlations among nearby CpG sites.

Reference: Sun H, Wang Y, Chen Y, Li Y, Wang S (2017) "pETM: a penalized Exponential Tilt Model for analysis of correlated high-dimensional DNA methylation data" Bioinformatics, 33(12): 1765-1772

Download: R package and manual
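
An exponential tilt model relates the case density to the control density by g1(x) = g0(x)·exp(a + b·x + c·x²), assuming here a tilt function with a linear and a quadratic term. The short Python check below makes concrete why such a model captures both kinds of signal, using two normal densities: with equal variances the tilt is linear (c = 0) and b is proportional to the mean difference, while unequal variances make c non-zero; in both cases a + b·x + c·x² reproduces the log density ratio exactly. It illustrates the model only; pETM's network-based penalty and estimation procedure are not shown, and the function names are hypothetical.

```python
import math


def normal_logpdf(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def tilt_coefficients(mu0, sd0, mu1, sd1):
    """(a, b, c) such that log g1(x) - log g0(x) = a + b*x + c*x**2 for normal g0, g1."""
    c = 1 / (2 * sd0 ** 2) - 1 / (2 * sd1 ** 2)  # zero unless the variances differ
    b = mu1 / sd1 ** 2 - mu0 / sd0 ** 2           # (mu1 - mu0) / sd**2 when variances are equal
    a = math.log(sd0 / sd1) - mu1 ** 2 / (2 * sd1 ** 2) + mu0 ** 2 / (2 * sd0 ** 2)
    return a, b, c
```

A model with only the linear term would see no difference at all between two groups that share a mean but differ in spread; the quadratic coefficient c is what picks up that variance signal.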

The MiAge Calculator

Estimating the number of cell divisions (mitotic age) of a given tissue type across individuals is of great interest because it allows individuals to be stratified by prospective cancer risk. Here we introduce the MiAge Calculator, a DNA methylation-based mitotic clock calculator built on MiAge, a novel statistical method designed to quantitatively estimate the mitotic age of an individual's tissue. This R code implements the MiAge Calculator.

Reference: Youn A, Wang S (2018) "The MiAge Calculator: a DNA methylation-based mitotic age calculator of Human tissue types" Epigenetics, 13(2):192-206

Download: R Scripts and Sample Data
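
The MiAge model itself is defined in the reference above. The Python sketch below only illustrates the general idea of a methylation-based mitotic clock under a deliberately simple, hypothetical model: each CpG j starts at methylation level m0_j and, at every cell division, gains methylation with probability g_j or loses it with probability l_j, so after n divisions its expected level is e_j + (m0_j - e_j)(1 - g_j - l_j)^n with e_j = g_j / (g_j + l_j). Given the site parameters, a sample's mitotic age is then estimated as the n that best fits its observed β-values by least squares over a grid. The model form, the parameter names and the grid search are all assumptions, not MiAge's published parameterization or estimator.

```python
def expected_methylation(n, m0, gain, loss):
    """Expected beta-value after n divisions under a two-state per-division model."""
    eq = gain / (gain + loss)  # equilibrium methylation level
    return eq + (m0 - eq) * (1 - gain - loss) ** n


def estimate_mitotic_age(betas, sites, n_max=5000):
    """Least-squares grid search for the number of divisions n in 0..n_max.

    sites: list of (m0, gain, loss) tuples, one per CpG (hypothetical parameters).
    """
    best_n, best_err = 0, float("inf")
    for n in range(n_max + 1):
        err = sum((b - expected_methylation(n, *s)) ** 2 for b, s in zip(betas, sites))
        if err < best_err:
            best_n, best_err = n, err
    return best_n
```

Each CpG drifts monotonically from its starting level toward its equilibrium as divisions accumulate, so combining many sites lets their joint drift pin down n; with real data the site parameters would themselves have to be estimated first.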

The Pan-Cancer Analysis

The MutIng algorithm integrates somatic mutation and DNA methylation data from multiple cancers and identifies methylation driver genes (MDGs), i.e., genes whose mutations are strongly associated with specific methylation changes across cancer types.

Reference: Youn A, Kim KI, Rabadan R, Tycko B, Shen YF, Wang S (2018) "A pan-cancer analysis of driver gene mutations, DNA methylation and gene expressions reveals that chromatin remodeling is a major mechanism inducing global changes in cancer epigenomes" BMC Medical Genomics, 11:98

Download: R Scripts and Sample Data

Differentially Methylated Regions (DMRs) detection

Most existing methods for identifying differentially methylated loci (DML) use mean signals only, and only a few use both mean and variance signals, while all existing methods for detecting differentially methylated regions (DMRs) focus on mean signals only. This R code implements the new DMR detection algorithm we proposed, which uses combined mean and variance signals.

Reference: Wang Y, Teschendorff AE, Widschwendter M, Wang S (2019) "Accounting for Differential Variability in detecting Differentially Methylated Regions" Briefings in Bioinformatics, 20(1): 47-57

Download: R Scripts and Sample Data
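
The R code above implements the published algorithm. As a simplified Python illustration of combining mean and variance signals before calling regions, the sketch below tests each CpG for a mean difference (a large-sample Welch-type z-test) and a variance difference (a z-test on the log variance ratio), merges the two p-values with Fisher's method, and then joins consecutive significant CpGs into candidate regions. The specific tests, the Fisher combination, the thresholds (alpha, max_gap, min_cpgs) and the function names are assumptions rather than the method from the reference, and no multiple-testing correction is applied.

```python
import math
from statistics import mean, variance


def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2))


def cpg_pvalues(cases, controls):
    """Mean-test, variance-test and Fisher-combined p-values for one CpG."""
    n1, n0 = len(cases), len(controls)
    m1, m0 = mean(cases), mean(controls)
    v1, v0 = variance(cases), variance(controls)
    p_mean = two_sided_p((m1 - m0) / math.sqrt(v1 / n1 + v0 / n0))
    p_var = two_sided_p(math.log(v1 / v0) / math.sqrt(2 / (n1 - 1) + 2 / (n0 - 1)))
    x = -2.0 * (math.log(max(p_mean, 1e-300)) + math.log(max(p_var, 1e-300)))
    return p_mean, p_var, math.exp(-x / 2) * (1 + x / 2)  # chi-square(4) upper tail


def call_regions(positions, pvalues, alpha=0.01, max_gap=500, min_cpgs=3):
    """Group runs of significant CpGs (positions sorted) into (start, end) regions."""
    regions, run = [], []
    for pos, p in zip(positions, pvalues):
        if p < alpha and run and pos - run[-1] <= max_gap:
            run.append(pos)
            continue
        # The current run has ended: keep it if it is long enough.
        if len(run) >= min_cpgs:
            regions.append((run[0], run[-1]))
        run = [pos] if p < alpha else []
    if len(run) >= min_cpgs:
        regions.append((run[0], run[-1]))
    return regions
```

A CpG whose cases and controls share a mean but differ sharply in spread is invisible to the mean test alone; the combined p-value lets such sites seed or extend a region.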

The Epigenetic-Distance Method

We developed a weighted epigenetic distance-based method that characterizes the (dis)similarity in methylation measures at multiple CpGs in a gene or genetic region between pairs of samples, with weights that up-weight signal CpGs and down-weight noise CpGs. With distance-based approaches, weak signals that might be filtered out in a CpG site-level analysis can accumulate, boosting overall study power. In constructing epigenetic distances, we considered both differential methylation (DM) and differential variation (DV) signals.

Reference: Wang Y, Qian M, Ruan PF, Teschendorff AE, Wang S (2019) "Detection of differentially methylated genes using weighted epigenetic distance-based methods" Nucleic Acids Research, 47(1):e6

Download: R Scripts and Sample Data
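
A minimal Python sketch of a weighted distance-based test, assuming a weighted Euclidean distance d(i, k) = sqrt(Σ_j w_j (x_ij - x_kj)²) over the CpGs of one gene and a PERMANOVA-style pseudo-F statistic with a label-permutation p-value. How the CpG weights are derived from DM and DV signals is left as an input here; if they are estimated from the same outcome, they should be recomputed inside every permutation. The distance, the test statistic and the function names are illustrative assumptions rather than the published method.

```python
import math
import random


def weighted_distance(x, y, w):
    return math.sqrt(sum(wj * (a - b) ** 2 for a, b, wj in zip(x, y, w)))


def pseudo_f(d2, labels):
    """PERMANOVA pseudo-F from a matrix of squared distances."""
    n, groups = len(labels), sorted(set(labels))
    ss_total = sum(d2[i][k] for i in range(n) for k in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(d2[i][k] for a, i in enumerate(idx) for k in idx[a + 1:]) / len(idx)
    n_groups = len(groups)
    return ((ss_total - ss_within) / (n_groups - 1)) / (ss_within / (n - n_groups))


def distance_test(samples, labels, weights, n_perm=999, seed=0):
    """Pseudo-F for the observed labels and its permutation p-value."""
    d2 = [[weighted_distance(s, t, weights) ** 2 for t in samples] for s in samples]
    observed = pseudo_f(d2, labels)
    rng, perm, exceed = random.Random(seed), list(labels), 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pseudo_f(d2, perm) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the whole gene enters through a single distance matrix, small differences at several CpGs add up in the between-group sum of squares instead of each having to pass a site-level threshold.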

ab-SNF: the association-signal-annotation boosted SNF (Similarity Network Fusion) Method

The association-signal-annotation boosted similarity network fusion (ab-SNF) method adds feature-level association signal annotations as weights when constructing pairwise similarity measures between subjects. The weights up-weight signal features and down-weight noise features, improving performance in disease subtyping.

Reference: Ruan PF, Wang Y, Shen RL, Wang S (2019) "Using Association Signal Annotations to Boost Similarity Network Fusion" Bioinformatics, in press

Download: R Scripts and Sample Data
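
ab-SNF's contribution is the feature weighting applied before network fusion. The Python sketch below shows that step for one data type under stated assumptions: feature weights proportional to -log10 of each feature's association p-value (normalized to sum to 1), a weighted Euclidean distance ρ, and the scaled exponential kernel of the original SNF method, W(i, j) = exp(-ρ²(x_i, x_j) / (μ·ε_ij)), where ε_ij averages ρ(x_i, x_j) with the mean distances of i and j to their K nearest neighbours. The weighting rule and the function names are assumptions, and the subsequent cross-network fusion iterations of SNF are not shown.

```python
import math


def association_weights(pvalues):
    """Hypothetical weighting: -log10(p) per feature, normalised to sum to 1."""
    scores = [-math.log10(p) for p in pvalues]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_sq_dist(x, y, w):
    return sum(wj * (a - b) ** 2 for a, b, wj in zip(x, y, w))


def affinity_matrix(data, weights, k=2, mu=0.5):
    """Scaled exponential similarity kernel on weighted distances (one data type)."""
    n = len(data)
    rho = [[math.sqrt(weighted_sq_dist(data[i], data[j], weights)) for j in range(n)]
           for i in range(n)]
    # Mean distance from each subject to its k nearest neighbours (self excluded).
    knn_mean = [sum(sorted(rho[i][j] for j in range(n) if j != i)[:k]) / k for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            eps = (knn_mean[i] + knn_mean[j] + rho[i][j]) / 3
            W[i][j] = math.exp(-rho[i][j] ** 2 / (mu * eps))
    return W
```

In the toy check, subject 0 is closer to subject 2 on the raw features, but once the noise feature is down-weighted it becomes most similar to subject 1, which shares its value on the associated feature.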

Dw-main-int: method for Interactions between DNA methylation and environmental factors

We developed a weighted epigenetic distance-based method that uses a pseudo-data matrix of cross-product terms between DNA methylation and environmental factors to capture their interactions on health outcomes. Distances between pairs of subjects are then calculated by combining the original data matrix (DNA methylation and environmental factor measurements) with the pseudo-data matrix of interaction terms. Using this approach, we can identify both main and interaction effects.

Reference: Wang Y, Qian M, Tang DL, Herbstman J, Perera F, Wang S (2019) "A powerful and flexible weighted distance-based method incorporating interactions between DNA methylation and environmental factors on health outcomes" Bioinformatics, 36(3):653-659

Download: R Scripts and Sample Data
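
The pseudo-data idea can be sketched in a few lines of Python: for every subject, append the methylation values, the environmental measurements and all CpG × exposure cross-products, then compute weighted distances on the augmented rows. The function names, the plain (uncentred) cross-products and the weighted Euclidean distance are assumptions for illustration; variables may need standardizing before the products are formed, and the weights would come from the method's own weighting scheme.

```python
import math


def augmented_rows(methylation, environment):
    """Per subject: [CpGs..., exposures..., CpG_j * exposure_e for all j, e]."""
    rows = []
    for m, e in zip(methylation, environment):
        cross = [mj * ek for mj in m for ek in e]  # pseudo-data interaction terms
        rows.append(list(m) + list(e) + cross)
    return rows


def distance_matrix(rows, weights):
    """Weighted Euclidean distances between all pairs of augmented rows."""
    return [[math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(r, s, weights)))
             for s in rows] for r in rows]
```

In the toy check, two subjects share the same methylation profile and differ only in exposure; even with the exposure's own column given zero weight, they are separated through the cross-product columns, which is how interaction signals enter the distance.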

DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes

We developed a framework, DiSNEP, that enhances a general human gene network into a network for a specific disease that better reflects true gene interactions for the disease, which subsequently improves network-assisted candidate gene prioritization.

Reference: Ruan PF, Wang S (2020) "DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes" Briefings in Bioinformatics, accepted

Download: https://github.com/pfruan/DiSNEP

Wang Lab of Computational Methods @ Biostatistics Department
