Bisulfighter: a pipeline for accurate detection of methylated cytosines and differentially methylated regions

Bisulfighter is a software package for detecting methylated cytosines (mCs) and differentially methylated regions (DMRs) from bisulfite sequencing data. Compared with other published tools, Bisulfighter provides greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact locations of DMRs, and better agreement of DMRs with differentially expressed genes. The superior accuracy is maintained under various sequencing depths and tissue types.

Background

Methylated cytosines (mCs) affect many biological processes, such as gene expression, silencing, and genomic imprinting. Bisulfite treatment of DNA combined with high-throughput sequencing, known as bisulfite-seq, is widely used to capture a snapshot of the epigenomic state of cells. To find mC regions, bisulfite-converted reads are mapped to a reference genome, and mC levels are estimated from the mapped reads. Although several tools are available for mC detection from bisulfite-seq data, we believe there is room to improve both sensitivity and specificity. Our major interest is identifying differentially methylated regions (DMRs) from a pair of bisulfite-seq data sets, a task for which few tools are available. We therefore developed Bisulfighter, a pipeline for accurate detection of mCs and DMRs.

What is Bisulfighter?

Because mC calling depends heavily on how correctly the reads are aligned, this computational task still has room for improvement. DMR detection is also an important part of DNA methylation analysis, yet only a limited number of tools are currently available for it. Bisulfighter is a pipeline for accurate detection of mCs and DMRs that, in our comparisons with published tools, provided the best performance.

Bisulfighter consists of two parts: (a) the mC call part and (b) the DMR detection part. An overview is shown in Fig. 1. The mC call part performs genome mapping and mC detection. In the mapping phase, it first converts all cytosines in the bisulfite-converted reads to thymines and maps the converted reads to a reference genome. Bisulfighter uses LAST (http://last.cbrc.jp/) for read mapping. After the converted cytosines are restored in the mapped reads, mC ratios are computed from the quality scores of the mapped reads and from alignment probabilities, which measure the reliability of each aligned column.
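As a rough illustration of the two steps described above, the following Python sketch shows an in-silico C-to-T conversion of a read and one possible way to weight observations when estimating an mC ratio. The function names and the weighting scheme (probability that the base call is correct, multiplied by the alignment probability) are assumptions made for this example; they are not taken from the bsf-call source code and may differ from the formula Bisulfighter actually uses.

```python
def convert_c_to_t(read: str) -> str:
    """Return the read with every cytosine replaced by thymine (case kept)."""
    return read.translate(str.maketrans("Cc", "Tt"))


def weighted_mc_ratio(observations):
    """Estimate an mC ratio at one reference cytosine.

    observations: iterable of (base, phred_quality, alignment_probability)
    for the original (unconverted) read bases aligned to that position.
    ASSUMPTION: weight = P(base call correct) * alignment probability.
    This is an illustrative weighting, not Bisulfighter's published formula.
    """
    methylated = 0.0
    total = 0.0
    for base, phred, align_prob in observations:
        weight = (1.0 - 10.0 ** (-phred / 10.0)) * align_prob
        if base == "C":      # C survived bisulfite treatment: methylated
            methylated += weight
            total += weight
        elif base == "T":    # C was converted to T: unmethylated
            total += weight
    return methylated / total if total > 0 else None


read = "ACGTCCGA"
converted = convert_c_to_t(read)
print(converted)  # ATGTTTGA

# Three hypothetical observations at one reference cytosine.
ratio = weighted_mc_ratio([("C", 30, 1.0), ("T", 30, 1.0), ("T", 30, 0.5)])
print(ratio)
```

In this sketch the original read is kept alongside the converted one, so that the real C or T at each position can be looked up again once the converted read has been mapped.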

The DMR detection part takes mC call results from two samples as input and reports DMRs, the regions where significant differences in mC level are found. Bisulfighter uses a hidden Markov model to statistically annotate the paired mC call results.

Fig. 1. Bisulfighter pipeline
(a) The mC call part uses the alignment probability computed with LAST and the read quality of the target base to accumulate the mC rate.
(b) The DMR detection part uses a hidden Markov model framework to annotate a pair of input mC level samples.

Performance

 

Fig. 2. Performance comparison on methylated cytosine calling
(a) True positive rate vs false positive count characteristics for different read depths.
(b) True positive rate and false positive rate at read depth 1 million.
(c) Comparison of mC rate estimation error distributions. The lower bar graphs show that Bisulfighter achieves a narrower error distribution, meaning that it estimates mC rates more accurately than the other methods.

Fig. 3. Comparison on differentially methylated cytosine detection
(a) True positive rate vs. DMR length at three different true detection criteria.
(b) True positive rate vs. read depth characteristics.
(c) Number of differentially expressed genes associated with DMRs identified by different methods.

Contributors

Yutaka Saito, PhD, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
Junko Tsuji, PhD, Graduate School of Frontier Sciences, The University of Tokyo
Toutai Mituyama, PhD, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
Information and Mathematical Science and Bioinformatics Co., Ltd.
Mitsubishi Space Software, Co. Ltd.

Grants

CREST, Japan Science and Technology Agency (JST)
New Energy and Industrial Technology Development Organization (NEDO)

License

Creative Commons License
Bisulfighter by National Institute of Advanced Industrial Science and Technology (AIST) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Requirements

bsf-call (LAST) requires more than 30 GB of free memory for mapping. bsf-call requires Python 2.7 or later. ComMet requires the Boost C++ library to compile. Pre-compiled executables are available on the Download page.

Source Code

Please visit GitHub to get the Bisulfighter source code.

Installation

On Mac OS X, you can use Homebrew to install Bisulfighter and required packages. Binary executables for Yosemite and El Capitan are available.

$ brew tap mtoutai/bisulfighter
$ brew install mtoutai/bisulfighter/bisulfighter

External Programs

Package name    Description              URL
LAST            Required for bsf-call    http://last.cbrc.jp

Contact

mituyama-toutai <at> aist.go.jp

Publication

Saito, Y., Tsuji, J., Mituyama, T. Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions (2013), Nucleic Acids Research. doi:10.1093/nar/gkt1373

Simulation Data

mC Detection Simulation

Download simulation data (file size: 1.3 GB)

Poor quality score read sets
  • a-1M.fastq/fasta: 1 million
  • a-5M.fastq/fasta: 5 million
Good quality score read sets
  • b-1M.fastq/fasta: 1 million
  • b-5M.fastq/fasta: 5 million
Reference genome
chromosome X: chrX.fa
'seg' directory
Seg files contain alignment information of each read.
87 chrXa 60757 465087 0
Each column is:
  1. alignment length
  2. chromosome name
  3. chromosome start (0-based)
  4. readID (number)
  5. read start (0-based)
* If the start is negative, the read is mapped as the reverse complement.
For more detail: http://www.cbrc.jp/seg-suite/README.html
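The column layout above can be read with a few lines of Python. This is a minimal sketch, not part of the Bisulfighter distribution: the `SegRecord` type and `parse_seg_line` function are hypothetical names, and because the note above does not say which start column carries the sign, the sketch assumes that a negative value in either start column marks a reverse-complement mapping.

```python
from typing import NamedTuple


class SegRecord(NamedTuple):
    """One line of a seg file, following the five columns listed above."""
    length: int        # alignment length
    chrom: str         # chromosome name
    chrom_start: int   # chromosome start (0-based)
    read_id: str       # read ID (a number, kept as text here)
    read_start: int    # read start (0-based)

    @property
    def is_reverse(self) -> bool:
        # ASSUMPTION: a negative start in either column means the read
        # is mapped as the reverse complement.
        return self.chrom_start < 0 or self.read_start < 0


def parse_seg_line(line: str) -> SegRecord:
    """Parse one whitespace-separated seg line with exactly five fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    length, chrom, chrom_start, read_id, read_start = fields
    return SegRecord(int(length), chrom, int(chrom_start), read_id, int(read_start))


record = parse_seg_line("87 chrXa 60757 465087 0")
print(record)
print(record.is_reverse)  # False
```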
'fasta' directory
Simulated bisulfite-converted reads before the introduction of sequencing errors are stored here. The methylation ratio can be computed from the fasta files and the reference file (chrX) by comparing bases and counting the numbers of Cs and Ts.
Each fasta entry looks like this:
> 0 87 chrXb 106830639 -
Space-separated information is:
> readID read-length chrom chromStart chromStrand
'fastq' directory
Simulated bisulfite-converted reads after the introduction of sequencing errors are stored here. These reads were used for mapping.
As in the fasta entries, the space-separated information is:
@ readID read-length chrom chromStart chromStrand
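The header layout shared by the fasta and fastq entries, and the C/T counting mentioned for the fasta files, can be sketched in Python as follows. The function names are hypothetical, and the counting function makes a simplifying assumption: the reference segment has already been extracted from chrX.fa and oriented so that it lines up base by base with the read. Strand handling and coordinate conventions are left out because they are not specified here.

```python
def parse_read_header(header: str) -> dict:
    """Split a '> ...' (fasta) or '@ ...' (fastq) header into named fields."""
    fields = header.lstrip(">@").split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {header!r}")
    read_id, read_length, chrom, chrom_start, strand = fields
    return {
        "read_id": read_id,
        "read_length": int(read_length),
        "chrom": chrom,
        "chrom_start": int(chrom_start),
        "strand": strand,
    }


def count_c_and_t(read: str, ref_segment: str):
    """Count read Cs and Ts at positions where the reference has a C.

    ASSUMPTION: ref_segment is already aligned base by base with the read.
    A read C means the cytosine stayed unconverted (methylated);
    a read T means it was converted (unmethylated).
    """
    c_count = 0
    t_count = 0
    for read_base, ref_base in zip(read.upper(), ref_segment.upper()):
        if ref_base != "C":
            continue
        if read_base == "C":
            c_count += 1
        elif read_base == "T":
            t_count += 1
    return c_count, t_count


info = parse_read_header("> 0 87 chrXb 106830639 -")
print(info)

# Toy sequences, invented for illustration only.
c_count, t_count = count_c_and_t("ATCGTT", "ACCGCT")
print(c_count / (c_count + t_count))  # methylation ratio over reference Cs
```

Summing the C and T counts per reference position across all reads, rather than per read as in the toy call, gives a per-site methylation ratio.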

ComMet Simulation

Download simulation data (file size: 23 GB)

File format:

  • Source read data: 0.0.<read depth>.fastq
  • DMR-simulated read data: 200.<DMR length>.<read depth>.fastq
  • DMR position data: answer/200.<DMR length>.{dmb|dmr}
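When scripting over this data set, the naming patterns above can be expanded with small helper functions. The helpers below are hypothetical, and the example values `"1000"` and `"10"` are placeholders chosen for illustration; the DMR lengths and read depths that actually occur in the archive should be taken from its contents.

```python
def source_read_file(read_depth: str) -> str:
    """File name of the source read data for one read depth."""
    return f"0.0.{read_depth}.fastq"


def dmr_read_file(dmr_length: str, read_depth: str) -> str:
    """File name of the DMR-simulated read data."""
    return f"200.{dmr_length}.{read_depth}.fastq"


def answer_files(dmr_length: str) -> list:
    """Paths of the DMR position files (.dmb and .dmr) for one DMR length."""
    return [f"answer/200.{dmr_length}.{ext}" for ext in ("dmb", "dmr")]


# Placeholder values; substitute the ones present in the downloaded archive.
print(source_read_file("10"))
print(dmr_read_file("1000", "10"))
print(answer_files("1000"))
```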