Bismark bug in BS-seq alignment
1
2
Entering edit mode
8.4 years ago
Shicheng Guo ★ 9.4k

Did you find such bug in the bismark. In the report processing, bismark said some reads were aligned exactly 1 times, however, in the eventual bam file, there is no any alignment?

bismark --non_directional ~/db/aligndb/hg19/bismark/  illumina_adapter_primer.fastq

The output reports are as the following:

Now running 4 individual instances of Bowtie 2 against the bisulfite genome of /home/shg047/db/aligndb/hg19/bismark/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals

Now starting the Bowtie 2 aligner for CTreadCTgenome (reading in sequences from illumina_adapter_primer.fastq_C_to_T.fastq with options -q --score-min L,0,-0.2 --ignore-quals --norc)
Using Bowtie 2 index: /home/shg047/db/aligndb/hg19/bismark/Bisulfite_Genome/CT_conversion/BS_CT

Found first alignment:  Illumina_Universal_Adapter      0       chrX_CT_converted       72413190        1       12M     *       0       0       AGATTGGAAGAG    FFFFFFFFFFFFF   AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:12      YT:Z:UU
Now starting the Bowtie 2 aligner for CTreadGAgenome (reading in sequences from illumina_adapter_primer.fastq_C_to_T.fastq with options -q --score-min L,0,-0.2 --ignore-quals --nofw)
Using Bowtie 2 index: /home/shg047/db/aligndb/hg19/bismark/Bisulfite_Genome/GA_conversion/BS_GA

155 reads; of these:
  155 (100.00%) were unpaired; of these:
    141 (90.97%) aligned 0 times
    1 (0.65%) aligned exactly 1 time
    13 (8.39%) aligned >1 times
9.03% overall alignment rate
Found first alignment:  Illumina_Universal_Adapter      16      chr2_GA_converted       67922533        1       12M     *       0       0       CTCTTCCAATCT    FFFFFFFFFFFFF   AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:12      YT:Z:UU
Now starting the Bowtie 2 aligner for GAreadCTgenome (reading in sequences from illumina_adapter_primer.fastq_G_to_A.fastq with options -q --score-min L,0,-0.2 --ignore-quals --nofw)
Using Bowtie 2 index: /home/shg047/db/aligndb/hg19/bismark/Bisulfite_Genome/CT_conversion/BS_CT

155 reads; of these:
  155 (100.00%) were unpaired; of these:
    143 (92.26%) aligned 0 times
    1 (0.65%) aligned exactly 1 time
    11 (7.10%) aligned >1 times
7.74% overall alignment rate
Found first alignment:  Illumina_Universal_Adapter      16      chr11_CT_converted      115500938       1       12M     *       0       0       TTTTTTTGATTT    FFFFFFFFFFFFF   AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:12      YT:Z:UU
Now starting the Bowtie 2 aligner for GAreadGAgenome (reading in sequences from illumina_adapter_primer.fastq_G_to_A.fastq with options -q --score-min L,0,-0.2 --ignore-quals --norc)
Using Bowtie 2 index: /home/shg047/db/aligndb/hg19/bismark/Bisulfite_Genome/GA_conversion/BS_GA

155 reads; of these:
  155 (100.00%) were unpaired; of these:
    131 (84.52%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    24 (15.48%) aligned >1 times
15.48% overall alignment rate
Found first alignment:  Illumina_Universal_Adapter      0       chrX_GA_converted       118671939       1       12M     *       0       0       AAATCAAAAAAA    FFFFFFFFFFFFF   AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:12      YT:Z:UU

>>> Writing bisulfite mapping results to illumina_adapter_primer.fastq_bismark_bt2.bam <<<

155 reads; of these:
  155 (100.00%) were unpaired; of these:
    133 (85.81%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    22 (14.19%) aligned >1 times
14.19% overall alignment rate
Current working directory is: /home/shg047/software/FastQC/Configuration/test

However, in the last step:

Reading in the sequence file illumina_adapter_primer.fastq
Processed 155 sequences in total

Successfully deleted the temporary files illumina_adapter_primer.fastq_C_to_T.fastq and illumina_adapter_primer.fastq_G_to_A.fastq

Final Alignment report
======================
Sequences analysed in total:    155
Number of alignments with a unique best hit from the different alignments:      0
Mapping efficiency:     0.0%

Sequences with no alignments under any condition:       126
Sequences did not map uniquely: 29
Sequences which were discarded because genomic sequence could not be extracted: 0
bismark BS-seq • 3.2k views
ADD COMMENT
3
Entering edit mode
8.4 years ago

That's the output of bowtie2, which is being run 4 times by bismark. That part of the output should just be ignored since the metrics aren't exactly meaningful.

Edit: It's probably useful to expand on this, since I'm sure others will have this exact question and see this post via google. So how do bismark (and similar aligners, like bison) work? They in silico convert fastq files and then align them against in silico converted genomes. Due to how BS-seq works, you end up needing to run either 2 or 4 alignments of each read (there are ways around this, but for convenience this is often how things are done), which results in this case in using 4 instances of bowtie2 at once. Each time bowtie2 finishes a file/pair of files it prints out some information regarding how many reads aligned and whether they did so 0, 1 or >1 times. But of course, since in this case there are 4 instances of bowtie2 being run at once, a read can align exactly 1 time in each of these instances...meaning that it's actually a multimapper. Bismark and other tools then read all of the reported alignments at once and process them to assign a read accordingly. Consequently, the metrics output by a single run of bowtie2 and those output by bismark will typically differ substantially (particularly for non-directional libraries).

ADD COMMENT

Login before adding your answer.

Traffic: 1849 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6