Question

BISMARK not producing one mapping result and one report file per paired-end input file pair

0

Entering edit mode

5.3 years ago

bio_d ▴ 20

Hi,

I am trying the BISMARK software for Bisulfite sequence analysis but in the second step when I use

REF_DIR=/home/DNA_METHYLATION/REFERENCE
bismark --basename DM1 --output_dir OUTPUT_DM1 --samtools_path /home/samtools-1.8 --path_to_bowtie /home/software/bowtie-master --bowtie1 $REF_DIR -1 DM1QualityTrimmed.A_1.fq.gz,DM1QualityTrimmed.B_1.fq.gz,DM1QualityTrimmed.C_1.fq.gz,DM1QualityTrimmed.D_1.fq.gz,DM1QualityTrimmed.E_1.fq.gz -2 DM1QualityTrimmed.A_2.fq.gz,DM1QualityTrimmed.B_2.fq.gz,DM1QualityTrimmed.C_2.fq.gz,DM1QualityTrimmed.D_2.fq.gz,DM1QualityTrimmed.E_2.fq.gz

I only get output files corresponding to the last pair. A bam file named "DM1_pe.bam" and the following report in the file "DM1_PE_report.txt"

Bismark report for: DM1QualityTrimmed.E_1.fq.gz and DM1QualityTrimmed.E_2.fq.gz (version: v0.20.0)
Bismark was run with Bowtie against the bisulfite genome of /home/DNA_METHYLATION/REFERENCE/ with the specified options: -q -n 1 -k 2 --best --maxins 500 --chunkmbs 512
Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed)

Final Alignment report
======================
Sequence pairs analysed in total:   52869990
Number of paired-end alignments with a unique best hit: 26447546
Mapping efficiency: 50.0% 
Sequence pairs with no alignments under any condition:  23883829
Sequence pairs did not map uniquely:    2538615
Sequence pairs which were discarded because genomic sequence could not be extracted:    42243

Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT:   13260580    ((converted) top strand)
GA/CT/CT:   0   (complementary to (converted) top strand)
GA/CT/GA:   0   (complementary to (converted) bottom strand)
CT/GA/GA:   13186966    ((converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

Final Cytosine Methylation Report
=================================
Total number of C's analysed:   1410585296

Total methylated C's in CpG context:    58687841
Total methylated C's in CHG context:    590155
Total methylated C's in CHH context:    1661886

Total unmethylated C's in CpG context:  16608516
Total unmethylated C's in CHG context:  327902201
Total unmethylated C's in CHH context:  1005134697

C methylated in CpG context:    77.9%
C methylated in CHG context:    0.2%
C methylated in CHH context:    0.2%


Bismark completed in 0d 16h 52m 0s

Could anyone please tell why I do not get 5 bam files and 5 txt(report) files corresponding to the 5 pairs of input files?

Thanks in advance.

BISMARK • 2.5k views

ADD COMMENT • link updated 5.3 years ago by groverj3 ▴ 20 • written 5.3 years ago by bio_d ▴ 20

score 1 · Answer 1 · 2019-01-17

I'm a little late here, since this post was 28 days ago. However, I would assume it's because you're running Bismark with all those input files at once, rather than separately on each pair. This makes sense because the log you posted indicates that it's being run with 500M input reads.

When I batch a lot of Bismark runs I always stick it in a bash for loop and iterate through each pair of fastq files.

ex.

for fqgz_file in ./*val_1.fq.gz
do
    sample=$(echo ${fqgz_file} | sed "s/R1_val_1\.fq\.gz//")
    bismark --un --ambiguous --gzip \
    -p 4 \
    --multicore 2 \
    --output_dir "./bismark_lambda" \
    --genome_folder "/home/groverj3/large_data/lambda_genome" \
    -1 ${sample}R1_val_1.fq.gz \
    -2 ${sample}R2_val_2.fq.gz
    2> ./bismark_lambda/${sample}.log
done