Question: BS-seq: Using single end vs paired end in Bismark when Mapping efficiency is low
1
gravatar for cosgrove.nicola
20 months ago by
cosgrove.nicola10 wrote:

Hi,

Ive read through several of the posts on seqanswers and biostars to try and figure out whats going on but perhaps someone may spot something that is not obvious to me!

I have bisulfite treated patient samples (primary breast cancer to metastasis in bone and primary breast cancer to metastasis in brain ) with sequencing done on the HiSeq2500.

I used

    trim_galore --paired --retain_unpaired --trim1

to trim the paired end reads in each read group.

Then I aligned to the bisulfite converted genome in paired end mode first using bismark and allowing up to one mismatch

bismark --quiet --bowtie2 --multicore 12 -n 1 /bismark_aln/genomes/hg19_bs/ -1 bone.lane7.R1_val_1.fq.gz -2 bone.lane7.R2_val_2.fq.gz

bismark output

As you can see in Paired End Mode I am getting 28.1%, 24.6% alignment for the bone sample and then 61.0% for the brain sample.

To see if the mapping efficiency increases aligning only Read 1 in the bone_Read Group 1 I aligned again in single end mode.

bismark --quiet --bowtie2 -n 1/bismark_aln/genomes/hg19_bs/ bone.lane7.R1_val_1.fq.gz &

The mapping efficiency increased to 66.0%.

I then decided to see if by adjusting the scoring function for mismatches based on sequence length could i get a better mapping efficiency in the paired end mode using an subset of 100,000 reads

bismark --quiet --bowtie2 --score_min L,0,-0.6 -u 100000 /bismark_aln/genomes/hg19_bs/ -1 bone_RG1/bone.lane7.R1_val_1.fq.gz -2 bone_RG1/bone.lane7.R2_val_2.fq.gz &

Mapping efficiency only increased from 28.1% to 38.7%

Thinking now that read 2's are problematic i decided to see what the mapping efficiency would be like if i aligned only Read 2 in single end mode using the --pbat flag and a subset of 100,000 reads

 bismark --quiet --bowtie2 --score_min L,0,-0.6 -u 100000 /bismark_aln/genomes/hg19_bs/ --pbat bone_RG1/bone.lane7.R2_val_2.fq.gz

Mapping efficiency was 57.0%.

So I know in order to get the most out of the data and to increase the mapping efficiency i should ideally be aligning in single end mode using read 1 and increasing the scoring function in order to account for indels but is it correct to do this?

Can i continue to the methylation call extraction using only read 1 alignment but use paired end alignment for other samples?

Or can you suggest anything else i can do to increase mapping efficiency in paired end mode?

Thanks!

ADD COMMENTlink modified 14 months ago by Biostar ♦♦ 20 • written 20 months ago by cosgrove.nicola10
1

Maybe try bwa-meth and see if that improves the paired-end alignment. You can continue with only read 1, but it'd be preferable to use both mates if you can fix the alignment issue.

BTW, you don't need to use --trim1 with bowtie2.

ADD REPLYlink written 20 months ago by Devon Ryan78k

We have a similar issue with data from a non-model organism, though the range of % reads aligned is from ~25-60%. I agree re: bwa-meth (we saw much better alignment results), though it's also worth pointing out this recent post from the Bismark developer:

https://sequencing.qcfail.com/articles/soft-clipping-of-reads-may-add-potentially-unwanted-alignments-to-repetitive-regions/

Bismark is a great aligner but is very strict re: concordant paired reads with end-to-end mapping and will not report anything else. I think this strictness is probably fine if you have very high-quality samples from a well-analyzed model organism, but for our group this isn't the norm (low conc samples from non-model organisms are common). I personally would rather have all results reported (good, ambiguous, unmapped) so I can assess problems with the samples, followed with some basic sane default cutoffs for additional QC and methylation calls.

ADD REPLYlink modified 20 months ago • written 20 months ago by Chris Fields1.9k
1

Yeah, I have a different take than Felix. I set the defaults in PileOMeth to be reasonable and avoid the situation he's talking about with repeat regions. Since the numbers I used were from bison, and bismark just ported my C code to perl, they could just use the same defaults. Methylation extraction is a lot faster than alignment, so I too would rather filter things after the fact.

ADD REPLYlink written 20 months ago by Devon Ryan78k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1161 users visited in the last hour