Ive read through several of the posts on seqanswers and biostars to try and figure out whats going on but perhaps someone may spot something that is not obvious to me!
I have bisulfite treated patient samples (primary breast cancer to metastasis in bone and primary breast cancer to metastasis in brain ) with sequencing done on the HiSeq2500.
trim_galore --paired --retain_unpaired --trim1
to trim the paired end reads in each read group.
Then I aligned to the bisulfite converted genome in paired end mode first using bismark and allowing up to one mismatch
bismark --quiet --bowtie2 --multicore 12 -n 1 /bismark_aln/genomes/hg19_bs/ -1 bone.lane7.R1_val_1.fq.gz -2 bone.lane7.R2_val_2.fq.gz
As you can see in Paired End Mode I am getting 28.1%, 24.6% alignment for the bone sample and then 61.0% for the brain sample.
To see if the mapping efficiency increases aligning only Read 1 in the bone_Read Group 1 I aligned again in single end mode.
bismark --quiet --bowtie2 -n 1/bismark_aln/genomes/hg19_bs/ bone.lane7.R1_val_1.fq.gz &
The mapping efficiency increased to 66.0%.
I then decided to see if by adjusting the scoring function for mismatches based on sequence length could i get a better mapping efficiency in the paired end mode using an subset of 100,000 reads
bismark --quiet --bowtie2 --score_min L,0,-0.6 -u 100000 /bismark_aln/genomes/hg19_bs/ -1 bone_RG1/bone.lane7.R1_val_1.fq.gz -2 bone_RG1/bone.lane7.R2_val_2.fq.gz &
Mapping efficiency only increased from 28.1% to 38.7%
Thinking now that read 2's are problematic i decided to see what the mapping efficiency would be like if i aligned only Read 2 in single end mode using the --pbat flag and a subset of 100,000 reads
bismark --quiet --bowtie2 --score_min L,0,-0.6 -u 100000 /bismark_aln/genomes/hg19_bs/ --pbat bone_RG1/bone.lane7.R2_val_2.fq.gz
Mapping efficiency was 57.0%.
So I know in order to get the most out of the data and to increase the mapping efficiency i should ideally be aligning in single end mode using read 1 and increasing the scoring function in order to account for indels but is it correct to do this?
Can i continue to the methylation call extraction using only read 1 alignment but use paired end alignment for other samples?
Or can you suggest anything else i can do to increase mapping efficiency in paired end mode?