bwa mem low depth and fluctuating plot
1
0
Entering edit mode
3.8 years ago
ieie ▴ 10

Hi all, I am using bwa mem in this pipeline

bwa mem -t 12 -M -B 4 ref.fas xxx.r1.fastq.gz xxx.r1.fastq.gz > xxx.sam
samtools view -h -b -S -@ 11 xxx.sam > xxx.bam
samtools sort xxx.bam > xxx.sorted.bam
samtools depth xxx.sorted.bam > xxx.bed


Then with R, I read the bed file as dataframe and I plot the data. I get very low depth and a plot that is really fluctuating.

does anybody know if there is a way to improve the depth? thanks a lot for any help.

bwa mem samtools • 936 views
1
Entering edit mode

Hello,

you can convert sam to bam and sort on the fly:

\$ bwa mem -t 12 -M ref.fas xxx.r1.fastq.gz xxx.r1.fastq.gz|samtools sort -@ 11 -o xxx.sorted.bam -


What do you mean by "very low depth"?

fin swimmer

0
Entering edit mode

when I check the depth at each position, how many times a base is covered as I understood, I see very low depth like from 20 to 300. I thought that the more the depth the more confidence I get in my data. Plus, the plot is really fluctuating while I would expect a more stable plot without many peaks.

0
Entering edit mode

0
Entering edit mode

Have you considered showing us that plot?

0
Entering edit mode

this is the plot that I get. thanks for any help!

0
Entering edit mode
3.8 years ago

The plot looks fine to me. What read depth are you expecting? From the graph, it appears to have a read depth of 100+, I think that is pretty decent. On the question of peaks, there will always be regions in the genome which will over amplify in PCR amplification (hence sequenced multiple times) step of library preparation, other reason could be that they represent repetitive regions. Have you considered doing a PCR duplicate removal using PICARD, Some of these peaks may disappear.

0
Entering edit mode

Thanks a lot for your answer! what also worries me is that there many positions where the depth is 0 and I am not sure if I could fix this problem or not. I would not expect many positions with 0 depth.

0
Entering edit mode

That is to be expected. There are always regions of the genome which cannot be sequenced. Your data looks perfectly reasonable assuming it was e.g. illumina data.