Question: BWA mem minimum read length
2
gravatar for yolek64754
29 days ago by
yolek6475430
yolek6475430 wrote:

Hi everyone,

I am currently mapping read from different read length (from 30 to 100) and I have been using BWA-mem and BWA-aln. I know that BWA-aln works better for those read's length but while I was analysing my bam files I realised that every reads < 35bp are set as "unmapped" by BWA-mem. So here is my question:

Does the reads shorter than 35bp does not map because BWA-mem is not optimised for those reads ? If yes, why why some of 37-39 would map? (I have around 50K reads < 35bp and 100K < 40bp)

or

Is there, somewhere in the code, a limit of read length where BWA-mem doesn't even try to align the reads ? Would be good to find a proof in the code but didn't find anything about that in the code (https://github.com/lh3/bwa). My C skills are not the best tho' so I might have missed it.

Thanks a lot !

bwa mem alignment short dna • 139 views
ADD COMMENTlink written 29 days ago by yolek6475430
1

There is no hard cutoff in the code afaik. It is probably a combination of the alignment score with respect to other possible locations that the reads could map to. The shorter the read, the higher the probability that they multimap. Parameters in mem are optimized for reads > 70bp. So you have 150k very short reads, do they even matter in the bigger picture? You typically have millions of reads, do you? Why not just ignoring them?

ADD REPLYlink modified 29 days ago • written 29 days ago by ATpoint41k

Thanks for the quick answer. Yeah I don't use them in my analysis, I was just curious of why some reads of 36-38 bp mapped where none of 35 and below does. I was looking at the code and couldn't find any strict cutoff value so I was kinda lost but yeah I think it is probably some quality score becoming way too low to map them. Thanks !

ADD REPLYlink written 29 days ago by yolek6475430

Have you checked whether the 35 bp reads are all Ns? bcl2fastq, the program most commonly used to generate fastq files from raw bcl, by default resets reads too short after adtapter trimming to a strech of 35 Ns. This could explain the 35 to 36 bp gap.

ADD REPLYlink written 28 days ago by dariober11k

Hi, I didn't mentioned it but the reads I am mapping are simulations and hence are not Ns. Thanks a lot, that could be an explanation for empirical data.

ADD REPLYlink written 18 days ago by yolek6475430

If you can upload a representative sample of reads, say a few hundred thousand, to e.g. dropbox along with the reference genome (either the genome itself or a link to it) maybe some of us can take a look at the issue.

ADD REPLYlink written 18 days ago by dariober11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1290 users visited in the last hour