Hi,
I'm using BBNorm to bin my reads by k-mer depth, and I've run into what seems to me like strange behaviour. When I adjust the bin depths, reads are binned inconsistently, at least as far as I understand what the program is meant to do.
For instance, with highbindepth=200 and lowbindepth=10, I get a different number of reads in the high-depth bin than with highbindepth=200 and lowbindepth=100. Is that expected? My understanding is that all high-depth reads (>200) should go to the high-depth output, so why would changing the low depth setting change how many reads are classed as high depth?
This leads to odd results: a high bin depth of 250 can give me more high-bin reads than a high bin depth of 200, simply because the low bin depth differs (17864208 reads for 250/100 versus 15455278 for 200/10). Here are the commands I ran to test this, with their output:
bbnorm.sh in1=trim_dedup_Anacalosa_R1.fastq.gz in2=trim_dedup_Anacalosa_R2.fastq.gz outhigh=high250_td_Anacalosa.fastq.gz outlow=low100_td_Anacalosa.fastq.gz outmid=mid_td_Anacalosa.fastq.gz passes=1 highbindepth=250 lowbindepth=100
Total reads in: 74158538 74.090% Kept
Low bin reads: 48960818 66.022%
Mid bin reads: 7333512 9.889%
High bin reads: 17864208 24.089%
bbnorm.sh in1=trim_dedup_Anacalosa_R1.fastq.gz in2=trim_dedup_Anacalosa_R2.fastq.gz outhigh=test1high.fastq.gz outlow=test1low.fastq.gz outmid=test1mid.fastq.gz passes=1 highbindepth=200 lowbindepth=100
Total reads in: 74158538 74.081% Kept
Low bin reads: 48959146 66.020%
Mid bin reads: 5925252 7.990%
High bin reads: 19274140 25.990%
bbnorm.sh in1=trim_dedup_Anacalosa_R1.fastq.gz in2=trim_dedup_Anacalosa_R2.fastq.gz outhigh=test2high.fastq.gz outlow=test2low.fastq.gz outmid=test2mid.fastq.gz passes=1 highbindepth=200 lowbindepth=10
Total reads in: 74158538 74.076% Kept
Low bin reads: 21759858 29.342%
Mid bin reads: 36943402 49.817%
High bin reads: 15455278 20.841%
bbnorm.sh in1=trim_dedup_Anacalosa_R1.fastq.gz in2=trim_dedup_Anacalosa_R2.fastq.gz outhigh=test3high.fastq.gz outlow=test3low.fastq.gz outmid=test3mid.fastq.gz passes=1 highbindepth=250 lowbindepth=10
Total reads in: 74158538 74.092% Kept
Low bin reads: 21728446 29.300%
Mid bin reads: 38034948 51.289%
High bin reads: 14395144 19.411%
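To make the inconsistency concrete, here is a small bash sketch that copies the counts from the four logs above, checks that each run's low, mid and high bins add up to the total, and then compares the high-bin counts of runs that share the same highbindepth. The numbers are taken verbatim from the output; the script is only an illustration and does not call BBNorm.

```bash
#!/usr/bin/env bash
# Sanity check on the four BBNorm runs reported above.
# Counts are copied verbatim from the logs; BBNorm itself is not called.
set -euo pipefail

total=74158538
# Each entry: highbindepth lowbindepth low-bin mid-bin high-bin
runs=(
  "250 100 48960818 7333512 17864208"
  "200 100 48959146 5925252 19274140"
  "200 10 21759858 36943402 15455278"
  "250 10 21728446 38034948 14395144"
)

# 1) Every input read should land in exactly one bin.
for run in "${runs[@]}"; do
  read -r hi lo nlow nmid nhigh <<< "$run"
  sum=$(( nlow + nmid + nhigh ))
  if (( sum == total )); then status=ok; else status=MISMATCH; fi
  echo "highbindepth=$hi lowbindepth=$lo high-bin=$nhigh sum-check=$status"
done

# 2) If the high bin depended only on highbindepth, the two runs
#    sharing a highbindepth would report the same high-bin count.
for want in 200 250; do
  counts=()
  for run in "${runs[@]}"; do
    read -r hi _ _ _ nhigh <<< "$run"
    if (( hi == want )); then counts+=("$nhigh"); fi
  done
  echo "highbindepth=$want: ${counts[0]} vs ${counts[1]} (diff $(( counts[0] - counts[1] )))"
done
```

All four runs pass the sum check, so no reads go missing between bins. Even so, with highbindepth=200 the high bin differs by 3818862 reads between lowbindepth=100 and lowbindepth=10, and with highbindepth=250 it differs by 3469064 reads.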
Any idea why it is behaving this way or what I'm misunderstanding?
This is one of those questions that may need input from Brian Bushnell for an authoritative answer, but the inline help says this:
Perhaps one of the reads in each pair is affecting the output.
You should also look at the normalization parameters to see whether you need to include something from there. Perhaps this
There is a BBNorm guide here in case you haven't seen it.
Thanks for the response, genomax. I had looked over the guide and the help, but the behaviour is still unclear to me. I tried setting uselowerdepth=t, but it had no effect (=t and =f produced the same output). I also considered your point about the pairs, but it still doesn't add up for me. If the high bin depth hasn't changed, then every pair in which both reads are above it ought to stay in the high bin. Raising the low bin depth might mean more pairs have one high read and one low read, but those should only go to the mid output; the high-depth output should be unchanged.
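The expectation described above can be written down as a small, hypothetical bash model: each read is classed as high (depth above highbindepth), low (depth below lowbindepth) or mid, and a pair goes to the high or low bin only when both reads agree, otherwise to mid. This encodes the poster's reasoning, not BBNorm's actual code; the per-read depths are made-up values, and the strict > and < comparisons are an assumption.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL model of the binning rule described in this thread.
# It encodes the poster's expectation, not BBNorm's actual source code.
set -euo pipefail

# Classify one read: above highbindepth -> high, below lowbindepth -> low.
read_bin() {
  local depth=$1 low=$2 high=$3
  if (( depth > high )); then echo high
  elif (( depth < low )); then echo low
  else echo mid
  fi
}

# A pair goes to high or low only if both reads agree; otherwise mid.
pair_bin() {
  local d1=$1 d2=$2 low=$3 high=$4 b1 b2
  b1=$(read_bin "$d1" "$low" "$high")
  b2=$(read_bin "$d2" "$low" "$high")
  if [[ $b1 == "$b2" && $b1 != mid ]]; then echo "$b1"; else echo mid; fi
}

# Made-up per-read depths for a handful of pairs (illustration only).
pairs=("300 320" "250 40" "150 180" "5 8" "220 230" "205 9" "50 60")

# Count how many pairs land in each bin; prints "low mid high".
count_bins() {
  local low=$1 high=$2 nlow=0 nmid=0 nhigh=0 p b
  for p in "${pairs[@]}"; do
    b=$(pair_bin "${p% *}" "${p#* }" "$low" "$high")
    case $b in
      low) nlow=$(( nlow + 1 )) ;;
      mid) nmid=$(( nmid + 1 )) ;;
      high) nhigh=$(( nhigh + 1 )) ;;
    esac
  done
  echo "$nlow $nmid $nhigh"
}

for low in 10 100; do
  echo "highbindepth=200 lowbindepth=$low -> low mid high: $(count_bins "$low" 200)"
done
```

Under this model, moving lowbindepth from 10 to 100 only shifts pairs between the low and mid bins (1 4 2 becomes 2 3 2), while the high bin keeps the same two pairs. The real runs do not behave this way, which suggests BBNorm derives a pair's depth, or assigns bins, differently from this model; confirming how would need Brian Bushnell's input or a look at the source.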
I suggest that you email Brian directly (you can find his email in the BBMap documentation). Please come back and post an update if/when you hear from him.