I have 30+ BAM files and have merged them using samtools merge (adding RG tags with the -rh option). I ran mpileup on each of the .bam files separately and on the merged data to look at average coverage. If I have 10x average coverage in each individual sample, I expect roughly 10x*n coverage in the merged case, where n is the number of samples. However, the average coverage on the merged BAM drops to roughly half the value, and some of the bases included in the individual samples' pileups are missing from the merged BAM's pileup. I am not sure how samtools mpileup interprets the merged BAM file, especially with RG tags, and I wonder whether the merge was done properly. Any help would be appreciated.
Is your sequencing very high coverage? When you run samtools mpileup, it should say something like:
[mpileup] 200 samples in 200 input files
<mpileup> Set max per-file depth to 40
It only outputs the max depth message if it has to raise it from the default, or if you request a value that will result in high memory usage, like:
[mpileup] 200 samples in 200 input files
(mpileup) Max depth is above 1M. Potential memory hog!
What does it say in your case? I think the limit will be 8000 / # samples, which will cause you problems if coverage gets above 266 in a sample (for
Try changing to higher values of -d, or better yet, use samtools depth. It will do most of the filtering you request in your example (it has -Q), and it shouldn't have any depth constraints (since it doesn't call variants).
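To make the two suggestions above concrete, here is a rough Python sketch. The first function just does the 8000 / # samples arithmetic (the constant 8000 is my guess from above, not something I have checked for your samtools version). The second averages the third column of samtools depth output for a single BAM, which is tab-separated chromosome, position, depth. The function names are made up for illustration.

```python
import io


def per_file_depth_cap(n_samples, total=8000):
    """Per-file depth limit under the assumed 8000 / n_samples rule of thumb."""
    return total // n_samples


def mean_depth(depth_lines):
    """Mean depth over the positions reported by `samtools depth` for one BAM.

    Each line is expected to be: chrom <TAB> pos <TAB> depth.
    """
    total = 0
    n_positions = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
        n_positions += 1
    return total / n_positions if n_positions else 0.0


if __name__ == "__main__":
    # 30 samples -> the 266 figure mentioned above
    print(per_file_depth_cap(30))
    # Two made-up positions standing in for real `samtools depth` output
    example = io.StringIO("chr1\t100\t12\nchr1\t101\t8\n")
    print(mean_depth(example))
```

For real data you would feed it the depth output instead of the made-up lines, e.g. by reading sys.stdin and piping samtools depth into the script. Keep in mind that samtools depth leaves out zero-coverage positions by default, so this is the mean over covered positions only.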
I believe the truncated file warning can be ignored and is an artifact from writing uncompressed BAM files. I think the latest version of samtools (or the git version) fixes this.