Question: Samtools: merge and mpileup vs mpileup alone for variant-calling with multiple BAM
1
gravatar for Rob
5 months ago by
Rob80
Rob80 wrote:

Hi, I'm doing some variant-calling, and I was wondering what is the difference between:

merging BAM files with samtools merge and call samtools mpileup on the merged BAM

and

skip the merge step and call mpileup with all the BAM files?

I did both and expected the same results, but I got 15% of identical SNPs, 60% close to identical but lower quality with the samtools merge step, and the others SNPs are not find by one or the other method.

Is it possible that the quality filter applied in mpileup affect each file separately and cause this behaviour or is it something else?

ADD COMMENTlink written 5 months ago by Rob80
2

More coverage gives greater confidence in variant calls, and decreases the likelihood that only a single allele will be present for heterozygous variants. I would not expect merging several 100x coverage datasets to change the result much, but merging several 5x coverage datasets will yield an extremely large difference.

ADD REPLYlink modified 5 months ago • written 5 months ago by Brian Bushnell13k

But since the RG tags are identical for one sample, doesn't mpileup merge files itself? And so the coverage should be identical compared to the step with merge no?

ADD REPLYlink modified 5 months ago • written 5 months ago by Rob80

that is the part that i don't know if mpileup treats 2 sets of RG tags as if you have passed 2 separate files to mpileup, generating 1 variant call file, but with calls for each sample. Best way just to it until getting vcf and see how many individuals you get in the end

ADD REPLYlink written 5 months ago by stolarek.ir440

The strange things with this is that I have more coverage on a single SNPs without merge the files before. Here one example (same for all SNPs common in both methods) :

                   Ref   Alt     QUAL     GT     DP
Without merge:      T     C    48.0158    0/1    26
With merge   :      T     C    36.8363    0/1    23
ADD REPLYlink modified 5 months ago • written 5 months ago by Rob80

This VCF has only one sample? Hmmm, looks to me like something strange is going on.

ADD REPLYlink written 5 months ago by Brian Bushnell13k

Yep, only one sample (same RG tag for each Lane)

ADD REPLYlink written 5 months ago by Rob80

What you are describing should not happen. But at least, please make sure you are using the latest versions of all the software (I assume that would be samtools and bcftools); that can often resolve issues. It might also help if you could post the exact command lines you used in each case.

ADD REPLYlink written 5 months ago by Brian Bushnell13k

that reminds me also of my old problem a bit using samtools 1.19 and bcftools 1.2. Either use both of the 1.19 or both at 1.2 versions <- you can also experiment with that, and some crazy bat crap is happening sometimes

ADD REPLYlink written 5 months ago by stolarek.ir440
1

if you just did merge on BAMs, that just adds up coverage and mpileup has no idea, that it's using two files. Depending on how low the coverage was on individual files, the results will vary, though most of the SNP calls should not change <- but again coverage of the individual files

ADD REPLYlink written 5 months ago by stolarek.ir440

But what mpileup do with the 2 files informations? Since it's the same sample (with identical @RG tag), does not it merge them together too?

ADD REPLYlink written 5 months ago by Rob80
1

Never really did it, but it could treat them as separate samples if when merging you applied RG tags, and then didn't ignore them at mpileup, but you should google that, as I'm not sure

ADD REPLYlink written 5 months ago by stolarek.ir440

I will try more research, because yeah, that is the information I'm looking for :) So which way do you personally suggest as the best? Merge first sample with same RG tags, and do variant calling on the merged file?

ADD REPLYlink written 5 months ago by Rob80

anytime you have more data, and all previous steps are exactly the same <- you are not trying to compare anything between those 2 technical replicates, then just merge and call on that one big file like you did

ADD REPLYlink written 5 months ago by stolarek.ir440
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 908 users visited in the last hour