Question: Adding Multiple Read Group Headers to a single BAM file
0
gravatar for koen.vdberge
4.7 years ago by
koen.vdberge10
Belgium
koen.vdberge10 wrote:

I have a BAM file which consists of sequencing reads derived from pooling 25 individuals and sequencing the pool of DNA. Variant calling, or calculating minor allele frequencies, requires one to specify that the BAM file actually consists of 25 samples, rather than one. This is usually done by specifying read group headers in the BAM header and for each read. For samtools, it seems like specifying any read group to a read is fine, and we just need to make sure the BAM file contains 25 read groups in total.

If I am correct, this would not be the case if I would call variants using, for example, the GATK framework, as stated in their manual: "a read group is effectively treated as a separate run of the NGS instrument in tools like base quality score recalibration -- all reads within a read group are assumed to come from the same instrument run and to therefore share the same error model"

However, the bwa -R command you mention only allows us to add one read group to all reads in a BAM file. Since I have 25 pooled samples within the same BAM, this option would not be possible for me. This is similar for other read group add commands such as the one from the freebayes software: https://github.com/ekg/bamaddrg or the AddOrReplaceReadGroups command from Picard tools.

Therefore, two questions:

(a) How can I add multiple read groups to a single BAM file (simultaneously?), allowing me to do correct variant calling?

(b) Does it really matter whether I specify this for the `pileup` file? It most likely does when eventually calling variants, but the pileup file does no such thing.

samtools bam • 1.6k views
ADD COMMENTlink modified 4.6 years ago by Chris Fields2.1k • written 4.7 years ago by koen.vdberge10
1

how do you know which read n°1 should be assigned to the group 'g1' and  read n°2 should be assigned to the group 'g2'  and ... etc...  ?

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum128k

You basically don't. I think all that matters is specifying that it's a pooled sample, because this blows up the variability in assessing allele frequencies.

ADD REPLYlink written 4.7 years ago by koen.vdberge10
1

Yeah, that's not going to work. Each alignment itself needs to be associated to a particular read group, otherwise they'll just get ignored.
 

ADD REPLYlink written 4.7 years ago by Devon Ryan95k

What about just modifying the ploidy settings?

ADD REPLYlink written 4.7 years ago by Devon Ryan95k

Thank you for your answer. Adjusting ploidy settings is not possible within samtools mpileup, but maybe with bcftools?

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by koen.vdberge10

Possibly, I know GATK has a ploidy option, don't know about bcftools.

ADD REPLYlink written 4.6 years ago by Devon Ryan95k
1
gravatar for Chris Fields
4.6 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

AFAIK, you can't add multiple read groups to one BAM file using any tool I know of, for the reasons Devon and Pierre mention.  The typical path is to align the samples independently (with read groups added at that step, such as using the 'bwa -R' option, or just after using one of the above tools).  The independent alignments are then post-processed (sorted, realigned, etc) and merged prior to variant calling with the read groups in place already.

ADD COMMENTlink written 4.6 years ago by Chris Fields2.1k

Thank you for your answer. However, in your approach, there is no cost efficiency of pooling samples, if I am right. I only have one FASTA file for the entire pooled sample.

ADD REPLYlink written 4.6 years ago by koen.vdberge10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 885 users visited in the last hour