I have a BAM file which consists of sequencing reads derived from pooling 25 individuals and sequencing the pool of DNA. Variant calling, or calculating minor allele frequencies, requires one to specify that the BAM file actually consists of 25 samples, rather than one. This is usually done by specifying read group headers in the BAM header and for each read. For samtools, it seems like specifying any read group to a read is fine, and we just need to make sure the BAM file contains 25 read groups in total.
If I am correct, this would not be the case if I would call variants using, for example, the GATK framework, as stated in their manual: "a read group is effectively treated as a separate run of the NGS instrument in tools like base quality score recalibration -- all reads within a read group are assumed to come from the same instrument run and to therefore share the same error model"
However, the bwa -R command you mention only allows us to add one read group to all reads in a BAM file. Since I have 25 pooled samples within the same BAM, this option would not be possible for me. This is similar for other read group add commands such as the one from the freebayes software: https://github.com/ekg/bamaddrg or the AddOrReplaceReadGroups command from Picard tools.
Therefore, two questions:
(a) How can I add multiple read groups to a single BAM file (simultaneously?), allowing me to do correct variant calling?
(b) Does it really matter whether I specify this for the `pileup` file? It most likely does when eventually calling variants, but the pileup file does no such thing.