Hi everyone. As described, I have 50 large mapped .bam files (human exome, 50 individuals, about 3 GB each on average) that have no read groups (RGs), so I want to use Picard AddOrReplaceReadGroups to add them. My questions:
1: For each file I plan to set, for example, RGID=(1,2,3..50), RGLB=(Lb.1,2,3..50), RGPL=ILLUMINA, RGSM=(Tibet1,2,3..50). Is it correct to give each .bam file a different RG in this way?
2: After getting the 50 new bams with RGs added, I will use GATK for base quality score recalibration (BQSR) and local realignment. Should I do this 50 times, once for every bam? Or can I merge the 50 bams into a single one for these steps, or for downstream analysis such as SNP calling? If so, how should I merge them, and what should I watch out for?
Also, can somebody tell me how the 1000 Genomes Project does this, given that it handles a very large amount of data?
3: If merging is not possible, I must generate another 100 new bams, 400-500 GB in total: 50 from GATK -T TableRecalibration for BQSR and 50 from IndelRealigner for local realignment. The computational cost is extremely high. Is there another way?
4: Steps such as BQSR, local realignment, and VQSR rely on known-variant VCFs, which are available for human. If the data come from another species that has no known VCF, how should I handle the corresponding parameters (for example, -knownSites)?
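To make question 1 concrete, here is a minimal Python sketch that only builds (it does not run) the 50 command lines I have in mind. The file names `sample1.bam` ... `sample50.bam`, the single `picard.jar`, and the `RGPU` value are my assumptions for illustration. As far as I know, Picard also requires `RGPU`, which my plan above does not list, and older Picard releases ship a separate `AddOrReplaceReadGroups.jar` instead.

```python
import shlex


def read_group_fields(index):
    """Return the read-group tags planned for sample number `index`."""
    return {
        "RGID": str(index),
        "RGLB": f"Lb.{index}",
        "RGPL": "ILLUMINA",
        "RGPU": f"unit{index}",  # placeholder value, not part of my original plan
        "RGSM": f"Tibet{index}",
    }


def build_command(index, picard_jar="picard.jar"):
    """Build one AddOrReplaceReadGroups command line as a string."""
    in_bam = f"sample{index}.bam"      # hypothetical input file name
    out_bam = f"sample{index}.rg.bam"  # hypothetical output file name
    args = [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}", f"O={out_bam}",
    ]
    args += [f"{tag}={value}" for tag, value in read_group_fields(index).items()]
    return " ".join(shlex.quote(a) for a in args)


# One command per individual, numbered 1..50.
commands = [build_command(i) for i in range(1, 51)]

if __name__ == "__main__":
    for command in commands:
        print(command)
```

Each bam gets its own RGID and RGSM here, since each file is a different individual.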
I would appreciate a timely reply. Thanks!