I am looking at the read names in the archival BAM from the sequencing provider. They encode machine/run/lane info, but read group (RG) info isn't written in the BAM.
What are my options if I want to extract the FASTQ from the BAM and realign with BWA, annotating the RG info so that it is used in downstream GATK calling? (Kind of a reversal of the process, to simulate per-lane FASTQ output for alignment, then per-lane dedup, and so on.)
I have actually already mapped the sample (with reads from different lanes, possibly different runs) to a reference, but I have 37 other samples, so it would be less painful if I got it 'right' at the start, i.e. maybe a Perl script to separate the FASTQ reads by run/lane and then deal with each lane's BAM separately.
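To make the idea concrete, here is a rough sketch of the kind of splitter I have in mind, written in Python rather than Perl. It assumes the read names follow the colon-delimited Illumina layout `instrument:run:flowcell:lane:tile:x:y`; that layout is my assumption and would need checking against the actual names in the BAM. The function names, the output file naming and the choice of RG fields (ID, SM, PL, PU, LB) are all made up for illustration.

```python
def read_group_key(read_name):
    """Return (instrument, run, flowcell, lane) parsed from a read name.

    ASSUMPTION: names look like instrument:run:flowcell:lane:tile:x:y,
    optionally followed by a space and a comment field.
    """
    fields = read_name.lstrip("@").split()[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name layout: {read_name!r}")
    return tuple(fields[:4])


def iter_records(handle):
    """Yield (key, record_text) for each 4-line FASTQ record in handle."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        # Sequence, '+' separator and quality lines are passed through as-is.
        body = handle.readline() + handle.readline() + handle.readline()
        yield read_group_key(header.strip()), header + body


def split_fastq(in_path, out_prefix):
    """Write one FASTQ per (instrument, run, flowcell, lane); return the paths."""
    handles, paths = {}, {}
    try:
        with open(in_path) as src:
            for key, record in iter_records(src):
                if key not in handles:
                    # Hypothetical naming scheme: prefix.instrument_run_flowcell_lane.fastq
                    paths[key] = f"{out_prefix}.{'_'.join(key)}.fastq"
                    handles[key] = open(paths[key], "w")
                handles[key].write(record)
    finally:
        for h in handles.values():
            h.close()
    return paths


def rg_line(key, sample, library=None):
    """Build an @RG string with literal \\t separators, as passed to bwa mem -R.

    ASSUMPTION: flowcell.lane is used for both ID and PU, and the sample
    name stands in for LB when no library name is known.
    """
    _instrument, _run, flowcell, lane = key
    unit = f"{flowcell}.{lane}"
    lib = library or sample
    return f"@RG\\tID:{unit}\\tSM:{sample}\\tPL:ILLUMINA\\tPU:{unit}\\tLB:{lib}"
```

Each per-lane FASTQ would then be aligned on its own with the matching `rg_line(...)` string given to `bwa mem -R`. For paired-end data the mates share the same name prefix, so they should land in the same group, but keeping pairs in sync is not handled by this sketch.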