Hi everyone, I have per-sample aligned BAM files for already published genomes of human populations. In the header section, the RG record looks something like this:
@RG ID:LP6005441-DNA_A09 SM:LP6005441-DNA_A09
It carries only the read group ID (ID) and sample (SM) fields. But to reproduce the results with the GATK Best Practices, I have to assign the RG information correctly.
Each BAM I have represents a single sample from a single library prep, but each sample was run on multiple lanes, as the read names indicate, e.g.:
HS2000-630_102:4:2115:1889:70619
HS2000-630_102:3:2311:13151:38215
HS2000-630_102:2:2315:18670:41735
So, to assign RG information that is unique to the group of reads from each lane, I want to split the per-sample BAM files into multiple BAMs by flowcell lane. Then I can replace the RG information and apply the MarkDuplicates and BQSR steps correctly.
I am new to this. Could you please suggest a tool or script that would do this?
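To make the idea concrete, here is a minimal Python sketch of the per-lane grouping I have in mind, not a tested pipeline. It assumes the read names follow the pattern instrument_run:lane:tile:x:y, so that the second colon-separated field is the lane. The function names, the library name `lib1`, the `PL:ILLUMINA` value, and the use of the run name in place of a flowcell barcode in PU are all my own placeholders, not values taken from the data.

```python
def lane_key(read_name):
    """Return (run, lane) from a read name such as
    'HS2000-630_102:4:2115:1889:70619'.
    Assumption: field 1 is instrument/run, field 2 is the lane."""
    fields = read_name.split(":")
    if len(fields) < 5:
        raise ValueError(f"unexpected read name format: {read_name!r}")
    return fields[0], fields[1]


def read_group_id(sample, read_name):
    """Build a per-lane read group ID (hypothetical naming scheme)."""
    run, lane = lane_key(read_name)
    return f"{sample}.{run}.{lane}"


def rg_header_line(sample, run, lane, library="lib1", platform="ILLUMINA"):
    """Build an @RG header line for one lane.
    'lib1' and 'ILLUMINA' are placeholders; PU uses run.lane because
    no flowcell barcode is visible in these read names."""
    return "\t".join([
        "@RG",
        f"ID:{sample}.{run}.{lane}",
        f"SM:{sample}",
        f"LB:{library}",
        f"PL:{platform}",
        f"PU:{run}.{lane}",
    ])


def retag_sam_line(line, sample):
    """Replace the RG:Z tag of one SAM alignment line with the
    per-lane read group ID. Header lines are returned unchanged."""
    if line.startswith("@"):
        return line
    cols = line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory SAM fields; optional tags follow.
    tags = [t for t in cols[11:] if not t.startswith("RG:Z:")]
    tags.append("RG:Z:" + read_group_id(sample, cols[0]))
    return "\t".join(cols[:11] + tags) + "\n"


if __name__ == "__main__":
    sample = "LP6005441-DNA_A09"
    names = [
        "HS2000-630_102:4:2115:1889:70619",
        "HS2000-630_102:3:2311:13151:38215",
        "HS2000-630_102:2:2315:18670:41735",
    ]
    # Print one @RG line per distinct (run, lane) pair seen in the names.
    for run, lane in sorted({lane_key(n) for n in names}):
        print(rg_header_line(sample, run, lane))
```

If this approach is reasonable, I imagine the functions could be applied to the text output of `samtools view -h`, with the new @RG lines written into the header in place of the original one, but I do not know whether that is the recommended way.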
Thanks in advance!