Hello, I have a set of BAM files that were generated using the DRAGEN Somatic analysis pipeline. The BAM file read groups are formatted as below:
@RG ID:index.1  LB:UnknownLibrary   PU:1    SM:sampleID
@RG ID:index.2  LB:UnknownLibrary   PU:2    SM:sampleID
@RG ID:index.3  LB:UnknownLibrary   PU:3    SM:sampleID
@RG ID:index.4  LB:UnknownLibrary   PU:4    SM:sampleID
@RG ID:index.5  LB:UnknownLibrary   PU:5    SM:sampleID
@RG ID:index.6  LB:UnknownLibrary   PU:6    SM:sampleID
@RG ID:index.7  LB:UnknownLibrary   PU:7    SM:sampleID
@RG ID:index.8  LB:UnknownLibrary   PU:8    SM:sampleID
I want to update every PU field to include the flow cell ID, keeping all the other fields and the lane number, like this:
@RG ID:index.1  LB:UnknownLibrary   PU:XXXXXXX.1    SM:sampleID
@RG ID:index.2  LB:UnknownLibrary   PU:XXXXXXX.2    SM:sampleID
@RG ID:index.3  LB:UnknownLibrary   PU:XXXXXXX.3    SM:sampleID
@RG ID:index.4  LB:UnknownLibrary   PU:XXXXXXX.4    SM:sampleID
@RG ID:index.5  LB:UnknownLibrary   PU:XXXXXXX.5    SM:sampleID
@RG ID:index.6  LB:UnknownLibrary   PU:XXXXXXX.6    SM:sampleID
@RG ID:index.7  LB:UnknownLibrary   PU:XXXXXXX.7    SM:sampleID
@RG ID:index.8  LB:UnknownLibrary   PU:XXXXXXX.8    SM:sampleID
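The header rewrite described above can be sketched in Python as a minimal text transformation. It assumes the header has already been dumped as plain text (for example with `samtools view -H`), that the @RG fields are tab-separated as the SAM spec requires (they appear as spaces above), and that the flow cell ID is known. `XXXXXXX` stays a placeholder, and the function name `add_flowcell_to_pu` is hypothetical.

```python
def add_flowcell_to_pu(header_text: str, flowcell: str) -> str:
    """Prefix the PU value of every @RG line with '<flowcell>.'.

    Assumes tab-separated SAM header text. Non-@RG lines and all
    other @RG fields (ID, LB, SM, ...) are left unchanged.
    """
    out_lines = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                if field.startswith("PU:"):
                    # Keep the original lane value after the flow cell ID,
                    # e.g. "PU:1" -> "PU:XXXXXXX.1".
                    fields[i] = f"PU:{flowcell}.{field[3:]}"
            line = "\t".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

The returned text could be written to a file such as `new_header.sam` and applied with `samtools reheader new_header.sam in.bam > out.bam`. Because each read's `RG:Z:` tag refers to the read group ID rather than PU, only the header needs to change, and the alignment records can stay as they are.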
I am struggling to find a quick way to do this with samtools or Picard, but I will be happy to be proven wrong. Feel free to point me to existing answers for this specific problem; I have not found one yet.
Thanks in advance for the help.
Is this DRAGEN running onboard, on BaseSpace, or on ICA?