I am transitioning to STAR as the primary aligner for our RNA-seq workflow and was wondering whether there is a way to add read group (RG) information without knowing it beforehand. I am working with publicly available TCGA FASTQ files, which were not sequenced in-house.
I see --outSAMattrRGline and --outSAMreadID among the STAR options, but 1) it seems the file needs to be in SAM/BAM format to extract the RG (?), whereas our unmapped files are in FASTQ format, and 2) to add an RGID I would have to enter it manually, which requires prior knowledge of the RGID for each of our samples.
I looked up the filename format for TCGA files, but it does not include any specific read group information. Is there a way to extract the RGID from a FASTQ file and pass it to the aligner automatically, so that the output is ready for downstream analysis with GATK?
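
To make the question concrete, here is a rough sketch of the kind of thing I have in mind, not a tested solution. It assumes the FASTQ read names follow the Illumina CASAVA 1.8+ layout (@instrument:run:flowcell:lane:tile:x:y), which may well not hold for TCGA files if the headers were rewritten or come from an older pipeline. The function names, the sample name "TCGA-XX-0000", and the choice of flowcell.lane as the ID are my own placeholders; LB in particular is just a stand-in because the library cannot be recovered from the FASTQ.

```python
import gzip


def read_first_header(path):
    """Return the first line (the first read name) of a plain or gzipped FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


def rg_fields_from_header(header, sample):
    """Derive read group fields from an Illumina CASAVA 1.8+ style read name.

    Assumed layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    Raises ValueError if the read name does not have at least 7 fields.
    """
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0]  # drop the comment after the first whitespace
    parts = name.split(":")
    if len(parts) < 7:
        raise ValueError(f"read name does not look like CASAVA 1.8+: {header!r}")
    flowcell, lane = parts[2], parts[3]
    return {
        "ID": f"{flowcell}.{lane}",  # assumption: one read group per flowcell lane
        "SM": sample,                # must be supplied; not present in the FASTQ
        "PL": "ILLUMINA",
        "PU": f"{flowcell}.{lane}",
        "LB": sample,                # placeholder; true library is unknown
    }


def star_rg_args(fields):
    """Build the STAR arguments; the ID field has to come first."""
    rest = [f"{key}:{value}" for key, value in fields.items() if key != "ID"]
    return ["--outSAMattrRGline", f"ID:{fields['ID']}"] + rest


if __name__ == "__main__":
    example = "@HWI-ST1234:100:C0ABCACXX:3:1101:1234:5678 1:N:0:ATCACG"
    print(" ".join(star_rg_args(rg_fields_from_header(example, "TCGA-XX-0000"))))
```

The idea would be to call read_first_header on each FASTQ and append the resulting arguments to the STAR command line. This only inspects the first read, so it would silently be wrong for a file that mixes reads from several lanes.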