I have a dataset of 81 samples that are missing read group information, which causes problems when I try to run GATK commands. Picard's AddOrReplaceReadGroups looks like the right tool for adding read groups to a SAM/BAM file, but I'm having trouble looping this command over all my files.
I've been able to come up with this, but there seems to be an issue with how I set or use my variables:
while read f1
do
    FileName=$(echo $f1 | cut -d"," -f1)
    SampleID=$(echo $f1 | cut -d"," -f3)
    RowNum=$(grep -n $FileName.bam)
    java -jar picard.jar AddOrReplaceReadGroups \
        I=$FileName.bam \
        O=$FileName"_sort.bam" \
        SORT_ORDER=coordinate \
        RGID=$SampleID \
        RGLB=WGS \
        RGPL=illumina \
        RGPU=$RowNum \
        RGSM=$FileName
done < file_with_sample_information.csv
Note: the input file file_with_sample_information.csv has 4 columns; the 1st has the sample name and the 3rd has the database (ENA) accession number.
How can I get this to work? I'd like to avoid doing this manually for 81 files.
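To make the intent concrete, here is a minimal dry-run sketch of what I think the loop is meant to do. It only prints one Picard command per CSV row instead of running it. Several things here are assumptions rather than facts about my data: the CSV has no header row, fields contain no quoted commas or spaces, and RGPU is simply the row number in the CSV (my guess at what the `grep -n` line was trying to produce). The function name `build_rg_commands` is hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print (not run) one AddOrReplaceReadGroups command per CSV row.
# Assumes: column 1 = sample name, column 3 = ENA accession, no header row.
build_rg_commands() {
    local csv="$1"
    local row=0
    local file_name col2 sample_id rest

    # IFS=, splits each row on commas; the `|| [ -n ... ]` part keeps a
    # final line that has no trailing newline.
    while IFS=, read -r file_name col2 sample_id rest || [ -n "$file_name" ]; do
        row=$((row + 1))
        # Skip blank lines (they still count toward the row number).
        [ -z "$file_name" ] && continue

        # Assumption: RGPU is the CSV row number.
        printf 'java -jar picard.jar AddOrReplaceReadGroups I=%s.bam O=%s_sort.bam SORT_ORDER=coordinate RGID=%s RGLB=WGS RGPL=illumina RGPU=%s RGSM=%s\n' \
            "$file_name" "$file_name" "$sample_id" "$row" "$file_name"
    done < "$csv"
}
```

After sourcing this, `build_rg_commands file_with_sample_information.csv` should list the 81 commands for inspection; once they look right, the `printf` could be swapped for the actual `java -jar picard.jar ...` call.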