I have paired-end reads (gene regions/exons) from 200 individuals and I want to include all the sample names in the read group. I have the sample names in a single-column text file and tried to include the read group when aligning:
file=($(cat samples.txt))
bwa mem -M -R "@RG\tID:Library1\tSM:$file\tPL:Illumina\tLB:lib_2x250\tDS:hg19" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam
But it doesn't work: the SAM file generated includes only the last sample from samples.txt.
Also, once the BAM files have been generated, is it possible to split them by individual?
Thank you... I understand the iteration part, but I did not get this point: "(presumably you have path information in the samples.txt file)."
What I have is only one library of pooled samples that contains both cases and controls (200 in total). The sample list is just the ID names in one column. I wanted to add all 200 names to the BAM header and then separate the BAMs into cases and controls.
Ah, I had presumed that you had multiple individual samples, not a bunch whose names you wanted to concatenate. Then either make an $RG variable to which you just append each line in the while loop (and then put bwa outside of the loop) or, better yet, just linearize samples.txt:
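A minimal sketch of the linearizing idea, assuming samples.txt holds one ID per line with no blank lines. The helper names `join_samples` and `build_rg` are made up for illustration, and `paste -s` is just one way to join the lines; the read-group fields are copied from the command in the question.

```shell
# join_samples (hypothetical helper): join the lines of a one-column file
# into a single string. The separator defaults to a space.
join_samples() {
    local list_file="$1"
    local sep="${2:- }"
    # paste -s joins all lines serially; -d sets the joining character.
    # No separator is left after the last name.
    paste -s -d "$sep" "$list_file"
}

# build_rg (hypothetical helper): build the read-group string once,
# outside of any loop. The \t sequences are kept literal on purpose,
# since bwa mem converts them to tabs itself.
build_rg() {
    local samples
    samples=$(join_samples "$1" "${2:- }")
    printf '@RG\\tID:Library1\\tSM:%s\\tPL:Illumina\\tLB:lib_2x250\\tDS:hg19' "$samples"
}

# Usage (bwa is only called once):
# bwa mem -M -R "$(build_rg samples.txt)" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam
```

Because the string is built before bwa is called, the header gets a single @RG line no matter how many names are in the list.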
You could also comma-separate things, if you prefer, with
RG=`cat samples.txt | tr "\n" ","`
Thanks, but now it ends up adding everything in just one line several times:
What I intended was
this is useful... many thanks Devon!