I have paired-end reads (gene-regions-exons) covering 200 individuals, and I want to include all the sample names in the read group. I have the names in a single-column text file and tried to add the read group when aligning:
file=($(cat samples.txt))
bwa mem -M -R "@RG\tID:Library1\tSM:$file\tPL:Illumina\tLB:lib_2x250\tDS:hg19" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam
But it doesn't work: the SAM file generated includes only the last sample from samples.txt.
Also, once the bam files have been generated, is it possible to split the files based on individuals?
Thank you... I understand the iteration part of it but I did not get the point: "(presumably you have path information in the samples.txt file)."
So, what I have is only one library of pooled samples that contain both cases and controls (200 in total). The sample list is only the ID names in one column. I wanted to add all the 200 names into the bam header and then separate the bams based on cases and controls.
Ah, I had presumed that you had multiple individual samples, not a bunch whose names you wanted to concatenate. Then either make an $RG variable to which you just append each line in the while loop (and then put bwa outside of the loop) or, better yet, just linearize samples.txt:
You could also comma-separate things, if you prefer, with:
RG=$(paste -sd, samples.txt)   # joins the lines with commas, no trailing comma
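To make the two suggestions above concrete, here is a minimal sketch of both: appending each name to a variable in a while loop, and linearizing the file in one step. The three sample names, the temporary file, and the variable names (`samples_file`, `SM_LOOP`, `SM`) are made up for illustration; the read-group fields are copied from the original command. The bwa call is left as a comment so the snippet runs without bwa installed.

```shell
# Hypothetical three-sample list standing in for the real samples.txt
samples_file=$(mktemp)
printf '%s\n' case_001 case_002 control_001 > "$samples_file"

# Option 1: append each line to a variable inside the loop
SM_LOOP=""
while IFS= read -r name; do
    [ -n "$name" ] || continue                 # skip blank lines
    SM_LOOP="${SM_LOOP:+$SM_LOOP,}$name"       # add a comma only between names
done < "$samples_file"

# Option 2: linearize the file in one step
SM=$(paste -sd, "$samples_file")

# Build the read-group string once; the \t stays literal here,
# since bwa mem converts it to a tab when writing the header
RG="@RG\tID:Library1\tSM:${SM}\tPL:Illumina\tLB:lib_2x250\tDS:hg19"
printf '%s\n' "$RG"

# The alignment then runs once, outside any loop (not executed here):
# bwa mem -M -R "$RG" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam

rm -f "$samples_file"
```

Both options give the same string, so which one to use is a matter of taste. Note that this puts all the names into a single SM field of one @RG line; it does not tag individual reads by sample.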
Thanks, but now it ends up adding everything on just one line, several times:
What I intended was
This is useful... many thanks, Devon!