Question: bam file readgroup to include several individuals
0
gravatar for P.NJ
5.0 years ago by
P.NJ50
Germany
P.NJ50 wrote:

I have paired-end reads (gene-regions-exons) that contains 200 individuals and I want to include all the sample names in the read group. I have a text file in a single column and tried to include the readgroup when aligning

file=($(cat samples.txt))
bwa mem -M -R "@RG\tID:Library1\tSM:$file\tPL:Illumina\tLB:lib_2x250\tDS:hg19" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam

But it doesn't work, the sam file generated includes only the last sample from the file samples.txt .

Also, once the bam files have been generated, is it possible to split the files based on individuals ?

 

ADD COMMENTlink modified 5.0 years ago by Devon Ryan92k • written 5.0 years ago by P.NJ50
0
gravatar for Devon Ryan
5.0 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

You're not iterating over anything, so file will only ever be the last line in samples.txt. You want something more like (untested!):

while read file; do
    bwa mem -M -R "@RG\tID:Library1\tSM:$file\tPL:Illumina\tLB:lib_2x250\tDS:hg19" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam
done < samples.txt

Of course, that'll overwrite the output file and perform the same alignment again and again, but I leave fixing that as an exercise to you (presumably you have path information in the samples.txt file).

For splitting a merged file by read group, there's probably something premade that can do it (either in picard tools or GATK). But if not, you can always use samtools with the -r option and then iterate over the read groups (it'd probably be faster to just write a custom tool to do this with pysam).

ADD COMMENTlink written 5.0 years ago by Devon Ryan92k

Thank you... I understand the iteration part of it but I did not get the point: "(presumably you have path information in the samples.txt file)."

So, what I have is only one library of pooled samples that contain both cases and controls (200 in total). The sample list is only the ID names in one column. I wanted to add all the 200 names into the bam header and then separate the bams based on cases and controls.

ADD REPLYlink written 5.0 years ago by P.NJ50
1

Ah, I had presumed that you had multiple individual samples, not a bunch whose names you wanted to concatenate. Then either make an $RG variable to which you just append each line in the while loop (and then put bwa outside of the loop) or, better yet, just linearize samples.txt:

RG=`cat samples.txt | tr -d "\n"`
bwa mem -M -R "@RG\tID:Library1\tSM:$RG\tPL:Illumina\tLB:lib_2x250\tDS:hg19" hg19.ref R1.fastq.gz R2.fastq.gz > file.sam

You could also comma separate things, if you prefer with RG=`cat samples.txt | tr "\n" ","`.

ADD REPLYlink written 5.0 years ago by Devon Ryan92k

Thanks but now it ends up adding everything in just one line several times:

@RG     ID:Library1 SM:NA19NA20NA21NA22... PL:Illumina  LB:lib_2x250 DS:hg19
@RG     ID:Library1 SM:NA19NA20NA21NA22... PL:Illumina  LB:lib_2x250 DS:hg19

..

What I intended was

@RG     ID:Library1 SM:NA19 PL:Illumina  LB:lib_2x250 DS:hg19
@RG     ID:Library1 SM:NA20 PL:Illumina  LB:lib_2x250 DS:hg19
@RG     ID:Library1 SM:NA21 PL:Illumina  LB:lib_2x250 DS:hg19
ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by P.NJ50

this is useful... many thanks Devon!

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Nandini830
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1485 users visited in the last hour