Question about converting SAM to fastq and adding /1 /2 to the headers
2
0
Entering edit mode
9.1 years ago
p.migeon • 0

Hello,

I am attempting to align sequence reads (fastq) to the PhiX genome in order to remove reads that map to the PhiX genome before going into my assembly. I have performed the alignment using BWA, and then extracted the unmapped reads from the alignment by using this command:

samtools view -f4 aln_B51_2phix.sam > aln_B51_2phix.unmapped.sam

I then convert the reads to fastq using this command:

cat aln_B51_2phix.unmapped.sam | grep -v ^@ | awk 'NR%2==1 {print "@"$1"_1\n"$10"\n+\n"$11}' > unmapped/aln_B51_2phix_0.fastq

and

cat aln_B51_2phix.unmapped.sam | grep -v ^@ | awk 'NR%2==0 {print "@"$1"_1\n"$10"\n+\n"$11}' > unmapped/aln_B51_2phix_0.fastq

I have a script which typically adds /1 /2 to the headers, and I added these to the reads before I did the alignment, but after the alignment they are no longer included in the fastq headers. I am unable to add the /1 /2 headers to these fastq files, even though they should be in the same format as the input fastq files for the alignment at this point.

I need the headers to include the /1 /2 designation for forward and reverse reads in order to perform the assembly. Does anyone know if I am doing something incorrectly or if there is a much simpler way to do all this?

Thanks!

Assembly alignment • 7.4k views
ADD COMMENT
0
Entering edit mode

As a side note, BBMap can output fastq streams directly, making the process easier:

bbmap.sh ref=phix.fa in1=r1.fq in2=r2.fq outm1=mapped1.fq outm2=mapped2.fq outu1=clean1.fq outu2=clean2.fq

Though I generally use BBDuk for phiX removal instead, as it's faster:

bbduk.sh ref=phix.fa in1=r1.fq in2=r2.fq out1=clean1.fq out2=clean2.fq hdist=1

In each case the original read names are maintained. You can also simplify arguments like this:

bbduk.sh ref=phix.fa in=r#.fq out=clean#.fq hdist=1

...as long as the files are named 1 and 2.

ADD REPLY
2
Entering edit mode
9.1 years ago

Perhaps this might help you? But check the code first, since I did not test it.

samtools view -S -f4 aln_B51_2phix.sam | perl -F'\t' -ane 'chomp($F[10]); $mate=($F[1]&64)?"/1":"/2"; print "\@$F[0]$mate\n$F[9]\n+\n$F[10]\n";' | gzip -c > unmapped/aln_B51_2phix_0.fastq.gz
ADD COMMENT
0
Entering edit mode

Unmapped reads may be presented on the reverse strand, depending on mappers, so this command line may put some reads on the reverse strand.

ADD REPLY
1
Entering edit mode

Ok, I didn't know that. How can an unmapped read be on the reverse strand? And what mappers do give me these reversed unmapped reads in the bam?

ADD REPLY
0
Entering edit mode

SAM spec does not specify the strand of unmapped reads, so you cannot make this assumption. Some mappers put unmapped ends on the same strand as the mapped ends.

ADD REPLY
1
Entering edit mode

Hm... you're right. If one mate is mapped and the other mate not. Never thought about that issue. Thanks for clarifying it.

ADD REPLY
0
Entering edit mode

It should be noted that this is (finally) changing, see here.

ADD REPLY
2
Entering edit mode
9.1 years ago
Adrian Pelin ★ 2.6k

Might also be useful to know that samtools has an option to generate fastq reads from BAM files.

Usage: samtools bam2fq [-nO] [-s <outSE.fq>] <in.bam>

So you can always:

samtools view -f 4 -Sb INPUT.sam > UNMAPPED.bam
samtools bam2fq -s Single_end.fastq UNMAPPED.bam > Paired_end.fastq

This should generate your Paired_end.fastq file with paired end reads with /1 and /2 appended. And if it so happens that you have single end reads, it will output them in Single_end.fastq

ADD COMMENT
1
Entering edit mode

Nice one!

But it should be doable in one call:

samtools view -f4 -b INPUT.bam | samtools bam2fq - > Paired_end.fastq
ADD REPLY
1
Entering edit mode

Use "-u" for piping as it is faster.

ADD REPLY
0
Entering edit mode

Makes sense, the magic of command line! Thanks for the tip.

ADD REPLY
0
Entering edit mode

cool, I didn't know that sub-command.

ADD REPLY
0
Entering edit mode

I could swear it wasn't there a while ago, but I am glad they implemented it, I really like how it works.

ADD REPLY

Login before adding your answer.

Traffic: 2014 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6