Question: Aligning paired end fastq files dumped from SRA
Zev.Kronenberg (United States) wrote 3.6 years ago:

Greetings,

I've downloaded a Short Read Archive (SRA) experiment and dumped it to fastq.

~/tools/sratoolkit.2.4.2-centos_linux64/bin/fastq-dump -I --split-files --gzip SRR1514952/SRR1514952.sra

BWA mem is throwing an error when I align the mate pairs:

[mem_sam_pe] paired reads have different names: "SRR1514950.1.1", "SRR1514950.1.2"
[mem_sam_pe] paired reads have different names: "SRR1514950.2.1", "SRR1514950.2.2"
[mem_sam_pe] paired reads have different names: "SRR1514950.3.1", "SRR1514950.3.2"

I'm checking that the files aren't truncated and that both contain the same number of reads. Has anyone run into this problem before?
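The truncation and read-count check mentioned above could be sketched in Python as follows. This is a minimal sketch, not part of the original post: the function names are hypothetical, and it assumes gzipped FASTQ files with exactly four lines per record.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A line count that is not a multiple of 4 suggests a truncated file.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def same_read_count(path1, path2):
    """Return True when both mate files hold the same number of records."""
    return count_fastq_reads(path1) == count_fastq_reads(path2)
```

Calling `same_read_count` on the two mate files (for example `SRR1514950_1.fastq.gz` and `SRR1514950_2.fastq.gz`, assumed names) only rules out missing reads. It says nothing about whether the read names match, which is what the error message is about.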

 

 

Zev.Kronenberg (United States) wrote 3.6 years ago:

This seemed to work. You just need to ask fastq-dump for the original read format with --origfmt:

~/tools/sratoolkit.2.4.2-centos_linux64/bin/fastq-dump --origfmt -I --split-files --gzip SRR1514950/SRR1514950.sra
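One way to confirm that a re-dumped pair of files is consistent before aligning is to compare the read names record by record. The sketch below is an illustration under assumptions: the helper names are hypothetical, the files are gzipped four-line FASTQ, and a trailing `/1` or `/2` is ignored when comparing names, which is how mate suffixes are conventionally treated.

```python
import gzip
from itertools import islice


def read_names(path):
    """Yield the read name (first word of each header line, without '@')."""
    with gzip.open(path, "rt") as handle:
        # Header lines are every 4th line, starting with the first.
        for header in islice(handle, 0, None, 4):
            yield header[1:].split()[0]


def strip_mate_suffix(name):
    """Drop a trailing '/1' or '/2' so that mates compare equal."""
    if len(name) > 2 and name[-2] == "/" and name[-1] in "12":
        return name[:-2]
    return name


def first_mismatch(path1, path2):
    """Return (record_number, name1, name2) for the first differing pair, else None."""
    pairs = zip(read_names(path1), read_names(path2))
    for i, (a, b) in enumerate(pairs, start=1):
        if strip_mate_suffix(a) != strip_mate_suffix(b):
            return i, a, b
    return None
```

Note that `zip` stops at the shorter file, so this check does not detect a difference in read counts.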

Adrian Pelin (Canada) wrote 3.6 years ago:

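It's probably because this isn't the way paired reads are usually named, so bwa is confused. Try a quick sed on the header lines only (this assumes GNU sed and uncompressed FASTQ files with four lines per record):

sed -i -E '1~4 s,^(@[^ ]+)\.1( |$),\1/1\2,' file1
sed -i -E '1~4 s,^(@[^ ]+)\.2( |$),\1/2\2,' file2

Escaping the dot and anchoring the pattern to the end of the read name keeps the substitution away from the rest of the header and from the sequence and quality lines, so no manual fixing is needed. The names become, for example, SRR1514950.1/1 in file1 and SRR1514950.1/2 in file2.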

Hope this works.
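For anyone who prefers not to rely on sed, the same renaming can be sketched in Python. This is a hedged illustration rather than a tested pipeline: the function names are hypothetical, and it assumes the files were dumped with `-I` (so every read name ends in `.1` or `.2`) and have four lines per record.

```python
import re

# Match the final ".1" or ".2" of the read name (the first word of the header).
_MATE_SUFFIX = re.compile(r"^(@\S+)\.([12])(?=\s|$)")


def fix_header(line):
    """Rewrite '@NAME.1 ...' as '@NAME/1 ...' (likewise for .2)."""
    return _MATE_SUFFIX.sub(r"\1/\2", line, count=1)


def fix_fastq_lines(lines):
    """Apply fix_header to header lines only (every 4th line, starting at 0)."""
    for i, line in enumerate(lines):
        yield fix_header(line) if i % 4 == 0 else line
```

Restricting the substitution to header lines matters because quality strings can legitimately begin with `@` and contain `.`, `1`, and `2`.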

Christian (Cambridge, US) wrote 16 months ago:

The following command worked for me:

cat sra.fq | perl -ne 's/\.([12]) /\/$1 /; print $_' > sra.fix.fq

seelament wrote 6 months ago:

I had something similar. The reads I got from SRA look like so:

@SRR1531517.4.1 D3NH4HQ1:58:D091WACXX:7:1101:1448:2140 length=75
AACTTCCAGTGGAAATGAGATTCTGATTCTACCAAAAATGGCCCTCCGAATAGTCAGCATGTAGTTTGTTTGCCC
+SRR1531517.4.1 D3NH4HQ1:58:D091WACXX:7:1101:1448:2140 length=75
CCCFFFFFHHHHGIJIJIJJJJJJJJJJIJJJJJJIJJIJJIGIGIJJIIJIIIIIIJJJJIGIJJJIIJJJHHH

I used the following to make the headers compatible with BWA. It works on both the forward and the reverse file. I prefer to pipe the output (and gzip it) into a new file so that the original is kept as a backup.

sed 's;@SRR1531517\.\([0-9.]*\)\([0-9]\) \([a-zA-Z:0-9]*\) length=[0-9]*;@\3/\2;' sra.fq | gzip > sra.fix.fq.gz

Which gives me:

@D3NH4HQ1:58:D091WACXX:7:1101:1448:2140/1
AACTTCCAGTGGAAATGAGATTCTGATTCTACCAAAAATGGCCCTCCGAATAGTCAGCATGTAGTTTGTTTGCCC
+SRR1531517.4.1 D3NH4HQ1:58:D091WACXX:7:1101:1448:2140 length=75
CCCFFFFFHHHHGIJIJIJJJJJJJJJJIJJJJJJIJJIJJIGIGIJJIIJIIIIIIJJJJIGIJJJIIJJJHHH
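
The sed expression above hard-codes the accession SRR1531517. A rough Python sketch of the same idea that accepts any run accession might look like this. The function name and the accepted prefixes (SRR, ERR, DRR) are assumptions for illustration, and the pattern expects exactly the header layout shown above.

```python
import re

# '@<accession>.<spot>.<mate> <original name> length=<n>'
_SRA_HEADER = re.compile(r"^@[SED]RR\d+\.\d+\.([12]) (\S+) length=\d+$")


def restore_original_name(header):
    """Turn an SRA-style header into '@<original name>/<mate>'.

    Lines that do not match the expected layout (sequence, '+' and
    quality lines, or other header styles) are returned unchanged.
    """
    match = _SRA_HEADER.match(header.rstrip("\n"))
    if match is None:
        return header
    mate, original = match.groups()
    return f"@{original}/{mate}\n"
```

As with the sed version, the `+` line is left untouched, which is harmless for BWA.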

genomax replied 6 months ago:

If you had used the -F option when running fastq-dump, you would not have needed this transformation: you would have recovered the original Illumina-format FASTQ headers.