Question: paired reads have different names (bwa-mem)
3
16 months ago by
AISHA40
Beijing
AISHA40 wrote:

Hi, I am having a problem running BWA-MEM on a paired-end FASTQ file downloaded from NCBI SRA. When I run bwa mem, it fails with this error:

[mem_sam_pe] paired reads have different names: "SRR3239806.1.1", "SRR3239806.1.2"

Example Fastq file:

@SRR3239806.1.1 1 length=100
TTGTGTAGGGTGGGTAGGCTCCATGTTTCCCAGCAAAGCTGGAGACATACAGACTACCTGGTGTTACATTTATTTCAGTGCCTCCTGAGTGTCTCTAAAT
+SRR3239806.1.1 1 length=100
B@CDDFEFFHFFFI@GHIJGIJJJIJIIJJJGGHHIIIJJJJHIIGEHGIJIIIEGIGGHI@A=AHHFDEFFFFFEDEEECCDDDDD3<@CCCDDAC@CC

@SRR3239806.1.2 1 length=100

@SRR3239806.2.1 2 length=100

@SRR3239806.2.2 2 length=100

(I've pasted only the headers of the remaining reads for the sake of brevity.) Can anyone explain how I can fix this error?

next-gen • 2.3k views
modified 15 days ago by Sasha Fokin80 • written 16 months ago by AISHA40

Is it actually an error or just a warning? Did you download the fastq files or convert them with SRAtools?

written 16 months ago by Devon Ryan84k

It's an error. I downloaded the FASTQ file directly; it was a single file.

written 16 months ago by AISHA40
2
15 months ago by
mmfansler220
MSKCC | New York, NY
mmfansler220 wrote:

It appears that when the FASTQ file was dumped from the SRA file, the -I | --readids option was used in fastq-dump. BWA requires that paired reads have completely identical read names, so this option isn't compatible.

You could process the file(s) to remove those appended .(1|2)s,

sed -E "s/^((@|\+)SRR[^.]+\.[^.]+)\.(1|2)/\1/" SRR3239806.fastq > SRR3239806.fixed.fastq

or you could rerun the dump from SRA to FASTQ (which could be just as fast if the SRA is cached):

fastq-dump --split-files SRR3239806

or, if you'd like to keep working with an interleaved file:

fastq-dump --split-spot SRR3239806
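
To see what the sed substitution does before running it on the full file, here is a minimal sketch on a toy interleaved file. The reads are shortened, made-up data and the temporary paths are only for illustration; the sed expression is the one given above (single-quoted here so the shell leaves the backslashes alone).

```shell
# Toy interleaved FASTQ with the ".1"/".2" suffixes that fastq-dump -I appends
# (sequences are shortened, made-up data for illustration only).
tmpdir=$(mktemp -d)
cat > "$tmpdir/toy.fastq" <<'EOF'
@SRR3239806.1.1 1 length=4
ACGT
+SRR3239806.1.1 1 length=4
IIII
@SRR3239806.1.2 1 length=4
TGCA
+SRR3239806.1.2 1 length=4
IIII
EOF

# Drop the trailing .1/.2 from the "@" and "+" header lines.
sed -E 's/^((@|\+)SRR[^.]+\.[^.]+)\.(1|2)/\1/' "$tmpdir/toy.fastq" > "$tmpdir/toy.fixed.fastq"

# Print the read names: first field of lines 1, 5, 9, ... (4-line records assumed).
fixed_names=$(awk 'NR % 4 == 1 { print $1 }' "$tmpdir/toy.fixed.fastq")
echo "$fixed_names"
```

After the substitution both mates should carry the same name, which is what BWA checks for.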
modified 5 months ago • written 15 months ago by mmfansler220
1
16 months ago by
genomax55k
United States
genomax55k wrote:

It appears that those reads are interleaved in the file you downloaded.

I suggest you download the fastq files directly from EBI-ENA where you will find the two reads (R1/R2) in separate files.

written 16 months ago by genomax55k
1

Interleaved files are not a problem for BWA - that's what the -p flag is for.
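
A rough sketch of how one might confirm that an interleaved file is properly paired before handing it to bwa mem -p. The function name check_interleaved, the toy reads, and the file names ref.fa, reads.fastq and aln.sam are all hypothetical; the check assumes plain 4-line FASTQ records.

```shell
# check_interleaved FILE
# Succeeds (exit 0) only if records come in pairs and both mates of each
# pair share the same read name (first whitespace-delimited field).
# Assumes plain 4-line FASTQ records.
check_interleaved() {
    awk 'NR % 8 == 1 { first = $1 }
         NR % 8 == 5 && $1 != first { print "mismatch: " first " vs " $1; bad = 1 }
         END { if (NR % 8 != 0) bad = 1; exit bad + 0 }' "$1"
}

# Toy files (made-up reads) to exercise the check.
demo_dir=$(mktemp -d)
printf '@r1 1\nACGT\n+\nIIII\n@r1 2\nTGCA\n+\nIIII\n' > "$demo_dir/good.fastq"
printf '@r1.1\nACGT\n+\nIIII\n@r1.2\nTGCA\n+\nIIII\n' > "$demo_dir/bad.fastq"

check_interleaved "$demo_dir/good.fastq" && echo "good.fastq: names match"
check_interleaved "$demo_dir/bad.fastq"  || echo "bad.fastq: names differ"

# With a real file and a reference already indexed by bwa index (hypothetical names):
# check_interleaved reads.fastq && bwa mem -p ref.fa reads.fastq > aln.sam
```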

modified 15 months ago • written 15 months ago by mmfansler220

Yes! I downloaded the interleaved FASTQ file. Is there any way to fix the file so that this error goes away?

written 16 months ago by AISHA40

The reads you downloaded use modified SRA headers (if you used fastq-dump to get the data, you should have used the -F option to retrieve the original Illumina headers). You could mess with the file you have, but I suggest that you get the FASTQ files from ENA or do a new fastq-dump.

written 16 months ago by genomax55k
0
15 days ago by
Sasha Fokin80
Russia
Sasha Fokin80 wrote:

This can also happen when the second and fourth lines of a FASTQ record (the sequence and its quality string) differ in length. In that case line 1 (the sequence identifier) can be perfectly correct,

but the program will return the same error:

[mem_sam_pe] paired reads have different names: "@E00576:153:HK75TCCXY:2:1101:23470:1713"

For example:

First read (lines 2 and 4 differ in length, and that's the problem):

@E00576:153:HK75TCCXY:2:1101:23470:1713 2:N:0:NTTACTCG+AGGCTATA
AATAATAATAAAATAAAATAATGTGCTATAAGGTCTTATTTGCAAGCTTCATGGTAGCCTCAATTAAACAAACCTGCAAACAAAAAATAAAAAATAAAAA
+
JJJJFFJJFFJJFJJJJJJJJJJJJFJFJJAFFJJJJFJJAFJJFAJF<JJJF<--7A<F<F7FJJFJJJJFAFA)7<<F<---7-<7AFJJFJ<<FFJ

Second read (OK):

@E00576:153:HK75TCCXY:2:1101:23470:1713 1:N:0:NTTACTCG+AGGCTATA
GCAGGCTTCTGTGAAGGTGATTTTCTCTGGTGGAATGTTTTAATTTCCTGCTTTTTATTTTTTTTTTCTTGGTTGCAGTTTTGTTTAATTGAGGATACCATGAAGTTTGCAAATAAGACCTTATAGCATTTTATTTTATTTTATTATTAT
+
AAAFFJJJJJJJJFAFJJJ-<-FFJFJFJJJFJJJJJFJ<F-JA-J-FFJJJJJ-F<-FJJJJ<JFJF<JFAAF-A-F-AAJ<FFJA-<A--A-7AF---7<-77-7FJ7<7FJJ<AJA--<FA<-7---7J7AJAJ-<FFA-7FAAFAF
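
A quick way to look for records like the first one is to compare the lengths of lines 2 and 4 of every record. This is only a sketch: the function name report_length_mismatch and the toy reads are made up, and it assumes plain 4-line FASTQ records (no wrapped sequences).

```shell
# report_length_mismatch FILE
# Prints one line per record whose sequence (line 2) and quality (line 4)
# strings have different lengths. Assumes 4-line FASTQ records.
report_length_mismatch() {
    awk 'NR % 4 == 1 { name = $1 }
         NR % 4 == 2 { seqlen = length($0) }
         NR % 4 == 0 && length($0) != seqlen {
             print name, "seq=" seqlen, "qual=" length($0)
         }' "$1"
}

# Toy file (made-up reads): read1 has a quality string one character short.
qc_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIII\n@read2\nACGT\n+\nIIII\n' > "$qc_dir/toy.fastq"

mismatches=$(report_length_mismatch "$qc_dir/toy.fastq")
echo "$mismatches"
```

An empty result means every record has matching sequence and quality lengths.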
written 15 days ago by Sasha Fokin80

That is not true in my experience. When using these two lines with bwa-mem 0.7.16a, it returns

[W::bseq_read] the 1st file has fewer sequences.

written 15 days ago by ATpoint7.4k

@ATpoint, that's interesting. Maybe the version I used is not the most stable one: 0.7.17-r1194-dirty

written 15 days ago by Sasha Fokin80