Question: Is the "paired reads have different names" error caused by "!" at the beginning of lines in my FASTQ?
jason.willer (United States) wrote, 3.3 years ago:

I'm running a Perl script (clipPairedEnd.pl) that uses cutadapt to trim Illumina adapters from paired-end FASTQ files. I then use bwa aln, bwa sampe and samtools view to generate aln.bam, and this BAM file has only 248 lines. When I run the same process on the untrimmed FASTQ files, I get 5M lines in the BAM file. After some digging in my log files I found this:

[bwa_sai2sam_pe_core] print alignments... [bwa_sai2sam_pe_core] paired reads have different names: "HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898", "HWI-ST1293:246:HFG23ADXX:1:1101:9432:1843"

When I look for this read in the FASTQ files (before and after adapter trimming), here is what I see:

less R1.fastq

    495 +
    496 #1=DDFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJFHIJHH>GIIIIJJIJJIGHHCEHFFFDBDFEEDDBB##############################################################################
    497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 1:N:0:TCGCAGG
    498 GGCTTTCCGGGTGTGTGTTTAAATTTTTTTTCTATTTAATAATGTTTTTTATTTGTGTTGTAGAATGCCAGAGGACTTGGATCTGAGCTAAAGGACAGTATTCCAGTTACTGAACTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACT

less R2.fastq

    495 +
    496 #######################################################################################################################################################
    497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 2:N:0:TCGCAGG
    498 AGTTCAGTAACTGGAATACTGTCCTTTAGCTCAGATCCAAGTCCTCTGGCATTCTACAACACAAATAAAAAACATTATTAAATAGAAAAAAAATTTAAACACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG

 

Adapter-trimmed FASTQ files:

less R1.fastq

    495 +
    496 !
    497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 1:N:0:TCGCAGG
    498 GGCTTTCCGGGTGTGTGTTTAAATTTTTTTTCTATTTAATAATGTTTTTTATTTGTGTTGTAGAATGCCAGAGGACTTGGATCTGAGCTAAAGGACAGTATTCCAGTTACTGAACT

less R2.fastq

    495 +
    496 #######################################################################################################################################################
    497 @HWI-ST1293:246:HFG23ADXX:1:1101:9277:1898 2:N:0:TCGCAGG
    498 AGTTCAGTAACTGGAATACTGTCCTTTAGCTCAGATCCAAGTCCTCTGGCATTCTACAACACAAATAAAAAACATTATTAAATAGAAAAAAAATTTAAACACACACCCGGAAAGCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG

 

Can anyone tell me whether the "!" is causing the "paired reads have different names" error? If so, any ideas on how to fix it? I find about 2000 lines beginning with "!" in my adapter-trimmed R1.fastq, but none in R2.fastq.

Here is my trimming command:

    clipPairedEnd.pl -m1 read1.fastq -m2 read2.fastq -o1 R1.fastq -o2 R2.fastq -a1 AGATCGGAAGAGCACACGTCTGAACTCCAGTC -a2 TCTAGCCTTCTCGCAGCACATCC -s1 R1.stat -s2 R2.stat
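One way to test whether the two trimmed files have actually drifted out of step is to walk them in lockstep and report the first record where the read names differ. Below is a minimal Python sketch, not part of bwa or clipPairedEnd.pl; the function names are hypothetical, and it assumes strict 4-line records (no wrapped sequence or quality lines) with Casava 1.8-style headers, where both mates share the first whitespace-delimited token. It also raises an error when a quality line's length differs from its sequence line, since a record like that is malformed and could confuse a parser.

```python
import itertools
from typing import Iterable, Iterator, Optional, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, seq, qual) from a strict 4-line-per-record FASTQ.

    `lines` may be an open file or any iterable of lines (assumption: no
    wrapped records). Raises ValueError on a malformed record, e.g. a
    quality line whose length differs from its sequence line.
    """
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return  # clean end of input
        header = header.rstrip("\r\n")
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"broken 4-line structure at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(
                f"{header!r}: sequence length {len(seq)} != quality length {len(qual)}"
            )
        fields = header[1:].split()
        name = fields[0] if fields else ""
        # Mates share the first token; also drop an old-style /1 or /2 suffix.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, seq, qual


def first_name_mismatch(
    r1: Iterable[str], r2: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (0-based record index, R1 name, R2 name) for the first pair whose
    names differ, or None if both files stay in sync to the end.
    A name of None means that file ran out of records first."""
    pairs = itertools.zip_longest(read_fastq(r1), read_fastq(r2))
    for i, (a, b) in enumerate(pairs):
        name1 = a[0] if a else None
        name2 = b[0] if b else None
        if name1 != name2:
            return i, name1, name2
    return None
```

For example, `first_name_mismatch(open("R1.fastq"), open("R2.fastq"))` returns `None` if every pair matches, or a tuple pointing at the first out-of-step pair. A `ValueError` naming a specific header would instead point to a malformed record near that read.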

 


Seeing lines 491-502 might be helpful for a little more context.  There's nothing obviously wrong with the files from what you have posted, although that exclamation point was not an original quality score, and the reads were trimmed to different lengths, which is odd.
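To follow up on the point that "!" (Phred+33 quality 0) was not in the original data, a quick count of records whose quality string consists only of "!", together with the lengths of their sequences, could show whether these are placeholder qualities on reads that were trimmed to (almost) nothing. That interpretation is an assumption to be checked, not something established here. This is a hedged Python sketch; `bang_quality_stats` is a hypothetical helper, and it assumes strict 4-line records.

```python
from collections import Counter
from typing import Iterable, Tuple


def bang_quality_stats(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (total records, records whose quality is all '!',
    histogram of sequence lengths for those all-'!' records)."""
    records = 0
    all_bang = 0
    lengths: Counter = Counter()
    # Group the stream into 4-line records (assumes no wrapped lines).
    for _header, seq, _plus, qual in zip(*[iter(lines)] * 4):
        records += 1
        qual = qual.rstrip("\r\n")
        if qual and set(qual) == {"!"}:
            all_bang += 1
            lengths[len(seq.rstrip("\r\n"))] += 1
    return records, all_bang, lengths
```

Running it on both files, e.g. `bang_quality_stats(open("R1.fastq"))` and the same for R2.fastq, gives a per-file count that can be compared with the roughly 2000 "!" lines found in R1.fastq.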

(Reply by Brian Bushnell, 3.3 years ago)

I just realized you asked for lines 491-502, which seemed like quite a few lines. Here are lines 499-502:

 

R1.fastq

    499 +
    500 CCCFFFDFHHHDFGHHHIIJJJJJIJJJJJJJJIIJJIJJJJJJIJJJJHHFFFFDFEEDEEEEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEECDEEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDC
    501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 1:N:0:TCGCAGG
    502 CATATGCATGGCCTGGCATTTCTAGAAGAGAACTACTCCCATCAGAATGCCAAGAAGATCGTGGCCACCCACCAGCTTCTTGGTGATGTGCAGAGAGTGATTGAGGTTCTGCATGGCCTGCAGCTCAAGATGAGCATCTTGCAGTAAGTGT

R2.fastq

    499 +
    500 CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFFFEEEEEEDDDDDDDEDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD09?CDDDCDD@AC:CCD>
    501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 2:N:0:TCGCAGG
    502 GCTTTCCAATTTCTCAGATTTACTCAGCCCCCAGACCATGCCAAACAGACTGCTCCCAGCACTGCAGGTGCCACACTTACTGCAAGATGCTCATCTTGAGCTGCAGGCCATGCAGAACCTCAATCACTCTCTGCACATCACCAAGAAGCTG

 

Trimmed FASTQ files:

R1.fastq

    499 +
    500 CCCFFFDFHHHDFGHHHIIJJJJJIJJJJJJJJIIJJIJJJJJJIJJJJHHFFFFDFEEDEEEEDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDEECDEEDDDDDDDDDDD
    501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 1:N:0:TCGCAGG
    502 CATATGCATGGCCTGGCATTTCTAGAAGAGAACTACTCCCATCAGAATGCCAAGAAGATCGTGGCCACCCACCAGCTTCTTGGTGATGTGCAGAGAGTGATTGAGGTTCTGCATGGCCTGCAGCTCAAGATGAGCATCTTGCAGTAAGTGT

R2.fastq

    499 +
    500 CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHFFFFFFFEEEEEEDDDDDDDEDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD09?CDDDCDD@AC:CCD>
    501 @HWI-ST1293:246:HFG23ADXX:1:1101:10485:1959 2:N:0:TCGCAGG
    502 GCTTTCCAATTTCTCAGATTTACTCAGCCCCCAGACCATGCCAAACAGACTGCTCCCAGCACTGCAGGTGCCACACTTACTGCAAGATGCTCATCTTGAGCTGCAGGCCATGCAGAACCTCAATCACTCTCTGCACATCACCAAGAAGCTG

(Reply by jason.willer, 3.3 years ago)
mark.ziemann (Australia/Melbourne/Monash University) wrote, 3.1 years ago:

Skewer works really well for simultaneous adapter clipping and quality trimming of paired-end data. Here is a blog post on it.
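The general idea behind paired-end-aware trimming is that a pair is kept or dropped as a unit, so the two output files never drift apart. The Python sketch below illustrates only that idea; it is not how Skewer is implemented. `records`, `filter_pairs` and the `min_len=20` default are hypothetical placeholders, and it assumes strict 4-line records with Casava 1.8-style headers, where both mates share the first whitespace-delimited token.

```python
import itertools
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, "+" line, quality)


def records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a strict 4-line FASTQ stream into records (no wrapped lines)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    return zip(*[stripped] * 4)


def filter_pairs(
    r1: Iterable[str], r2: Iterable[str], min_len: int = 20
) -> Iterator[Tuple[Record, Record]]:
    """Yield (R1 record, R2 record) only when BOTH mates are >= min_len bp.

    Dropping whole pairs (never a single mate) keeps both outputs in the same
    order with matching names, which bwa sampe relies on.
    min_len=20 is an arbitrary placeholder, not a recommended value.
    """
    for a, b in itertools.zip_longest(records(r1), records(r2)):
        if a is None or b is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        if a[0].split()[0] != b[0].split()[0]:
            raise ValueError(f"inputs already out of sync: {a[0]!r} vs {b[0]!r}")
        if len(a[1]) >= min_len and len(b[1]) >= min_len:
            yield a, b
```

With real files, each yielded pair would be written out as `"\n".join(a) + "\n"` to a new R1 file and `"\n".join(b) + "\n"` to a new R2 file (output names of your choosing), so both outputs keep exactly the same reads in the same order.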
