Question: How to remove empty reads from a FASTQ file
0
3.6 years ago by
cfarmeri150
Japan
cfarmeri150 wrote:

Hello,

I would like to remove empty reads from a FASTQ file after trimming adapter sequences.
The FASTQ file is from a 454 GS-FLX.

I tried to remove them using the following fastx_clipper command (from fastx_toolkit):

fastx_clipper -Q33 -l 1 -i in.fastq -o out.fastq

But I received the following error message:

Segmentation fault (core dumped)

Does anybody have a solution to this problem? Is there other software that can remove these empty reads?

Thanks. 

 
software error • 3.2k views
ADD COMMENTlink modified 2.9 years ago by saamar.rajput10 • written 3.6 years ago by cfarmeri150

In the latest documentation, I can't find a Q flag for the fastx_clipper command. Maybe try removing that flag?

ADD REPLYlink written 2.9 years ago by James Ashmore2.6k
3
3.6 years ago by
arnstrm1.7k
Ames, IA
arnstrm1.7k wrote:

If you just want to get rid of short sequences, you can use bioawk:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

edit: FIXED based on comment below!

 

ADD COMMENTlink modified 3.4 years ago • written 3.6 years ago by arnstrm1.7k

Thanks, it works well. The empty reads are removed.

But the leading "@" character of each read's name line (line 1) was also removed.

So I couldn't run FastQC on the processed FASTQ file...

ADD REPLYlink written 3.6 years ago by cfarmeri150
2

Oh yeah, I forgot. The command should include printing "@" before the name:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'
ADD REPLYlink written 3.6 years ago by arnstrm1.7k

Thank you so much!!
I now get a correctly processed FASTQ file.

ADD REPLYlink written 3.6 years ago by cfarmeri150
0
2.9 years ago by
Germany
saamar.rajput10 wrote:

Hello,

I too want to remove the empty reads after adapter trimming. Could you please elaborate on your code:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

I mean, which part does what, and where does the input file go?

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by saamar.rajput10
1

Hey,

bioawk works like a typical awk command but has been modified to understand some of the common NGS file formats (FASTA, FASTQ, GFF, etc.), hence the -c flag (here -cfastx, so the input is parsed as FASTA/FASTQ and each record exposes $name, $seq and $qual). Like awk, it takes the form awk 'condition {action}' filename. The condition here is length($seq) > 1, which means the length of the sequence is greater than 1, so reads of length 0 or 1 are dropped. The action here is {print "@"$name"\n"$seq"\n+\n"$qual}, which prints the record back in FASTQ format (if the condition is satisfied). You supply the filename after you close the single quote, as in the awk form shown above.
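
For anyone without bioawk installed, the same condition/action idea can be sketched in plain Python. This is only an illustration, not part of bioawk: the function name filter_fastq and its min_len parameter are made up for this example, and it assumes strict 4-line FASTQ records (no wrapped sequence lines). Unlike the bioawk action, it writes the header and "+" lines back unchanged.

```python
import io


def filter_fastq(in_handle, out_handle, min_len=1):
    """Copy 4-line FASTQ records whose sequence has at least min_len bases.

    Hypothetical helper for illustration; assumes one line each for
    header, sequence, separator and quality. Returns the number kept.
    """
    kept = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline().rstrip("\n")
        qual = in_handle.readline().rstrip("\n")
        # The "condition": keep the record only if the read is long enough.
        if len(seq) >= min_len:
            # The "action": write the record back in FASTQ format.
            out_handle.write(header.rstrip("\n") + "\n")
            out_handle.write(seq + "\n" + plus + "\n" + qual + "\n")
            kept += 1
    return kept


# Small in-memory demo: r2 is an empty read and should be dropped.
demo = "@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n@r3\nA\n+\nI\n"
src, dst = io.StringIO(demo), io.StringIO()
print(filter_fastq(src, dst))  # number of reads kept
print(dst.getvalue(), end="")
```

With min_len=1 only truly empty reads are removed; min_len=2 corresponds to the length($seq) > 1 condition used in this thread. To run it on real files, open in.fastq for reading and out.fastq for writing and pass the two file handles instead of the StringIO objects.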

PS: you should not ask a question in an existing thread. This should simply have been a follow-up comment on the above answer.

ADD REPLYlink written 2.9 years ago by arnstrm1.7k

Thank you so much Arnstrm.

ADD REPLYlink written 2.9 years ago by saamar.rajput10