Question

How to remove empty reads from a FASTQ file

0

8.8 years ago

cfarmeri ▴ 210

Hello,

I would like to remove empty reads from a FASTQ file after trimming adapter sequences. The FASTQ file is from a 454 GS-FLX.

I tried to remove them using fastx_clipper (from the FASTX-Toolkit) as follows:

fastx_clipper -Q33 -l 1 -i in.fastq -o out.fastq

But I received the following error message:

Segmentation fault (core dumped)

Does anybody have a solution to this problem? Is there other software that can remove these empty reads?

Thanks.

software-error • 7.3k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.8 years ago by cfarmeri ▴ 210

0

In the latest documentation, I can't find a Q flag for the fastx_clipper command. Maybe try removing that flag?

ADD REPLY • link 8.1 years ago by James Ashmore ★ 3.4k

Ram · Answer 1 · 2015-07-26

3

8.8 years ago

arnstrm ★ 1.8k

If you just want to get rid of short sequences, you can use bioawk:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

Edit: FIXED based on comment below!

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.8 years ago by arnstrm ★ 1.8k

0

Thanks, it works well. The empty reads are removed.

But the leading @ character of the read name line (line 1) of each read was also removed.

So I couldn't run FastQC on the processed FASTQ file...

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.8 years ago by cfarmeri ▴ 210

2

Oh yeah, I forgot. The command should include printing @ before the name:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.8 years ago by arnstrm ★ 1.8k

0

Thank you so much!!

I now have a correctly processed FASTQ file.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.8 years ago by cfarmeri ▴ 210

0

I think we can add $comment if the header lines in your FASTQ files carry comments (for example, to specify the read number):

bioawk -cfastx 'length($seq) > 1 {print "@"$name" "$comment"\n"$seq"\n+\n"$qual}'
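One thing to watch with the command above: for a read that has no comment, "@"$name" "$comment leaves a trailing space at the end of the header line. A minimal Python sketch of the header logic that avoids this is below. The function name fastq_record and its arguments are hypothetical, chosen only for illustration; they are not part of bioawk.

```python
def fastq_record(name: str, comment: str, seq: str, qual: str) -> str:
    """Format one FASTQ record, adding the comment only when it is non-empty."""
    # Hypothetical helper: mirrors bioawk's $name, $comment, $seq, $qual fields.
    header = f"@{name} {comment}" if comment else f"@{name}"
    return f"{header}\n{seq}\n+\n{qual}\n"
```

The same idea should carry over to awk with a conditional on $comment, but that variant is not shown here.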

ADD REPLY • link updated 18 months ago by Ram 43k • written 18 months ago by Dimas • 0

0

Hello,

I too want to remove the empty reads after adapter trimming. Could you please elaborate on your code:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

I mean, which part does what, and where does the input file go?

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.1 years ago by saamar.rajput ▴ 70

1

Hey,

bioawk works like a typical awk command but has been modified to understand some common NGS file formats (FASTA, FASTQ, GFF, etc.), hence the -c flag (here -cfastx, so that the input is parsed as FASTA/FASTQ). Like awk, it takes the form awk 'condition{action}' filename.

The condition here is length($seq) > 1, which means the sequence is longer than 1 base.

The action here is {print "@"$name"\n"$seq"\n+\n"$qual}, which prints the record back in FASTQ format (if the condition is satisfied).

You supply the input filename after the closing single quote (for example, in.fastq), and you can redirect the output to a new file with > out.fastq.
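If it helps to see the same condition/action logic spelled out, here is a rough Python sketch of what the filter does. It is not how bioawk is implemented. It assumes strict 4-line FASTQ records, and the names read_fastq and drop_short_reads are hypothetical. Unlike $name in bioawk, it keeps the whole header line, including any comment.

```python
from typing import Iterable, Iterator, Tuple

# (header without the leading "@", sequence, quality string)
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from strict 4-line FASTQ input (an assumption of this sketch)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records or at end of file
        seq = next(it, "").rstrip("\n")
        next(it, "")  # skip the "+" separator line
        qual = next(it, "").rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def drop_short_reads(records: Iterable[Record], min_len: int = 2) -> Iterator[str]:
    """Condition: len(seq) >= min_len (same as length($seq) > 1 when min_len is 2).
    Action: emit the record again as FASTQ text."""
    for header, seq, qual in records:
        if len(seq) >= min_len:
            yield f"@{header}\n{seq}\n+\n{qual}\n"
```

With file names like those in the question, this could be used as open("out.fastq", "w").writelines(drop_short_reads(read_fastq(open("in.fastq")))). Setting min_len=1 would drop only the truly empty reads and keep 1-base reads.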

PS: You should not ask a new question in an existing thread. This should simply have been a follow-up comment on the answer above.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.1 years ago by arnstrm ★ 1.8k

0

Thank you so much, arnstrm.

ADD REPLY • link 8.1 years ago by saamar.rajput ▴ 70