Fastq_quality_filter: found invalid nucleotide sequence
6.1 years ago

I am trying to filter reads (Illumina RNA-Seq data) at a quality threshold of 30 using the FASTX-Toolkit, but I got the following error:

fastq_quality_filter: found invalid nucleotide sequence (GCGGAGWAACCGTTCGGCEACCAGGTGGCATCGCCGCCGAGGGWGCTCCCGTGGCGCGGGCAGTCGTTGACGAACATCTC) on line 85766.

How can I resolve this error?

Thanks in advance.

RNA-Seq Assembly ngs • 2.1k views

The sequence contains 'W' (and possibly other unexpected characters), which might be causing the error. You can also check the file with other tools such as FastQC.

'W' should be an allowed character in principle, since it is the IUPAC ambiguity code for the weak bases (A or T). I've never seen an 'E', though, and it is not an IUPAC nucleotide code; this might be the problem.
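To see which characters are actually involved, one option is to scan the file and list every read containing something other than plain bases. The following is only a minimal sketch: the function name `find_invalid_reads` is made up for illustration, the allowed set `ACGTN` is an assumption about what the filter accepts, and it assumes strict 4-line FASTQ records (no wrapped sequences). Under that layout, line 85766 from the error message is a sequence line (85766 mod 4 = 2).

```python
import io

# Assumption: only these characters count as valid; adjust if your tool differs.
ALLOWED = set("ACGTN")

def find_invalid_reads(handle, allowed=ALLOWED):
    """Yield (line_number, header, offending_chars) for each bad sequence line.

    Assumes strict 4-line FASTQ records: header, sequence, '+', qualities.
    """
    header = ""
    for line_no, line in enumerate(handle, start=1):
        pos = line_no % 4
        if pos == 1:                      # header line of the record
            header = line.rstrip("\n")
        elif pos == 2:                    # sequence line of the record
            bad = set(line.strip().upper()) - allowed
            if bad:
                yield line_no, header, "".join(sorted(bad))

# Small in-memory demo; for a real file use: open("reads.fastq") (hypothetical name)
demo = io.StringIO(
    "@read1\nACGTN\n+\nIIIII\n"
    "@read2\nGCGGAGWAACCGTTCGGCEACC\n+\n" + "I" * 22 + "\n"
)
hits = list(find_invalid_reads(demo))
print(hits)  # [(6, '@read2', 'EW')]
```

If the report shows only a handful of reads with odd letters, that points to corruption or some pre-processing step rather than the sequencer output itself.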

There are many more recent alternatives to the rather old FASTX-Toolkit. Just to name a few: BBDuk, Trim Galore, or Trimmomatic.

How did you get ambiguity codes in your raw RNA-Seq data? What technology is this data from, and has it been pre-processed in some fashion?

6.1 years ago
egeulgen ★ 1.3k

Your sequence seems to contain ambiguity codes. Simply remove those and you should be fine.
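One way to do the removal is to drop every record whose sequence contains anything other than plain bases before running the quality filter. This is just a rough sketch under some assumptions: `drop_ambiguous_reads` is a hypothetical helper name, the allowed alphabet `ACGTN` is a guess at what the filter tolerates, and the input is assumed to use strict 4-line FASTQ records.

```python
import io

def drop_ambiguous_reads(in_handle, out_handle, allowed=frozenset("ACGTN")):
    """Copy 4-line FASTQ records to out_handle, skipping any record whose
    sequence has characters outside `allowed`. Returns (kept, dropped)."""
    kept = dropped = 0
    record = []
    for line in in_handle:
        record.append(line)
        if len(record) == 4:              # a full record has been collected
            seq = record[1].strip().upper()
            if set(seq) <= allowed:
                out_handle.writelines(record)
                kept += 1
            else:
                dropped += 1
            record = []
    return kept, dropped

# In-memory demo; with real data, pass file handles opened for reading/writing.
src = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACEW\n+\nIIII\n")
dst = io.StringIO()
counts = drop_ambiguous_reads(src, dst)
print(counts)  # (1, 1)
```

Note that for paired-end data, dropping reads from only one file would leave the two mate files out of sync, so the matching mates would need to be removed as well.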

You can have a look at the ambiguity codes here

Ok, egeulgen, I will try and let you know.
