Found Invalid Nucleotide Sequence
Asked 12.4 years ago by Love

Hello, I used samtools to generate a FASTQ file (consensus sequence). Then I used the FASTX-Toolkit to filter it by quality. The command is:

fastq_quality_filter -i cns.fastq -o cns_Qual20.fastq -q 30 -p 80 -Q 33 -v
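As I understand the flags in that command, -q is the minimum quality score, -p is the minimum percentage of bases that must reach that score, and -Q 33 is the ASCII offset of the quality characters. The sketch below expresses that keep/discard rule in Python. It is an approximation for illustration, not the tool's own code; keep_read is a hypothetical name, and treating an empty read as failing is an assumption.

```python
def keep_read(qual, min_quality=30, min_percent=80, offset=33):
    """Return True if at least min_percent% of bases score >= min_quality.

    A sketch of how the -q/-p/-Q flags are understood here, not FASTX code.
    """
    if not qual:
        return False  # assumption: an empty read never passes
    # Decode each quality character into a Phred score using the offset.
    scores = [ord(ch) - offset for ch in qual]
    passing = sum(1 for s in scores if s >= min_quality)
    # Integer comparison avoids floating-point rounding at the boundary.
    return passing * 100 >= min_percent * len(scores)
```

For example, with the defaults above a read whose qualities are "IIII#" (four bases at Q40, one at Q2) sits exactly at 80% and is kept, while "II##" (50%) is discarded.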

However I got an error:

fastq_quality_filter: found invalid nucleotide sequence (gaTCACAGGTCTATCACCCTATTAACCACTCACGGgagctctccatgcatttggtatttt) on line 2

The first lines of the file look like this:

@chrM
gaTCACAGGTCTATCACCCTATTAACCACTCACGGgagctctccatgcatttggtatttt
cgtttggggggtatgcacgcgatagcattgcgagacgctggagccggagcaccctatgtc
gcagtatctgtctttgattcctgcctcatcctattatttatcgcacctacgttcaatatt
acaggcgaacatacttactaaagtgtgttaattaattaatgcttgtaggacataataata
acaattgaatgtctgcacagccgctttccacacagacatcataacaaaaaatttccacca
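To see exactly which sequence lines contain characters other than A, C, G, T or N, a quick check like the one below can help. It is a sketch built on two assumptions: that the file uses a strict 4-line FASTQ layout (so on a wrapped file like the one above only every fourth line is inspected), and that the filter accepts only upper-case A/C/G/T/N. Neither is taken from the FASTX-Toolkit source, and invalid_sequence_lines is a hypothetical name.

```python
import re

# Assumption: only upper-case A, C, G, T and N count as valid bases.
VALID_BASES = re.compile(r"^[ACGTN]*$")


def invalid_sequence_lines(lines):
    """Yield (line_number, sequence) for sequence lines with other characters.

    Assumes strict 4-line records: header, sequence, '+' line, quality.
    """
    for index, line in enumerate(lines):
        if index % 4 == 1:  # second line of each record holds the bases
            # Only the newline is stripped, so a stray '\r' is reported too.
            seq = line.rstrip("\n")
            if not VALID_BASES.match(seq):
                yield index + 1, seq
```

Usage: `with open("cns.fastq") as fh: print(list(invalid_sequence_lines(fh)))`.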

Thanks for help.

Answer (12.4 years ago):

This page says:

Some functions of FASTX-Toolkit do not work with FASTA-formatted sequences on multiple lines, so it is sometimes necessary to transform the file with fasta_formatter so that each sequence is on a single line.
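The same unwrapping step can be sketched in Python for readers without fasta_formatter at hand. This is a stand-in, not fasta_formatter itself; it assumes plain FASTA input where headers start with '>', and unwrap_fasta is a hypothetical name.

```python
def unwrap_fasta(lines):
    """Yield header and sequence lines so each record's sequence is one line."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header
                yield "".join(parts)
            header, parts = line, []
        else:
            parts.append(line)  # continuation of the current sequence
    if header is not None:
        yield header
        yield "".join(parts)
```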

Pierre is probably right. You need to reformat your FASTQ so that each record spans exactly 4 lines.
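Before running the filter, it may be worth confirming that a file really is in 4-line form. A minimal check, assuming one record is exactly a header, a sequence, a '+' line and a quality line (check_four_line_fastq is a hypothetical name):

```python
def check_four_line_fastq(lines):
    """Return a list of problems; an empty list means the layout looks fine."""
    lines = [line.rstrip("\r\n") for line in lines]
    problems = []
    if len(lines) % 4 != 0:
        problems.append("line count %d is not a multiple of 4" % len(lines))
    # Inspect every complete group of four lines.
    for start in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, plus, qual = lines[start:start + 4]
        n = start + 1  # 1-based line number of the header
        if not header.startswith("@"):
            problems.append("line %d: header does not start with '@'" % n)
        if not plus.startswith("+"):
            problems.append("line %d: separator does not start with '+'" % (n + 2))
        if len(seq) != len(qual):
            problems.append("record at line %d: sequence/quality lengths differ" % n)
    return problems
```

A wrapped record such as "@r / AC / GT / + / IIII" is reported three times: for the line count, for the missing '+' separator, and for the length mismatch.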

Try using this script to reformat your FASTQ into 4 lines per record:

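import sys


def write_record(out, header, seq, qual):
    # Emit one standard 4-line FASTQ record. Bases are upper-cased because
    # fastq_quality_filter may reject lower-case letters.
    out.write("@" + header + "\n")
    out.write(seq.upper() + "\n")
    out.write("+" + header + "\n")
    out.write(qual + "\n")


def reformat(in_handle, out):
    header, seq, qual = None, "", ""
    state = "header"  # which part of the record is expected next
    for raw in in_handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if state == "header":
            if not line.startswith("@"):
                sys.exit("expected an '@' header line, got: " + line[:40])
            header, seq, qual = line[1:], "", ""
            state = "seq"
        elif state == "seq":
            if line.startswith("+"):
                state = "qual"  # the '+' separator ends the sequence
            else:
                seq += line
        else:
            # '@' and '+' are valid quality characters, so the record ends
            # once the quality string is as long as the sequence.
            qual += line
            if len(qual) >= len(seq):
                write_record(out, header, seq, qual)
                state = "header"
    if state != "header":
        sys.exit("incomplete last record: " + str(header))


if __name__ == "__main__":
    with open(sys.argv[1]) as in_handle:
        reformat(in_handle, sys.stdout)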

Save it as yourName.py and run it with:

python yourName.py yourFastq.fastq > reformatted.fastq

Reply:

NameError: name 'sys' is not defined

Reply:

Sorry, my fault.

Reply:

I've changed the script to print the sequence in upper case. Maybe that will help.
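If you would rather upper-case an existing 4-line file than rerun the script, only the sequence lines should change. The quality line in the test record later in this thread (efcfffffcfeefffcfffff) is made of lower-case letters that encode scores, so upper-casing the whole file would alter those scores. A minimal sketch, assuming strict 4-line records; uppercase_sequences is a hypothetical name.

```python
def uppercase_sequences(lines):
    """Upper-case only the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        # The second line of every record (index 1 mod 4) holds the bases;
        # headers, '+' lines and quality strings pass through untouched.
        yield line.upper() if index % 4 == 1 else line
```

Usage: `with open("in.fastq") as src, open("out.fastq", "w") as dst: dst.writelines(uppercase_sequences(src))`.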

Answer (12.4 years ago):

Does fastq_quality_filter accept FASTQ files with more than 4 lines per record (name, seq, name2, qualities)?

Reply:

I don't know, but in my previous thread someone said it was fine.

Reply:

I converted it to 4 lines per record, but it still fails. A very simple test file:

@chr1
gaTCACAGGTCTATCACCCTA
+chr1
efcfffffcfeefffcfffff

Running fastq_quality_filter on it gives the error:

fastq_quality_filter: found invalid nucleotide sequence (gaTCACAGGTCTATCACCCTA) on line 2
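Separately from the sequence error, the -Q 33 option sets the offset used to turn each quality character into a Phred score. As a worked check on the test record's quality line (plain ASCII arithmetic, not FASTX code; phred_scores is a hypothetical name):

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


print(phred_scores("efc"))      # offset 33
print(phred_scores("efc", 64))  # offset 64, for comparison
```

With offset 33, 'e', 'f' and 'c' (ASCII 101, 102 and 99) decode to 68, 69 and 66; with offset 64 they would be 37, 38 and 35.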

Reply:

Kind of a long shot, but maybe it doesn't like lower-case letters?

Reply:

But does lower case have a specific meaning here?

Reply:

It still fails with upper case. Did I download the wrong version of fastx?
