Question

Found Invalid Nucleotide Sequence

3

12.4 years ago

Love ▴ 100

Hello, I used samtools to generate a FASTQ file (consensus sequence). Then I used fastx to filter it by quality. The command is:

fastq_quality_filter -i cns.fastq -o cns_Qual20.fastq -q 30 -p 80 -Q 33 -v

However I got an error:

fastq_quality_filter: found invalid nucleotide sequence (gaTCACAGGTCTATCACCCTATTAACCACTCACGGgagctctccatgcatttggtatttt) on line 2

The top lines of the sequence file look like this:

@chrM
gaTCACAGGTCTATCACCCTATTAACCACTCACGGgagctctccatgcatttggtatttt
cgtttggggggtatgcacgcgatagcattgcgagacgctggagccggagcaccctatgtc
gcagtatctgtctttgattcctgcctcatcctattatttatcgcacctacgttcaatatt
acaggcgaacatacttactaaagtgtgttaattaattaatgcttgtaggacataataata
acaattgaatgtctgcacagccgctttccacacagacatcataacaaaaaatttccacca

Thanks for your help.

fastq filter quality • 7.4k views

ADD COMMENT • link updated 9.4 years ago by Biostar 20 • written 12.4 years ago by Love ▴ 100

Ram · Answer 1 · 2011-11-29

1

12.4 years ago

Damian Kao 16k

This page says:

Some functions of FASTX-Toolkit do not work with FASTA-formatted sequences on multiple lines, thus it is sometimes necessary to transform the file with fasta_formatter so that each sequence is on a single line.

Pierre is probably right. You need to reformat your FASTQ so that each record takes exactly 4 lines.

Try using this script to reformat your FASTQ into 4 lines per record:

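import sys


def write_record(header, seq, qual):
    # Emit one record as exactly four lines, with the sequence upper-cased.
    print("@" + header)
    print(seq.upper())
    print("+" + header)
    print(qual)


header = None
seq = ''
qual = ''
in_qual = False

with open(sys.argv[1]) as in_file:
    for line in in_file:
        line = line.strip()
        if not line:
            continue
        if in_qual and len(qual) < len(seq):
            # Still collecting quality characters. "@" and "+" are valid
            # quality characters, so this check must come before the others.
            qual += line
        elif line[0] == "@":
            # A new record starts: flush the previous one, if any.
            if header is not None:
                write_record(header, seq, qual)
            header = line[1:]
            seq = ''
            qual = ''
            in_qual = False
        elif line[0] == "+" and not in_qual:
            # Separator line: everything after it is quality.
            in_qual = True
        elif not in_qual:
            seq += line

# Flush the last record.
if header is not None:
    write_record(header, seq, qual)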

Save it as yourName.py and run it like this:

python yourName.py yourFastq.fastq > reformatted.fastq
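To confirm that a reformatted file really has 4 lines per record, a quick structural check can help. The following is only a minimal sketch: the function name `check_four_line_fastq` is made up for illustration, and it checks the layout only (header, sequence, separator, quality of equal length), not whether fastq_quality_filter will accept the content.

```python
import sys


def check_four_line_fastq(lines):
    """Return a list of (line_number, message) problems; empty means the layout looks fine."""
    problems = []
    lines = [ln.rstrip("\n") for ln in lines]
    if len(lines) % 4 != 0:
        problems.append((len(lines), "line count is not a multiple of 4"))
    # Only inspect complete groups of four lines.
    for i in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@"):
            problems.append((i + 1, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((i + 3, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((i + 4, "sequence and quality lengths differ"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line_no, message in check_four_line_fastq(handle):
            print("line {}: {}".format(line_no, message))
```

Running it on reformatted.fastq should print nothing if every record is laid out as four lines.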

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Damian Kao 16k

0

NameError: name 'sys' is not defined

ADD REPLY • link 12.4 years ago by Zhshqzyc ▴ 520

0

Sorry, my fault.

ADD REPLY • link 12.4 years ago by Zhshqzyc ▴ 520

0

I've changed the script to print the sequence letters in upper case. Maybe it will help?

ADD REPLY • link 12.4 years ago by Damian Kao 16k

Ram · Answer 2 · 2011-11-29

0

12.4 years ago

Pierre Lindenbaum 161k

Does fastq_quality_filter accept FASTQ files that have more than 4 lines per record (name, seq, name2, qualities)?

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Pierre Lindenbaum 161k

0

I don't know, but in my previous thread someone said that it is fine.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Love ▴ 100

0

I converted it to 4 lines per record, but it still fails. Here is a very simple test file:

@chr1
gaTCACAGGTCTATCACCCTA
+chr1
efcfffffcfeefffcfffff

Then I get the error:

fastq_quality_filter: found invalid nucleotide sequence (gaTCACAGGTCTATCACCCTA) on line 2

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Love ▴ 100

0

Kind of a long shot, but maybe it doesn't like lower-case letters?
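One way to test that guess is to list which characters in the sequence lines fall outside an expected alphabet. This is a rough sketch under a loud assumption: the allowed set `ACGTN` below is a guess at what the tool accepts and is not confirmed anywhere in this thread, and the function name is made up. It expects a file that already has 4 lines per record, and it also reveals invisible characters such as a trailing carriage return.

```python
import sys

# ASSUMPTION: upper-case A, C, G, T, N only. Adjust if the tool accepts more.
ASSUMED_ALPHABET = set("ACGTN")


def find_unexpected_characters(lines, allowed=ASSUMED_ALPHABET):
    """Return (line_number, sorted unexpected characters) for each offending sequence line."""
    report = []
    for index, line in enumerate(lines):
        if index % 4 != 1:  # the sequence is the second line of each 4-line record
            continue
        # Strip only the newline so that other hidden characters stay visible.
        unexpected = sorted(set(line.rstrip("\n")) - allowed)
        if unexpected:
            report.append((index + 1, unexpected))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps any "\r" in the data instead of translating it away.
    with open(sys.argv[1], newline="") as handle:
        for line_no, chars in find_unexpected_characters(handle):
            print("line {}: {!r}".format(line_no, chars))
```

If the upper-cased file still reports something here, that character would be the next thing to look at; if it reports nothing, the cause is probably elsewhere.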

ADD REPLY • link 12.4 years ago by Damian Kao 16k

0

But does lower case have a specific physical meaning?

ADD REPLY • link 12.4 years ago by Love ▴ 100

0

It still fails with upper case. Did I download the wrong fastx?

ADD REPLY • link 12.4 years ago by Love ▴ 100