fasta to fastq without quality scores
5.5 years ago

Is it possible to convert FASTA to FASTQ format without quality scores? If not, how can one get quality scores for FASTA sequences that are already in GenBank? I am retrieving sequences of clone libraries, which are longer than HTP sequences and are available only in FASTA format. I have to process these files in the QIIME pipeline.

sequence next-gen sequencing rna-seq • 11k views
5.5 years ago
ATpoint 82k

Seqtk can do this, here using # as the fake quality score:

seqtk seq -F '#' in.fa > out.fq
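
If seqtk is not at hand, the same idea can be sketched in a few lines of Python. This is only a minimal illustration, not seqtk's implementation: the function name fasta_to_fastq and its fake_qual parameter are made up for this example. With the usual Phred+33 encoding, '#' (ASCII 35) corresponds to a base quality of 2.

```python
import io


def fasta_to_fastq(fasta_handle, fastq_handle, fake_qual="#"):
    """Write each FASTA record as a FASTQ record with a constant fake quality.

    Hypothetical helper for illustration; wrapped sequence lines are joined.
    """
    name, chunks = None, []

    def flush():
        # Emit the record collected so far, if there is one.
        if name is not None:
            seq = "".join(chunks)
            fastq_handle.write(f"@{name}\n{seq}\n+\n{fake_qual * len(seq)}\n")

    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()  # do not forget the last record


# Small in-memory demo; with real files, pass open("in.fa") and open("out.fq", "w").
demo_in = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGG\n")
demo_out = io.StringIO()
fasta_to_fastq(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

The fake scores carry no information, so any downstream quality filtering should be disabled or set loosely enough that these reads are not discarded.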

It seems that seqtk seq has no -F option:

seqtk seq -F
seq: invalid option -- 'F'

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences
Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -F CHAR   fake FASTQ quality []
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences

-F CHAR fake FASTQ quality []

Make sure you have the current version.


Hi there. I am also trying to replace the quality scores on the 4th line of my FASTQ records with fake ones, but it does not work even though I have the latest seqtk. Do you know what could be wrong, or whether there are other possible solutions? Cheers!
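
One possible workaround that does not depend on seqtk is to overwrite every fourth line directly. The Python sketch below assumes the common four-lines-per-record FASTQ layout (no wrapped sequence or quality lines); fake_qualities is a hypothetical name chosen for this example.

```python
import io


def fake_qualities(fastq_in, fastq_out, fake_qual="#"):
    """Copy a 4-line-per-record FASTQ stream, replacing each quality line."""
    for i, line in enumerate(fastq_in):
        if i % 4 == 3:
            # Quality line: same length as before, every character replaced.
            fastq_out.write(fake_qual * len(line.rstrip("\r\n")) + "\n")
        else:
            fastq_out.write(line)


# Small in-memory demo; with real files, pass open("in.fq") and open("out.fq", "w").
demo_in = io.StringIO("@read1\nACGT\n+\nIIII\n")
demo_out = io.StringIO()
fake_qualities(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

If the input wraps sequences over several lines, this line-counting approach will not work and a proper FASTQ parser is needed instead.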

5.5 years ago
h.mon 35k

You can convert fasta to fastq with fake quality scores with reformat.sh from the BBMap / BBTools package.

But are you sure you need to do this? Can't QIIME process fasta files?

P.S.: What are "HTP sequences"?


I converted a FASTQ file to FASTA using the following command:

seqtk seq -aQ64 sample1.fastq > sample1.fasta

but in the output I am getting letters other than A, T, G, and C, for example W and R:

CACACWCAACCCAGGTATGCATGCACATGCACGTCCATCTGCACACTCAACCCAAGCATGTGCACACACRCACACTTGTACACACACACTCAACCCAAGCACATGTGCAGTT

Can anyone tell me why I am getting this result and how it should be interpreted?


Please check the allowed characters for nucleotides according to IUPAC: https://www.bioinformatics.org/sms/iupac.html The interpretation is up to you, since only you know what these data are.
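
For reference, in the IUPAC nucleotide code W stands for A or T and R stands for A or G. A small Python sketch can list where such ambiguity codes occur in a read; ambiguous_positions is a hypothetical helper written for this example, not part of seqtk.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_positions(seq):
    """Return (0-based position, code, possible bases) for each ambiguity code."""
    return [
        (i, base, IUPAC_AMBIGUITY[base])
        for i, base in enumerate(seq.upper())
        if base in IUPAC_AMBIGUITY
    ]


# First bases of the read quoted in the question above.
print(ambiguous_positions("CACACWCAACC"))  # [(5, 'W', 'AT')]
```

This only reports the codes; whether they reflect heterozygous sites, mixed templates, or low-confidence base calls depends on how the data were produced.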
