Question

fasta to fastq without quality scores

0

5.5 years ago

inayatkhan8185 • 0

Is it possible to convert FASTA to FASTQ format without quality scores? If not, how can one get quality scores for FASTA sequences that are already in GenBank? I am retrieving sequences from clone libraries, which are longer than HTP sequences and are available only in FASTA format. I have to process these files in the QIIME pipeline.

sequence next-gen sequencing rna-seq • 11k views

ADD COMMENT • link updated 5.0 years ago by christian.rinke • 0 • written 5.5 years ago by inayatkhan8185 • 0

score 3 · Accepted Answer · 2018-10-19

5.5 years ago

ATpoint 82k

Seqtk can do this, here using # as the fake quality score:

seqtk seq -F '#' in.fa > out.fq
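
For anyone without a seqtk build that supports -F, the same idea can be sketched in plain Python: each FASTA record becomes a four-line FASTQ record whose quality string repeats one placeholder character. This is only a minimal sketch; the names read_fasta and fasta_to_fastq are made up for illustration and are not part of seqtk. With the usual Phred+33 encoding, '#' (ASCII 35) stands for a quality of 2, so downstream quality filters may discard such reads unless a higher character is chosen.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence may be wrapped over several lines; collect them all.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(fasta_handle, fastq_handle, fake_qual="#"):
    """Write every FASTA record as FASTQ with a constant fake quality."""
    for header, seq in read_fasta(fasta_handle):
        fastq_handle.write(f"@{header}\n{seq}\n+\n{fake_qual * len(seq)}\n")


# Small in-memory demonstration (hypothetical record).
demo = io.StringIO(">seq1 clone\nACGT\nAC\n")
out = io.StringIO()
fasta_to_fastq(demo, out)
print(out.getvalue(), end="")
```

For real files, pass handles from open("in.fa") and open("out.fq", "w") instead of the StringIO objects.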

ADD COMMENT • link 5.5 years ago by ATpoint 82k

0

It seems seqtk seq has no -F option:

seqtk seq -F
seq: invalid option -- 'F'

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences

ADD REPLY • link 5.0 years ago by christian.rinke • 0

0

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -F CHAR   fake FASTQ quality []
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences

-F CHAR fake FASTQ quality []

Make sure you have the current version.

ADD REPLY • link 5.0 years ago by ATpoint 82k

0

Hi there. I am also trying to replace my 4th line quality scores with fake ones, but it does not work although I have the latest seqtk. Do you know what could be wrong or if there are other possible solutions? Cheers!
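
One possible fallback, if seqtk is not cooperating, is a few lines of Python that rewrite every fourth line. This is a minimal sketch under the assumption that the file is a strict four-line-per-record FASTQ (no wrapped sequences, no trailing blank lines); fake_fastq_quality is a hypothetical helper name, not an existing tool.

```python
import io
from itertools import islice


def fake_fastq_quality(in_handle, out_handle, fake_qual="#"):
    """Copy a 4-line-per-record FASTQ, replacing each quality line."""
    while True:
        # Read the next four lines; an empty list means end of input.
        record = [line.rstrip("\r\n") for line in islice(in_handle, 4)]
        if not record:
            break
        if (len(record) != 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError("input is not a strict 4-line FASTQ")
        header, seq, plus, _old_qual = record
        # The new quality string must be as long as the sequence.
        out_handle.write(f"{header}\n{seq}\n{plus}\n{fake_qual * len(seq)}\n")


# Small in-memory demonstration (hypothetical reads).
src = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+r2\n!!\n")
dst = io.StringIO()
fake_fastq_quality(src, dst)
print(dst.getvalue(), end="")
```

The ValueError is deliberate: a file that does not follow the four-line layout is rejected rather than silently corrupted.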

ADD REPLY • link 4.4 years ago by zach ▴ 10

score 2 · Accepted Answer · 2018-10-18

5.5 years ago

h.mon 35k

You can convert fasta to fastq with fake quality scores with reformat.sh from the BBMap / BBTools package.

But are you sure you need to do this? Can't QIIME process fasta files?

P.S.: What are "HTP sequences"?

ADD COMMENT • link 5.5 years ago by h.mon 35k

0

I converted a FASTQ file to FASTA using the following command:

seqtk seq -aQ64 sample1.fastq > sample1.fasta

But in the output I am getting letters other than A, T, G and C, for example W and R:

CACACWCAACCCAGGTATGCATGCACATGCACGTCCATCTGCACACTCAACCCAAGCATGTGCACACACRCACACTTGTACACACACACTCAACCCAAGCACATGTGCAGTT

Can anyone tell me why I am getting this result and how it should be interpreted?

ADD REPLY • link 3.3 years ago by archanaverma433 ▴ 10

1

Please check the allowed characters for nucleotides according to IUPAC: https://www.bioinformatics.org/sms/iupac.html

Interpretation is up to you, since only you know what these data are.
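
As a rough illustration of what those IUPAC codes mean, here is a small Python sketch that lists every ambiguity code in a sequence together with the bases it can stand for (W is A or T, R is A or G). The helper name find_ambiguous is made up for this example; the sequence is the one posted above.

```python
# IUPAC nucleotide ambiguity codes and the bases each one can represent.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_ambiguous(seq):
    """Return (1-based position, code, possible bases) for each ambiguity code."""
    hits = []
    for pos, base in enumerate(seq.upper(), start=1):
        if base in IUPAC_AMBIGUITY:
            hits.append((pos, base, IUPAC_AMBIGUITY[base]))
    return hits


seq = ("CACACWCAACCCAGGTATGCATGCACATGCACGTCCATCTGCACACTCAACCCAAGCATGTGCACACAC"
       "RCACACTTGTACACACACACTCAACCCAAGCACATGTGCAGTT")
hits = find_ambiguous(seq)
for pos, code, bases in hits:
    print(f"position {pos}: {code} = {' or '.join(bases)}")
```

Whether such positions reflect real heterogeneity or low-confidence base calls depends on how the data were generated, which the code cannot tell you.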

ADD REPLY • link 3.3 years ago by ATpoint 82k