Question

Problem Of Using Subseq In Seqtk

11.2 years ago

ymwur1 ▴ 10

Hi, I am trying to extract sequences from a FASTQ file using the "subseq" command in seqtk, but the output file contains only the first sequence and none of the others. I wonder whether my name.lst file is not in the format seqtk expects. name.lst has one sequence name per line with no other symbols, whereas in the FASTQ file each sequence name starts with @. Should I add @ in front of each sequence name, or could it be some other problem?

Any suggestion is welcome. Thanks,

Chih-Ming

score 2 · Answer 1 · 2014-04-16

Looks like this reply might be coming in a bit too late, but here goes:

a. As @Istvan says, a sequence with that ID might not exist in the FASTA/FASTQ file.

b. The ID might contain whitespace, in which case the characters after the whitespace are treated as the sequence description, not as part of the sequence ID.

c. There may be duplicate IDs.
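To check points a to c against your own files, here is a minimal Python sketch that is not part of seqtk; record_ids and diagnose are hypothetical helper names. It assumes the ID is the header text after the leading > or @ up to the first whitespace, and that FASTQ records are unwrapped four-line records. It also flags names that start with @ or >, because under that assumption the marker is not part of the ID.

```python
from collections import Counter


def record_ids(lines):
    """Return the ID of every record in FASTA or FASTQ text.

    The ID is the header after the leading '>' or '@', up to the first
    whitespace; the rest of the header is treated as the description.
    Assumes FASTQ records are exactly four lines (no wrapped sequence or
    quality lines) and that the first non-blank line is a header.
    """
    lines = [ln.rstrip("\r\n") for ln in lines]
    lines = [ln for ln in lines if ln.strip()]  # drop blank lines
    if not lines:
        return []
    if lines[0].startswith("@"):
        headers = lines[0::4]  # FASTQ: every 4th line is a header
    else:
        headers = [ln for ln in lines if ln.startswith(">")]  # FASTA
    ids = []
    for header in headers:
        fields = header[1:].split()
        ids.append(fields[0] if fields else "")
    return ids


def diagnose(names, ids):
    """Compare requested names against record IDs.

    Returns names that match no ID exactly, names containing internal
    whitespace, names starting with '@' or '>', and IDs seen more than once.
    """
    counts = Counter(ids)
    requested = [n.strip() for n in names if n.strip()]
    return {
        "missing": [n for n in requested if n not in counts],
        "has_whitespace": [n for n in requested if len(n.split()) > 1],
        "has_marker": [n for n in requested if n[0] in "@>"],
        "duplicate_ids": sorted(i for i, c in counts.items() if c > 1),
    }


if __name__ == "__main__":
    # Hypothetical example data, not taken from the thread.
    fastq = [
        "@read1 1:N:0\n", "ACGT\n", "+\n", "IIII\n",
        "@read2 1:N:0\n", "TTGA\n", "+\n", "IIII\n",
        "@read2 2:N:0\n", "GGCC\n", "+\n", "IIII\n",
    ]
    names = ["read1\n", "@read2\n", "read3 1:N:0\n"]
    report = diagnose(names, record_ids(fastq))
    for key, values in report.items():
        print(key, values)
    # prints:
    # missing ['@read2', 'read3 1:N:0']
    # has_whitespace ['read3 1:N:0']
    # has_marker ['@read2']
    # duplicate_ids ['read2']
```

In practice you would pass open(...) file handles for the sequence file and name.lst instead of the in-memory lists.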

score 1 · Answer 2 · 2013-02-06

It seems to work fine, as shown below (it also tests out fine with FASTQ files). Make sure that you are requesting IDs that actually exist.

ialbert@porthos ~
$ cat test.fa 
>x1
AAAAAA
>x2
TTTTTT
>x3
CCCCCC
>x4
GGGGGG

ialbert@porthos ~
$ cat name.list 
x2
x4


ialbert@porthos ~
$ seqtk subseq test.fa name.list 
>x2
TTTTTT
>x4
GGGGGG
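Since the question is about FASTQ, here is a rough Python stand-in for the same name-list filtering on a FASTQ version of test.fa. It is not seqtk's implementation; subseq_fastq is a hypothetical helper, and the quality strings are made up. It assumes unwrapped four-line records and IDs that end at the first whitespace, with names given without @, as in name.list above.

```python
def subseq_fastq(fastq_lines, names):
    """Keep 4-line FASTQ records whose ID is listed in `names`.

    Names are plain IDs without '@'; a record's ID is its header after
    '@' up to the first whitespace. Records are returned in file order.
    Assumes unwrapped 4-line records with no blank lines inside them.
    """
    wanted = {n.strip() for n in names if n.strip()}
    lines = [ln.rstrip("\r\n") for ln in fastq_lines if ln.strip()]
    out = []
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        fields = record[0][1:].split()  # drop '@', split off description
        if fields and fields[0] in wanted:
            out.extend(record)
    return out


if __name__ == "__main__":
    # FASTQ version of test.fa (quality values are invented).
    fastq = [
        "@x1", "AAAAAA", "+", "IIIIII",
        "@x2", "TTTTTT", "+", "IIIIII",
        "@x3", "CCCCCC", "+", "IIIIII",
        "@x4", "GGGGGG", "+", "IIIIII",
    ]
    # Prints the @x2 and @x4 records, mirroring the FASTA output above.
    print("\n".join(subseq_fastq(fastq, ["x2", "x4"])))
```

Note that a name written as @x2 would select nothing in this sketch, because the @ is stripped from the header before comparison.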