Question: Problem Of Using Subseq In Seqtk
ymwur1 wrote, 7.4 years ago:

Hi, I am trying to extract sequences from a FASTQ file using the "subseq" command in seqtk, but the output file contains only the first sequence and none of the others. I am wondering whether my name.lst file is not in the format seqtk needs. It has one sequence name per line, with no other symbols, whereas in the FASTQ file each sequence name starts with a @. Should I add @ in front of each sequence name? Or what else could the problem be?

Any suggestion is welcome. Thanks,

Chih-Ming

RamRS (Houston, TX) wrote, 6.2 years ago:

Looks like this reply might be coming in a bit too late, but here goes:

a. As @Istvan says, a sequence with the requested ID might not exist in the FASTA/FASTQ file.

b. The ID might contain whitespace, in which case the characters after the whitespace are treated as the sequence description and not as part of the sequence ID.

c. Duplicate IDs, maybe?
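
The three possibilities above can be checked without seqtk. Here is a minimal Python sketch, assuming strict 4-line FASTQ records and taking the ID to be the header text after the @ up to the first whitespace. The helper names `fastq_ids` and `check_names` are made up for illustration and are not part of seqtk. It also flags names that start with @, since the name list in the working example further down holds bare IDs without the leading marker.

```python
import io

def fastq_ids(handle):
    """Yield each record's ID: header text after '@', up to the first whitespace."""
    for i, line in enumerate(handle):
        if i % 4 == 0:  # assumption: strict 4-line FASTQ records
            parts = line[1:].split()
            if parts:
                yield parts[0]

def check_names(names, ids):
    """Report names that are missing, contain whitespace, start with '@', or repeat."""
    present = set(ids)
    seen = set()
    report = {"missing": [], "has_whitespace": [], "leading_at": [], "duplicate": []}
    for raw in names:
        name = raw.rstrip("\r\n")
        if not name:
            continue  # skip blank lines
        if name in seen:
            report["duplicate"].append(name)
            continue
        seen.add(name)
        if name.startswith("@"):
            report["leading_at"].append(name)
        if name != name.strip() or len(name.split()) > 1:
            report["has_whitespace"].append(name)
        if name not in present:
            report["missing"].append(name)
    return report

# Toy data standing in for the real FASTQ file and name.lst
fastq = io.StringIO("@r1 desc\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
names = ["r1", "@r2", "r1 desc", "r3", "r1"]
report = check_names(names, fastq_ids(fastq))
print(report)
```

With real files, pass `open("reads.fq")` and `open("name.lst")` (file names here are placeholders) in place of the toy data.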

Istvan Albert (University Park, USA) wrote, 7.4 years ago:

It seems to work fine in the example below (it also tests out fine with FASTQ files). Make sure that you are requesting IDs that actually exist.

ialbert@porthos ~
$ cat test.fa 
>x1
AAAAAA
>x2
TTTTTT
>x3
CCCCCC
>x4
GGGGGG

ialbert@porthos ~
$ cat name.list 
x2
x4


ialbert@porthos ~
$ seqtk subseq test.fa name.list 
>x2
TTTTTT
>x4
GGGGGG
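
For the FASTQ case, the same name-based lookup can be sketched in plain Python. This is only an illustration of the matching idea, not seqtk's implementation: it assumes strict 4-line FASTQ records, compares the header text after the @ (up to the first whitespace) against the requested names, and uses a made-up function name `subseq_by_name` with made-up quality strings. As in the FASTA example, the name list holds bare IDs with no leading marker.

```python
import io

def subseq_by_name(fastq_handle, wanted):
    """Yield 4-line FASTQ records whose ID is in `wanted`."""
    while True:
        record = [fastq_handle.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        parts = record[0][1:].split()  # drop '@', split off any description
        if parts and parts[0] in wanted:
            yield "".join(record)

# Toy FASTQ mirroring test.fa above; x2 carries a description after a space
fastq = io.StringIO(
    "@x1\nAAAAAA\n+\nIIIIII\n"
    "@x2 lane1\nTTTTTT\n+\nIIIIII\n"
    "@x3\nCCCCCC\n+\nIIIIII\n"
    "@x4\nGGGGGG\n+\nIIIIII\n"
)
wanted = {"x2", "x4"}  # bare IDs, as in name.list
extracted = "".join(subseq_by_name(fastq, wanted))
print(extracted, end="")
```

Here x2 is still found even though its header has extra text after the ID, because only the part before the whitespace is compared.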