Problem Of Using Subseq In Seqtk
11.2 years ago
ymwur1 ▴ 10

Hi, I am trying to extract sequences from a FASTQ file using the "subseq" command in seqtk, but the output file contains only the first sequence and none of the others. I wonder whether my name.lst file is not in the format seqtk expects. It has one sequence name per line, with no other symbols, whereas the FASTQ file starts each sequence name with an @. Should I add an @ in front of each sequence name? Or what else could the problem be?

Any suggestion is welcome. Thanks,

Chih-Ming

10.0 years ago
Ram 43k

Looks like this reply might be coming in a bit too late, but here goes:

a. As @Istvan says, a sequence with the requested ID might not exist in the FASTA file.

b. The ID might contain whitespace, in which case the characters after the whitespace are treated as the sequence description and not as part of the sequence ID.

c. The ID might be duplicated.
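
The three possibilities above can be checked before running seqtk at all. The following is a minimal sketch using only standard tools (awk, sort, comm, grep, uniq); it is not part of seqtk. It assumes a FASTQ file with exactly four lines per record, and the file names `reads.fq` and `name.lst`, along with their contents, are hypothetical demo data created in a temporary directory.

```shell
#!/bin/sh
# Use byte-wise collation so that sort and comm agree on ordering.
LC_ALL=C
export LC_ALL

# Hypothetical demo files; substitute your own FASTQ file and name list.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three 4-line FASTQ records; the second header carries a description.
printf '@x1\nAAAA\n+\nIIII\n@x2 sample=7\nTTTT\n+\nIIII\n@x3\nCCCC\n+\nIIII\n' > reads.fq

# Name list with a duplicate (x1), a line containing whitespace, and an absent ID (x9).
printf 'x1\nx1\nx2 sample=7\nx9\n' > name.lst

# Collect the IDs in the FASTQ file: take every 4th line starting at line 1,
# drop the leading '@', and keep only the text before the first whitespace.
awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' reads.fq | sort -u > fastq_ids.txt

# (a) Requested names that do not match any ID in the FASTQ file.
sort -u name.lst > wanted_ids.txt
missing=$(comm -23 wanted_ids.txt fastq_ids.txt)

# (b) Lines of the name list that contain whitespace (shown with line numbers).
with_space=$(grep -n '[[:space:]]' name.lst)

# (c) Names that appear more than once in the name list.
duplicates=$(sort name.lst | uniq -d)

echo "Not found in FASTQ:";  echo "$missing"
echo "Contain whitespace:";  echo "$with_space"
echo "Duplicated:";          echo "$duplicates"
```

With the demo data, the first check reports `x2 sample=7` and `x9`, the second reports line 3 of the list, and the third reports `x1`. Note that the comparison strips the leading `@` from the FASTQ headers, so the names in the list are matched without it.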

11.2 years ago

It seems to work fine in the test below (it also tests out fine with FASTQ files). Make sure that you are requesting IDs that actually exist in the file.

ialbert@porthos ~
$ cat test.fa 
>x1
AAAAAA
>x2
TTTTTT
>x3
CCCCCC
>x4
GGGGGG

ialbert@porthos ~
$ cat name.list 
x2
x4


ialbert@porthos ~
$ seqtk subseq test.fa name.list 
>x2
TTTTTT
>x4
GGGGGG