Retrieve multi-line FASTA sequences using a list of locus tags
3.4 years ago

Dear all, can someone tell me how to retrieve multi-line FASTA records from data such as the example below, using a list of locus tags like locus_tag=ABUW_RS00010 and locus_tag=ABUW_RS00015? I have 400 such entries.

>lcl|NZ_CP008706.1_cds_WP_001237350.1_2 [locus_tag=ABUW_RS00010] [protein=DNA polymerase III subunit beta] [protein_id=WP_001237350.1] [location=1590..2738] [gbkey=CDS]
GTGCGTTTGAAAATCGCTAAAGAAAGTTTACTCAATGTTTTATCGCATGTGGTCGGTGCGGTAGAACGTC
GTCATACCTTGAATATTCTTTCAAACGTCAAAATTCAGACTAATGCTCAAGCGCTCACTATTACGGGTTC
>lcl|NZ_CP008706.1_cds_WP_000550807.1_3 [locus_tag=ABUW_RS00015] [protein=DNA replication/repair protein RecF] [protein_id=WP_000550807.1] [location=2753..3835] [gbkey=CDS]
ATGCATCTTACGCGCTTAAATATTGAACGTGTGCGTAATTTAAAAACGGTTGCACTCCATGGGTTGCAAC
CGTTTAATGTCTTTTATGGCGCAAACGGTTCGGGTAAAACATCAATTTTAGAAGCGATCCATTTGCTTGC
AACAGGTCGCTCATTCCGTACACATATACCCAAAAATTATATTCAATATGAAGCTGACGACGCGATTGTA
TTTGCTCAGTCAGCAACTGAAAAAATTGGTATGCAAAAACTCGCTTCTGGTGAGCAACTCATGAAAGTGA

Hello sharmatina189059,

Could you please reformat your post using the code button (the one with 101010) so that others can see what your input file really looks like?

Does it look like this?

>lcl|NZ_CP008706.1_cds_WP_001237350.1_2 [locus_tag=ABUW_RS00010] [protein=DNA polymerase III subunit beta] [protein_id=WP_001237350.1] [location=1590..2738] [gbkey=CDS]
GTGCGTTTGAAAATCGCTAAAGAAAGTTTACTCAATGTTTTATCGCATGTGGTCGGTGCGGTAGAACGTC
GTCATACCTTGAATATTCTTTCAAACGTCAAAATTCAGACTAATGCTCAAGCGCTCACTATTACGGGTTC
>lcl|NZ_CP008706.1_cds_WP_000550807.1_3 [locus_tag=ABUW_RS00015] [protein=DNA replication/repair protein RecF] [protein_id=WP_000550807.1] [location=2753..3835] [gbkey=CDS] 
ATGCATCTTACGCGCTTAAATATTGAACGTGTGCGTAATTTAAAAACGGTTGCACTCCATGGGTTGCAAC
CGTTTAATGTCTTTTATGGCGCAAACGGTTCGGGTAAAACATCAATTTTAGAAGCGATCCATTTGCTTGC
AACAGGTCGCTCATTCCGTACACATATACCCAAAAATTATATTCAATATGAAGCTGACGACGCGATTGTA
TTTGCTCAGTCAGCAACTGAAAAAATTGGTATGCAAAAACTCGCTTCTGGTGAGCAACTCATGAAAGTGA

fin swimmer

Retrieve from where?

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

3.4 years ago

Hello cpad0112,

I understood the question to mean that the OP wants to fetch a subset of sequences, based on the locus_tag, from a single FASTA file containing many sequences. But maybe I'm wrong here.

If I'm right, then seqkit could help (again):

$ seqkit grep -n -r -f tags.txt input.fasta > output.fasta

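Where tags.txt contains the locus tags, one per line (-f reads the patterns from that file, -n matches them against the full header line rather than just the sequence ID, and -r treats them as regular expressions).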

fin swimmer
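With -r, each line of tags.txt is a regular expression searched anywhere in the header, so a bare tag such as ABUW_RS00010 would also hit any longer tag that begins with the same characters (a hypothetical ABUW_RS000100, for instance). Below is a minimal Python sketch of one way to guard against that, assuming you start from a plain list of bare tags: it wraps each tag in its literal [locus_tag=...] attribute and escapes the square brackets, which Go's regexp package (used by seqkit) interprets the same way Python's re does. The script and file names (make_patterns.py, plain_tags.txt) are hypothetical.

```python
import re
import sys
from pathlib import Path


def anchored_patterns(tags):
    """Turn bare locus tags into regexes that match '[locus_tag=TAG]' literally."""
    patterns = []
    for tag in tags:
        tag = tag.strip()
        if tag:  # skip blank lines in the input list
            # re.escape turns '[' and ']' into '\[' and '\]', so the regex only
            # matches the complete bracketed attribute, not a longer tag.
            patterns.append(re.escape(f"[locus_tag={tag}]"))
    return patterns


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python make_patterns.py plain_tags.txt tags.txt
    tags = Path(sys.argv[1]).read_text().splitlines()
    Path(sys.argv[2]).write_text("\n".join(anchored_patterns(tags)) + "\n")
```

The resulting tags.txt can then be passed to the same seqkit grep -n -r -f command.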

I guess that is what the OP wants: retrieval by locus tag.

Thank you so much! It works. This is exactly what I wanted.

I have moved finswimmer's comment to an answer since it solves your question.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Hello finswimmer, can your answer above also be applied to extracting sequences from a GenBank file using the locus tags?

  1. Please post this as a separate question.
  2. Post an example input file, the function/code you have used, and the expected output.
  3. Format the example input, code, and output so that others can understand them.
3.4 years ago
Hugo

Hello sharmatina189059,

For this type of operation you can use SEDA (http://www.sing-group.org/seda/).

Specifically, the "Pattern filtering" operation (http://www.sing-group.org/seda/manual/operations.html#pattern-filtering) lets you do what you are asking for. You only need to load your FASTA file(s) (or take the sequences from the clipboard) and then apply your text pattern to the sequence headers.

Please do not hesitate to contact us if you need further assistance.

Regards,

Hugo.
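For anyone who would rather script the same header-based selection than use a GUI, here is a minimal pure-Python sketch (not SEDA's implementation). It assumes every header carries a [locus_tag=...] attribute as in the example data, and that tags.txt lists one bare tag per line (e.g. ABUW_RS00010); filter_by_tag.py is a hypothetical name. Each header's tag is extracted and looked up in a set, and the record's lines are kept until the next header, so multi-line sequences come through intact.

```python
import re
import sys

# Captures the value inside an NCBI-style "[locus_tag=...]" header attribute.
LOCUS_TAG_RE = re.compile(r"\[locus_tag=([^\]]+)\]")


def filter_fasta(lines, wanted):
    """Yield the header and every sequence line of each FASTA record whose
    locus_tag is in the set `wanted`; multi-line sequences are kept intact."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            match = LOCUS_TAG_RE.search(line)
            # Exact set lookup; headers without a locus_tag are dropped.
            keep = bool(match) and match.group(1) in wanted
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_by_tag.py tags.txt input.fasta > output.fasta
    with open(sys.argv[1]) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    with open(sys.argv[2]) as fh:
        sys.stdout.writelines(filter_fasta(fh, wanted))
```

Because the lookup is an exact set membership test rather than a substring search, a tag can only select the record that carries exactly that tag.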
