Question: Retrieve multi-line fasta sequences using list of locus tag
8 months ago by
United States
sharmatina18905930 wrote:

Dear all, can someone tell me how to retrieve multi-line FASTA records from a file like the one below, using a list of locus tags such as locus_tag=ABUW_RS00010 and locus_tag=ABUW_RS00015? I have 400 such entries.

>lcl|NZ_CP008706.1_cds_WP_001237350.1_2 [locus_tag=ABUW_RS00010] [protein=DNA polymerase III subunit beta] [protein_id=WP_001237350.1] [location=1590..2738] [gbkey=CDS]
GTGCGTTTGAAAATCGCTAAAGAAAGTTTACTCAATGTTTTATCGCATGTGGTCGGTGCGGTAGAACGTC
GTCATACCTTGAATATTCTTTCAAACGTCAAAATTCAGACTAATGCTCAAGCGCTCACTATTACGGGTTC
>lcl|NZ_CP008706.1_cds_WP_000550807.1_3 [locus_tag=ABUW_RS00015] [protein=DNA replication/repair protein RecF] [protein_id=WP_000550807.1] [location=2753..3835] [gbkey=CDS]
ATGCATCTTACGCGCTTAAATATTGAACGTGTGCGTAATTTAAAAACGGTTGCACTCCATGGGTTGCAAC
CGTTTAATGTCTTTTATGGCGCAAACGGTTCGGGTAAAACATCAATTTTAGAAGCGATCCATTTGCTTGC
AACAGGTCGCTCATTCCGTACACATATACCCAAAAATTATATTCAATATGAAGCTGACGACGCGATTGTA
TTTGCTCAGTCAGCAACTGAAAAAATTGGTATGCAAAAACTCGCTTCTGGTGAGCAACTCATGAAAGTGA

Hello sharmatina189059,

Could you please reformat your post using the code button (the one labelled 101010) so that we can see what your input file really looks like?

Does it look like this?

>lcl|NZ_CP008706.1_cds_WP_001237350.1_2 [locus_tag=ABUW_RS00010] [protein=DNA polymerase III subunit beta] [protein_id=WP_001237350.1] [location=1590..2738] [gbkey=CDS]
GTGCGTTTGAAAATCGCTAAAGAAAGTTTACTCAATGTTTTATCGCATGTGGTCGGTGCGGTAGAACGTC
GTCATACCTTGAATATTCTTTCAAACGTCAAAATTCAGACTAATGCTCAAGCGCTCACTATTACGGGTTC
>lcl|NZ_CP008706.1_cds_WP_000550807.1_3 [locus_tag=ABUW_RS00015] [protein=DNA replication/repair protein RecF] [protein_id=WP_000550807.1] [location=2753..3835] [gbkey=CDS] 
ATGCATCTTACGCGCTTAAATATTGAACGTGTGCGTAATTTAAAAACGGTTGCACTCCATGGGTTGCAAC
CGTTTAATGTCTTTTATGGCGCAAACGGTTCGGGTAAAACATCAATTTTAGAAGCGATCCATTTGCTTGC
AACAGGTCGCTCATTCCGTACACATATACCCAAAAATTATATTCAATATGAAGCTGACGACGCGATTGTA
TTTGCTCAGTCAGCAACTGAAAAAATTGGTATGCAAAAACTCGCTTCTGGTGAGCAACTCATGAAAGTGA

fin swimmer

modified 8 months ago • written 8 months ago by finswimmer (9.9k)

Retrieve from where?

modified 8 months ago • written 8 months ago by cpad0112 (11k)

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar; see the image below:

[Image: 101010 button]

written 8 months ago by WouterDeCoster (36k)
8 months ago by
finswimmer9.9k
Germany
finswimmer9.9k wrote:

Hello cpad0112,

As I understood the question, the OP wants to extract a subset of sequences from one FASTA file containing many sequences, based on the locus_tag. But maybe I'm wrong here.

If I'm right, then seqkit could help (again):

$ seqkit grep -n -r -f tags.txt input.fasta > output.fasta

Here tags.txt contains the locus tags, one per line.
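For anyone who cannot install seqkit, roughly the same selection can be sketched in plain Python. This is only a minimal sketch, assuming the headers carry a `[locus_tag=...]` field as in the example above and that tags.txt holds one bare tag per line (e.g. ABUW_RS00010). The names `read_fasta`, `filter_by_locus_tag` and `write_selected` are made up for illustration. Unlike a regular-expression search, it compares the tag exactly, so ABUW_RS00010 would not also pick up a hypothetical ABUW_RS000101.

```python
import re

# Captures the value of the [locus_tag=...] field in a FASTA header.
LOCUS_TAG = re.compile(r"\[locus_tag=([^\]]+)\]")


def read_fasta(lines):
    """Yield (header, sequence_lines) per record; line wrapping is preserved."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def filter_by_locus_tag(lines, wanted):
    """Yield only the records whose locus_tag value is in the set `wanted`."""
    for header, seq in read_fasta(lines):
        match = LOCUS_TAG.search(header)
        if match and match.group(1) in wanted:
            yield header, seq


def write_selected(tags_path, fasta_path, out_path):
    """Read tags (one per line), filter the FASTA file, write the matches."""
    with open(tags_path) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    with open(fasta_path) as fin, open(out_path, "w") as fout:
        for header, seq in filter_by_locus_tag(fin, wanted):
            fout.write(header + "\n")
            for seq_line in seq:
                fout.write(seq_line + "\n")
```

With the file names from the command above, this would be called as `write_selected("tags.txt", "input.fasta", "output.fasta")`.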

fin swimmer

modified 8 months ago • written 8 months ago by finswimmer (9.9k)

I guess that is what OP wants. Retrieve by locus tag.

written 8 months ago by cpad0112 (11k)

Thank you so much!!! It works. This is exactly what I want.

written 8 months ago by sharmatina189059 (30)

I have moved the comment of finswimmer to an answer since it solves your question.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 8 months ago by WouterDeCoster (36k)
8 months ago by
Hugo150
Universidade de Vigo, Ourense (Spain)
Hugo150 wrote:

Hello sharmatina189059,

For this type of operation, you can use SEDA (http://www.sing-group.org/seda/).

Specifically, the "Pattern filtering" operation (http://www.sing-group.org/seda/manual/operations.html#pattern-filtering) lets you do what you are asking for. You only need to load your FASTA file(s) (or take the sequences from the clipboard) and then apply your text pattern to the sequence headers.
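If it helps, one way to get a single text pattern covering many locus tags is to join them into a regular-expression alternation. The snippet below is only a sketch: `build_header_pattern` is a made-up helper, and it assumes that the tool you paste the pattern into accepts standard regular-expression syntax, so please check that against its manual.

```python
import re


def build_header_pattern(tags):
    """Return a regex string matching a [locus_tag=...] field for any given tag."""
    # re.escape guards against tags that contain regex metacharacters.
    alternatives = "|".join(re.escape(tag) for tag in tags)
    # The literal brackets anchor the match to the whole tag value.
    return r"\[locus_tag=(?:" + alternatives + r")\]"


# Example with the two tags from the question; print the pattern to copy it.
pattern = build_header_pattern(["ABUW_RS00010", "ABUW_RS00015"])
print(pattern)
```

Because the pattern includes the surrounding brackets, a tag only matches when the whole `locus_tag` value is identical.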

Please do not hesitate to contact us in case you need further assistance.

Regards,

Hugo.

written 8 months ago by Hugo (150)