Question: Retrieve multi-line FASTA sequences using a list of locus tags
22 months ago by
United States
sharmatina189059 wrote:

Dear all, can someone tell me how to retrieve multi-line FASTA records from a file like the one shown below, using a list of locus tags such as locus_tag=ABUW_RS00010 and locus_tag=ABUW_RS00015? I have 400 such entries.

>lcl|NZ_CP008706.1_cds_WP_001237350.1_2 [locus_tag=ABUW_RS00010] [protein=DNA polymerase III subunit beta] [protein_id=WP_001237350.1] [location=1590..2738] [gbkey=CDS]
GTGCGTTTGAAAATCGCTAAAGAAAGTTTACTCAATGTTTTATCGCATGTGGTCGGTGCGGTAGAACGTC
GTCATACCTTGAATATTCTTTCAAACGTCAAAATTCAGACTAATGCTCAAGCGCTCACTATTACGGGTTC
>lcl|NZ_CP008706.1_cds_WP_000550807.1_3 [locus_tag=ABUW_RS00015] [protein=DNA replication/repair protein RecF] [protein_id=WP_000550807.1] [location=2753..3835] [gbkey=CDS]
ATGCATCTTACGCGCTTAAATATTGAACGTGTGCGTAATTTAAAAACGGTTGCACTCCATGGGTTGCAAC
CGTTTAATGTCTTTTATGGCGCAAACGGTTCGGGTAAAACATCAATTTTAGAAGCGATCCATTTGCTTGC
AACAGGTCGCTCATTCCGTACACATATACCCAAAAATTATATTCAATATGAAGCTGACGACGCGATTGTA
TTTGCTCAGTCAGCAACTGAAAAAATTGGTATGCAAAAACTCGCTTCTGGTGAGCAACTCATGAAAGTGA
sequence • 792 views

Hello sharmatina189059,

Could you please reformat your post using the code button (the one with 101010) so that one can see what your input file really looks like?

Does it look like this?

>lcl|NZ_CP008706.1_cds_WP_001237350.1_2 [locus_tag=ABUW_RS00010] [protein=DNA polymerase III subunit beta] [protein_id=WP_001237350.1] [location=1590..2738] [gbkey=CDS]
GTGCGTTTGAAAATCGCTAAAGAAAGTTTACTCAATGTTTTATCGCATGTGGTCGGTGCGGTAGAACGTC
GTCATACCTTGAATATTCTTTCAAACGTCAAAATTCAGACTAATGCTCAAGCGCTCACTATTACGGGTTC
>lcl|NZ_CP008706.1_cds_WP_000550807.1_3 [locus_tag=ABUW_RS00015] [protein=DNA replication/repair protein RecF] [protein_id=WP_000550807.1] [location=2753..3835] [gbkey=CDS] 
ATGCATCTTACGCGCTTAAATATTGAACGTGTGCGTAATTTAAAAACGGTTGCACTCCATGGGTTGCAAC
CGTTTAATGTCTTTTATGGCGCAAACGGTTCGGGTAAAACATCAATTTTAGAAGCGATCCATTTGCTTGC
AACAGGTCGCTCATTCCGTACACATATACCCAAAAATTATATTCAATATGAAGCTGACGACGCGATTGTA
TTTGCTCAGTCAGCAACTGAAAAAATTGGTATGCAAAAACTCGCTTCTGGTGAGCAACTCATGAAAGTGA

fin swimmer

written 22 months ago by finswimmer

Retrieve from where?

written 22 months ago by cpad0112

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

written 22 months ago by WouterDeCoster
22 months ago by
finswimmer
Germany
finswimmer wrote:

Hello cpad0112,

I understood the question to mean that the OP wants to fetch a limited set of sequences from one FASTA file containing many sequences, based on the locus_tag. But maybe I'm wrong here.

If I'm right, then seqkit could help (again):

$ seqkit grep -n -r -f tags.txt input.fasta > output.fasta

Here tags.txt contains the locus tags, one per line.
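
If seqkit is not available, the same selection can be sketched in plain Python. This is only a minimal illustration, not a replacement for the command above. It assumes headers carry a `[locus_tag=...]` field as in the question, and the names `filter_fasta`, `read_tags` and `extract` are made up for this example. Unlike a regex or substring match, it compares the tag exactly, so ABUW_RS00010 would not also pick up a hypothetical ABUW_RS000100.

```python
import re

# Matches the "[locus_tag=...]" field of a header line like the ones in the question.
LOCUS_TAG = re.compile(r"\[locus_tag=([^\]]+)\]")


def read_tags(lines):
    """Collect tags, one per line; accepts 'ABUW_RS00010' or 'locus_tag=ABUW_RS00010'."""
    tags = set()
    for line in lines:
        tag = line.strip().split("=", 1)[-1]
        if tag:
            tags.add(tag)
    return tags


def filter_fasta(lines, wanted):
    """Yield every line (header and sequence) of records whose locus_tag is in `wanted`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide once whether to keep it.
            match = LOCUS_TAG.search(line)
            keep = bool(match) and match.group(1) in wanted
        if keep:
            yield line


def extract(tags_path, fasta_path, out_path):
    """Write the selected records from fasta_path to out_path."""
    with open(tags_path) as tag_file:
        wanted = read_tags(tag_file)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.writelines(filter_fasta(fasta, wanted))
```

With the file names from the seqkit command, this would be called as `extract("tags.txt", "input.fasta", "output.fasta")`. Because sequence lines are copied as they are, multi-line records keep their original line wrapping.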

fin swimmer


I guess that is what OP wants. Retrieve by locus tag.

written 22 months ago by cpad0112

Thank you so much!!! It works. This is exactly what I want.

written 21 months ago by sharmatina189059

I have moved the comment of finswimmer to an answer since it solves your question.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 21 months ago by WouterDeCoster
22 months ago by
Hugo
Universidade de Vigo, Ourense (Spain)
Hugo wrote:

Hello sharmatina189059,

For this type of operation, you can use SEDA (http://www.sing-group.org/seda/).

Specifically, the "Pattern filtering" operation (http://www.sing-group.org/seda/manual/operations.html#pattern-filtering) lets you do what you are asking for. You only need to load your FASTA file(s) (or take the sequences from the clipboard) and then apply your text pattern to the sequence headers.
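
With 400 tags, typing a header pattern by hand is impractical, so one option is to generate a single regular expression from the tag list. The Python sketch below is only an illustration: `build_header_pattern` is a made-up helper, and whether SEDA accepts this exact regex syntax is an assumption to check against its manual.

```python
import re


def build_header_pattern(tags):
    """Return one regex that matches the [locus_tag=...] field for any listed tag."""
    # Escape each tag so it is matched literally, then join them as alternatives.
    alternation = "|".join(re.escape(tag) for tag in sorted(tags))
    # The closing bracket is part of the pattern, so a tag cannot match as a prefix.
    return r"\[locus_tag=(?:" + alternation + r")\]"


pattern = build_header_pattern(["ABUW_RS00010", "ABUW_RS00015"])
print(pattern)
```

The resulting string can be checked against a few headers with `re.search(pattern, header)` before pasting it into a filtering tool.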

Please do not hesitate to contact us in case you need further assistance.

Regards,

Hugo.
