Question: Extracting a protein sequence from fasta file
h.l.wong (Australia) wrote, 3.2 years ago:

Hi all,

I am currently working on metagenomics and would like to know whether there is a way to extract a protein sequence from a FASTA file. Thanks.

Cheers

Alan

sequence gene • 1.5k views
modified 3.1 years ago by Biostar • written 3.2 years ago by h.l.wong

If you know the sequence name, you can use bioawk (https://github.com/lh3/bioawk):

SEQNAME="<insert sequence name>"; bioawk -v x="$SEQNAME" -c fastx '{if ($name==x) {print ">"$name"\n"$seq}}' main.fa > selected.fa

This selects the record with that name from main.fa and writes it to selected.fa.
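If bioawk is not installed, the same lookup can be sketched with plain awk. This is only a minimal sketch: the function name extract_fasta_record is made up for illustration, and it assumes the sequence name is the header text up to the first whitespace.

```shell
# Hypothetical helper (not part of bioawk): print the FASTA record whose
# name, i.e. the header text up to the first whitespace, equals $1.
# Multi-line records work because every line is printed until the next
# header is reached.
extract_fasta_record() {
    awk -v name="$1" '
        # On each header line, decide whether this record is the wanted one.
        /^>/ { keep = (substr($1, 2) == name) }
        # Print the header and all sequence lines of the wanted record.
        keep
    ' "$2"
}
```

Usage would look like: extract_fasta_record "<insert sequence name>" main.fa > selected.fa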

written 3.2 years ago by Macspider

What kind of FASTA file (single, multi-FASTA, DNA, protein)?

written 3.2 years ago by genomax

Thanks. I have metagenomic data. Should I extract the sequences from the assembled contigs file, or do I need to extract them from some other file?

And if I have an annotated protein (in KEGG), how can I get the nucleic acid sequence?

Thanks

modified 3.2 years ago • written 3.2 years ago by h.l.wong

Prodigal or GeneMarkS both work well and are fast.

written 3.2 years ago by Buffo

Thanks. Should I run Prodigal on the assembled contigs file to extract the sequences?

written 3.2 years ago by h.l.wong

Yes, it is possible; read the manual for the details. Prodigal can output the nucleotide sequences, the protein sequences, or both. Take care to choose the right translation table.

http://prodigal.ornl.gov/
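
As a rough sketch of that Prodigal step, assuming Prodigal 2.6.x is installed and on the PATH: the wrapper name predict_genes and the output file names are made up for illustration, so check the flags against the manual for your version.

```shell
# Hypothetical wrapper around Prodigal; output file names are illustrative.
#   $1 = assembled contigs (FASTA), $2 = prefix for the output files
predict_genes() {
    contigs=$1
    prefix=$2
    # -i      : input contigs
    # -p meta : metagenomic mode (Prodigal 2.6.x naming)
    # -a      : protein translations of the predicted genes
    # -d      : nucleotide sequences of the predicted genes
    # -o / -f : gene coordinates, written here in GFF format
    # Add -g <table> to set the translation table explicitly; see the
    # manual for how that interacts with metagenomic mode.
    prodigal -i "$contigs" -p meta \
        -a "$prefix.faa" -d "$prefix.fna" \
        -o "$prefix.gff" -f gff
}
```

Calling predict_genes contigs.fa sample1 would then be expected to leave the proteins in sample1.faa and the matching gene sequences in sample1.fna, from which a single record can be picked out by name.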
written 3.2 years ago by Buffo

It would help if you clarified your question. Do you mean extracting sequences based on the header, like this?

written 3.2 years ago by Rohit