How to map predicted proteins to contigs?
16 months ago
Paula ▴ 60

Hi All!

I am looking for functional genes in a metagenomic dataset. I took the contigs assembled with metaSPAdes and predicted proteins with Prodigal. Then I used HMMER to search the predicted proteins for the gene mcrA. Now I want to do tree placement, and the tree is based on DNA sequences. How can I find the DNA sequences in my contigs that correspond to the proteins annotated as mcrA?

Thank you!

metaSPADES prodigal HMMER
16 months ago
Mensur Dlakic ★ 27k

Let's say your metagenome assembly is in meta.fasta:

prodigal -i meta.fasta -a meta.faa -d meta.fna -o /dev/null -p meta

That command writes the predicted proteins to meta.faa and the matching gene (nucleotide) sequences to meta.fna; the two files are linked by a common header line. For proteins it may look like this:

>k141_164538_3 # 343 # 567 # 1 # ID=263_3;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.364
MGRTKKVGTAGRFGSKYGKKIREKVAEIEKIEKQRHICPNCKMRYLVREGTGIWVCKKCG
AKFAGQAYYPPRVS

That particular protein was translated from the third predicted gene in contig k141_164538 (hence k141_164538_3). The gene spans bases 343-567 on the forward strand (that is the 1 after the coordinates; -1 would mean the reverse strand). It is complete (that's what partial=00 means), starts with ATG, has a regular ribosome binding motif in front of it, and has a G/C content of 0.364. If you identified that protein as your match, just go to meta.fna and search for k141_164538_3; that record is the gene you want.
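
If you have more than a handful of hits, the lookup can be scripted. Below is a minimal sketch in plain Python, assuming you already have the IDs of your mcrA hits in a list. The function names (read_fasta, extract_genes) and the toy records are made up for illustration; they are not part of Prodigal or HMMER.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_genes(handle, wanted_ids):
    """Return {gene_id: sequence} for records whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    genes = {}
    for header, seq in read_fasta(handle):
        # The gene ID is the text before the first space in the header,
        # e.g. "k141_164538_3" in the Prodigal example above.
        gene_id = header.split()[0]
        if gene_id in wanted:
            genes[gene_id] = seq
    return genes


# Toy stand-in for meta.fna (invented sequences, for illustration only).
demo_fna = io.StringIO(
    ">k141_1_1 # 1 # 12 # 1 # ID=1_1;partial=00\n"
    "ATGAAATTTTAA\n"
    ">k141_1_2 # 20 # 31 # -1 # ID=1_2;partial=00\n"
    "ATGCCC\n"
    "GGGTGA\n"
)

hits = ["k141_1_2"]  # hypothetical list of protein IDs matched by the HMM
selected = extract_genes(demo_fna, hits)
for gene_id, seq in selected.items():
    print(f">{gene_id}\n{seq}")
```

With real data you would pass open("meta.fna") instead of the toy handle and fill the list of hits from your HMMER results, for example from the target-name column of an hmmsearch --tblout table, if that is how you saved them.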

Thank you, Mensur! This is incredibly helpful. Merry Christmas and Happy New Year!