Match predicted sequences to reference genome to generate data for annotation GTF
4.4 years ago
joelepaul • 0

Hi @ll!

From a paper, I have obtained a list of ~10 predicted protein sequences (predicted from transcriptomes). I would now like to match these sequences against the reference genome to obtain annotation data (not only the position but also scaffold ID, score, strand and frame; see https://www.ensembl.org/info/website/upload/gff.html ) that I can use to extend my existing annotation GTF file for this species. However, I do not know of any software that could be used for this purpose. Could someone here point me in the right direction? I should add that I am completely unfamiliar with Python, so "writing a custom Python script" is unfortunately not an option for me.

Thank you for your help!

Joe

annotation • 744 views

predicted sequences for specific proteins

You can use blast+ or blat to align those sequences back to the reference genome (for protein queries against a nucleotide genome, the relevant blast+ program is tblastn). If the genome is available at NCBI/Ensembl, you can do this using the appropriate BLAST web interface. If not, you will need to do the search locally.
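For anyone who does want to script the last step, here is a minimal sketch of turning local search results into GTF lines. It assumes the search was run with tblastn and the default tabular output (`-outfmt 6`), whose 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The function name `blast6_to_gtf`, the feature type `match`, the file names in the comments and the choice of bitscore for the score column are all illustrative assumptions, not something from the original posts. Each hit becomes one line, so a protein spanning several exons will show up as several hits that still need to be grouped by hand.

```python
# Hypothetical commands that would produce the input (file names are placeholders):
#   makeblastdb -in genome.fa -dbtype nucl -out genome_db
#   tblastn -query proteins.fa -db genome_db -outfmt 6 -out hits.tsv


def blast6_to_gtf(line, source="tblastn", feature="match"):
    """Convert one BLAST tabular (outfmt 6) hit into one GTF line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected the 12 default outfmt 6 columns")

    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = fields[11].strip()

    # In tblastn output, subject start > subject end means a minus-strand hit.
    strand = "+" if sstart <= send else "-"
    # GTF requires start <= end, in 1-based inclusive coordinates.
    start, end = min(sstart, send), max(sstart, send)

    # Assumption: the query ID is reused as both gene_id and transcript_id.
    attributes = f'gene_id "{qseqid}"; transcript_id "{qseqid}";'

    # The frame column is left as "." because a raw hit does not tell us
    # the codon phase in the GTF sense.
    return "\t".join(
        [sseqid, source, feature, str(start), str(end),
         bitscore, strand, ".", attributes]
    )


if __name__ == "__main__":
    # Made-up example hit, for illustration only.
    example = "protA\tscaffold_12\t98.5\t200\t3\t0\t1\t200\t5600\t5001\t1e-100\t350"
    print(blast6_to_gtf(example))
```

To convert a whole results file, call the function on each line of `hits.tsv` and append the output to a copy of the existing GTF, after filtering out weak hits (for example by e-value).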


Looks like GeMoMa could work: http://www.jstacs.de/index.php/GeMoMa . It takes protein sequences as input.
