Match predicted sequences to reference genome to generate data for annotation GTF

3.8 years ago

joelepaul • 0

Hi @ll!

From a paper, I have obtained a list of ~10 predicted sequences for specific proteins (predicted from transcriptomes). I would now like to match these sequences against the reference genome to obtain annotation data (not only the position but also scaffold ID, score, strand and frame; see https://www.ensembl.org/info/website/upload/gff.html) that I can use to extend my existing annotation GTF file for this species. However, I do not know of any software that could be used for this purpose. Could someone point me in the right direction? I should add that I am completely unfamiliar with Python, so writing a custom Python script is unfortunately not an option for me.

Thank you for your help!

Joe

annotation • 652 views

"predicted sequences for specific proteins"

You can use BLAST+ or BLAT to align those sequences back to the reference genome. If the genome is available at NCBI/Ensembl, you can do this using the appropriate BLAST web interface. If not, you will need to do the search locally.
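For a local search, protein queries are normally aligned to a nucleotide genome with `tblastn`, and with `-outfmt 6` BLAST+ writes one tab-separated line per hit in its default column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Those columns already contain the scaffold ID, position, score and strand asked about, so a rough conversion to GTF-style lines could look like the sketch below. This is only a minimal illustration under assumptions: the function name `blast6_to_gtf`, the `CDS` feature type, the attribute layout and the fixed frame of `0` are choices made for this example, not part of any tool. Because `tblastn` is not splice-aware, each hit is only an approximate exon-like segment and should be checked before it is merged into an existing annotation.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast6_to_gtf(blast_lines, source="tblastn", feature="CDS"):
    """Convert tblastn tabular hits into GTF-style lines, one per hit.

    Hypothetical helper for illustration: the feature type, attribute
    layout and frame value are assumptions, not a standard mapping.
    """
    reader = csv.DictReader(
        blast_lines,
        fieldnames=BLAST6_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    gtf_lines = []
    for hit in reader:
        sstart, send = int(hit["sstart"]), int(hit["send"])
        # tblastn reports minus-strand hits with sstart greater than send.
        strand = "+" if sstart <= send else "-"
        # GTF coordinates are 1-based with start <= end.
        start, end = min(sstart, send), max(sstart, send)
        # Assumption: each hit starts on a codon boundary, so frame is 0.
        frame = "0"
        attributes = 'gene_id "{0}"; transcript_id "{0}";'.format(hit["qseqid"])
        gtf_lines.append("\t".join([
            hit["sseqid"], source, feature, str(start), str(end),
            hit["bitscore"], strand, frame, attributes,
        ]))
    return gtf_lines


# Made-up hit line, for illustration only.
example_hits = [
    "protA\tscaffold_12\t98.5\t200\t3\t0\t1\t200\t1500\t901\t1e-100\t400\n",
]
example_gtf = blast6_to_gtf(example_hits)
print("\n".join(example_gtf))
```

The bit score is used for the GTF score column here; the e-value could be used instead if that is more meaningful for filtering.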

3.8 years ago by GenoMax • 141k

GeMoMa (http://www.jstacs.de/index.php/GeMoMa) looks like it could work. It takes protein sequences as input.

3.8 years ago by microfuge • 1.9k
