Question: Mapping peptides to their source genomic regions
5.9 years ago, genie66 (United States) wrote:

I have a list of peptide sequences, with their respective protein names and their start and end coordinates in the protein sequences. Now I want to map them back to the genome and get their genomic start and end coordinates (preferably exons). I have tried several tools, such as proteogenomic mapping tools, but with no luck. PeptideAtlas could provide the exonic coordinates, but only for one peptide at a time, and I have hundreds of peptides! Is there any other way to do this? Please help me out! Thanks!

4.2 years ago, microbe77 wrote:

Might be too late, but this is how to do it:

1. Make a six-frame peptide library from your genome (all possible peptides). I use peptides of 10 aa or longer; for a 4.5 Mbp bacterium this gives about 0.25M peptides.
2. Use this library as a reference to keep only those of your peptides that match one of the possible peptides.
3. Get a FASTA file containing the whole genome nucleotide sequence (this should be a single-entry FASTA file that contains ALL nucleotides).
4. Make a nucleotide BLAST database with the makeblastdb command from a local BLAST+ installation, e.g. makeblastdb -in genome.fasta -dbtype nucl -out genome_db
5. Align your peptides to the genome database using tblastn:
   tblastn -query <your peptide FASTA file> -db <your genome database> -out <name of the output file you want> -outfmt 6 -max_target_seqs 1 -evalue 0.001
   - -db: the database consists of several files (e.g. .nhr, .nin, .nsq); just use the name without extension.
   - -outfmt 6 gives tabular results.
   - -max_target_seqs: 1 or more; I use 1 (not sure about this option though, double check!).
   - -evalue 0.001 is meant to eliminate partial alignments; very short peptides may get E-values above this cutoff even when they match exactly, so you may need to relax it.
6. Open the output file in Excel and keep only the genome name (usually NC_xxxx; sseqid, column 2) and the start and stop columns (sstart and send, columns 9 and 10). Save this file as tab-delimited .bed, which is readable in almost all genome browsers (I use IGB). BED starts are 0-based, so subtract 1 from the smaller coordinate; for minus-strand hits, sstart is larger than send, so swap the two values before doing so.
7. The code that makes the six frames is in Python. I will paste the code hereunder:

It is better to find the code here:
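A minimal sketch of steps 1 and 2 in Python (an independent illustration, not the script referred to above). It assumes that "all possible peptides" means every stop-free stretch of at least 10 aa in the six reading frames, and that your peptides must match such a stretch exactly. All function names are made up for this example.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code (NCBI table 1; table 11, used for bacteria, has the
# same amino-acid assignments). Codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_single_fasta(path: str) -> str:
    """Read a single-entry FASTA file into one sequence string."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">"))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_library(genome: str, min_len: int = 10) -> set[str]:
    """Collect stop-free stretches of at least min_len aa from all six frames."""
    genome = genome.upper()
    library: set[str] = set()
    for strand in (genome, reverse_complement(genome)):
        for offset in range(3):  # three reading frames per strand
            for fragment in translate(strand[offset:]).split("*"):
                if len(fragment) >= min_len:
                    library.add(fragment)
    return library


def peptides_in_library(peptides: list[str], library: set[str]) -> list[str]:
    """Keep the peptides that occur as exact substrings of a library fragment."""
    # '*' never occurs in a peptide, so no match can span two fragments.
    haystack = "*".join(library)
    return [p for p in peptides if p in haystack]
```

Typical use would be library = six_frame_library(read_single_fasta("genome.fasta")) followed by peptides_in_library(my_peptides, library), where the file name is a placeholder. Joining the library into one string turns each lookup into a single substring search instead of a loop over ~0.25M fragments.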

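The Excel step that turns tblastn hits into a .bed file can also be done in code. Below is a minimal sketch, assuming the standard 12-column -outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It writes BED6 rows with the peptide ID as the name, a placeholder score of 0, and the strand inferred from the order of sstart and send; blast_tab_to_bed is a hypothetical name.

```python
def blast_tab_to_bed(lines):
    """Convert tblastn -outfmt 6 rows into BED6 rows (chrom, start, end, name, score, strand)."""
    bed_rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # tblastn reports minus-strand hits with sstart > send
        strand = "+" if sstart <= send else "-"
        low, high = min(sstart, send), max(sstart, send)
        # BLAST coordinates are 1-based inclusive; BED is 0-based, end-exclusive
        bed_rows.append(f"{subject}\t{low - 1}\t{high}\t{query}\t0\t{strand}")
    return bed_rows
```

Passing in the lines of an open tblastn output file and writing the returned rows, one per line, to a .bed file gives a track that IGB or other genome browsers can load.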
5.9 years ago, raunakms (San Francisco) wrote:

Using tools like tBLASTn could be a good starting point: tBLASTn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
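To make "dynamically translated in all six reading frames" concrete, here is a minimal sketch that translates both strands at all three offsets and reports exact peptide matches in genome coordinates. It only does exact string matching (no scoring, mismatches or gaps), so it is a simplified illustration rather than how tBLASTn works internally, and it cannot find peptides that span an intron. find_peptide is a hypothetical name.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_peptide(genome, peptide):
    """Return exact hits as (strand, start, end), 1-based inclusive, start <= end."""
    genome = genome.upper()
    n = len(genome)
    reverse = genome.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", genome), ("-", reverse)):
        for offset in range(3):  # three reading frames per strand
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                # 0-based, end-exclusive nucleotide span on the translated strand
                nt_start = offset + 3 * pos
                nt_end = nt_start + 3 * len(peptide)
                if strand == "+":
                    hits.append(("+", nt_start + 1, nt_end))
                else:
                    # index i of the reverse complement is index n - 1 - i of the genome
                    hits.append(("-", n - nt_end + 1, n - nt_start))
                pos = protein.find(peptide, pos + 1)
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For minus-strand hits the coordinates are reported on the forward strand with start <= end, which is the same convention a BED file expects (after the usual 0-based shift of the start).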

5.9 years ago, Siva (United States) wrote:

You could try Scipio, which uses BLAT to search a query protein sequence against a genome. It outputs the intron/exon boundaries and splice sites.
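Once the exon boundaries of a protein's coding sequence are known (for instance from a spliced protein-to-genome alignment such as Scipio's, or from the CDS records of a genome annotation), the protein start and end coordinates from the question can be projected onto the genome arithmetically. A minimal sketch, assuming the CDS exon coordinates are already parsed into 1-based inclusive (start, end) pairs and that the CDS begins with a complete first codon. Parsing Scipio's output or a GTF file is not shown, and peptide_to_genome is a hypothetical helper name.

```python
def peptide_to_genome(cds_exons, strand, aa_start, aa_end):
    """Project protein positions aa_start..aa_end (1-based, inclusive) onto the genome.

    cds_exons: (start, end) pairs, 1-based inclusive genomic coordinates of the
    coding part of each exon. Returns a sorted list of genomic (start, end)
    intervals, one per exon the peptide touches.
    """
    exons = sorted(cds_exons)
    if strand == "-":
        exons.reverse()  # on the minus strand the CDS is read from high to low coordinates
    nt_from = 3 * (aa_start - 1)  # 0-based CDS offset of the peptide's first base
    nt_to = 3 * aa_end            # end-exclusive CDS offset
    cds_length = sum(end - start + 1 for start, end in exons)
    if not 1 <= aa_start <= aa_end or nt_to > cds_length:
        raise ValueError("peptide coordinates fall outside the CDS")
    intervals = []
    offset = 0  # CDS bases contained in the exons already walked
    for start, end in exons:
        length = end - start + 1
        lo, hi = max(nt_from, offset), min(nt_to, offset + length)
        if lo < hi:  # the peptide overlaps this exon
            if strand == "+":
                intervals.append((start + lo - offset, start + hi - offset - 1))
            else:
                intervals.append((end - (hi - offset - 1), end - (lo - offset)))
        offset += length
    return sorted(intervals)
```

For example, on a plus-strand CDS split into exons (100, 104) and (200, 215), residues 2 to 3 span the splice junction and come back as two intervals, [(103, 104), (200, 203)].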
