I have a list of peptide sequences, their respective protein names, and their start and end coordinates in the protein sequences. I now want to map them back to their genomic source and get the genomic start and end coordinates (preferably exons). I have tried several tools, such as proteogenomic mapping tools, but had no luck. PeptideAtlas can provide the exonic coordinates, but only for one peptide at a time, and I have hundreds of peptides! Is there any other way to do this? Please help me out! Thanks!
Might be too late, but this is how to do it!
1. Make a six-frame peptide library from your genome (all possible peptides). I keep peptides of 10 aa or longer; for a 4.5 Mbp bacterial genome this gives about 0.25 million peptides.
2. Use this library as a reference to find which of your peptides match the possible peptides.
3. Get a FASTA file that contains the whole genome nucleotide sequence (this should be a single-entry FASTA file that contains ALL nucleotides).
4. Make a nucleotide BLAST database from it using the makeblastdb command from a local BLAST installation.
5. Align your peptides to the genome database using tblastn:
   tblastn -query <your peptide fasta file> -db <your genome database> -out <name of the out file you want> -outfmt 6 -max_target_seqs 1 -evalue 0.001
   The database consists of three files; just use the name without the extension. -outfmt 6 gives tabular results. -max_target_seqs takes 1 or more; I use 1 (not sure about this option though, double check!). -evalue 0.001 is there to eliminate partial alignments.
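- Open the output file in Excel and keep only the genome name (usually NC_xxxx), start and stop columns. Save this file as .bed, which is readable in almost all genome browsers (I use IGB). Note that for hits on the reverse strand the tabular output lists a start greater than the stop, while BED expects start < end with a 0-based start, so swap them where needed and subtract 1 from the start.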
- The code that makes the six frames is in Python. It is best to get it here: https://github.com/microbe777/fasta2six_frames
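For anyone who just wants to see the idea behind step 1, here is a minimal sketch of a six-frame translation in plain Python. This is not the code from the repository linked above; the function names (`six_frame_peptides`, `translate`, `reverse_complement`), the `min_len` parameter and the toy genome are made up for illustration, and it assumes the standard genetic code and an uppercase-able A/C/G/T sequence (anything else is translated as X).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_peptides(genome, min_len=10):
    """Yield (strand, frame, start, end, peptide) for every stop-free stretch.

    start/end are 0-based, half-open, and always given on the forward strand,
    so they can be written straight into a BED file.
    """
    genome = genome.upper()
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = 0  # amino-acid offset of the current segment in this frame
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    start = frame + 3 * pos
                    end = start + 3 * len(segment)
                    if strand == "-":
                        # Convert reverse-strand offsets to forward-strand coordinates.
                        start, end = n - end, n - start
                    yield strand, frame, start, end, segment
                pos += len(segment) + 1  # +1 skips the stop codon


if __name__ == "__main__":
    toy_genome = "ATGAAATTTGGGCCCTAA"  # hypothetical example sequence
    for hit in six_frame_peptides(toy_genome, min_len=5):
        print(*hit, sep="\t")
```

Keeping the forward-strand coordinates next to each peptide means an exact string match against this library already gives you the genomic position, without going through BLAST.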
Tools like tBLASTn could be a good starting point: tBLASTn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
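Since tBLASTn reports the subject coordinates in nucleotides, its tabular hits can be turned into genomic intervals with a few lines of Python instead of by hand. The sketch below assumes the default 12-column -outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `blast6_to_bed` and the example hits are hypothetical.

```python
def blast6_to_bed(lines):
    """Convert default BLAST -outfmt 6 lines into 6-column BED lines.

    Assumes sstart/send are 1-based inclusive nucleotide positions and that
    reverse-strand hits are reported with sstart > send.
    """
    bed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        strand = "+" if sstart <= send else "-"
        start = min(sstart, send) - 1  # BED start is 0-based
        end = max(sstart, send)        # BED end is exclusive
        bed.append(f"{sseqid}\t{start}\t{end}\t{qseqid}\t0\t{strand}")
    return bed


if __name__ == "__main__":
    # Two made-up hits: one on the forward strand, one on the reverse strand.
    example = [
        "pep1\tNC_000001\t100.0\t10\t0\t0\t1\t10\t101\t130\t1e-5\t30.0",
        "pep2\tNC_000001\t100.0\t10\t0\t0\t1\t10\t230\t201\t1e-5\t30.0",
    ]
    for row in blast6_to_bed(example):
        print(row)
    # NC_000001	100	130	pep1	0	+
    # NC_000001	200	230	pep2	0	-
```

The peptide name goes into the BED name column, so each interval in the genome browser can be traced back to the peptide it came from.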