Match peptides to protein sequences and tell the position in R
9.1 years ago
Marie • 0

Hello,

I was wondering if there is a convenient way to figure out the sequence coverage of a given protein with a list of peptides.

For example I have this protein:

>sp|O00330|ODPX_HUMAN Pyruvate dehydrogenase protein X component, mitochondrial OS=Homo sapiens GN=PDHX PE=1 SV=3
MAASWRLGCDPRLLRYLVGFPGRRSVGLVKGALGWSVSRGANWRWFHSTQWLRGDPIKIL
MPSLSPTMEEGNIVKWLKKEGEAVSAGDALCEIETDKAVVTLDASDDGILAKIVVEEGSK
NIRLGSLIGLIVEEGEDWKHVEIPKDVGPPPPVSKPSEPRPSPEPQISIPVKKEHIPGTL
RFRLSPAARNILEKHSLDASQGTATGPRGIFTKEDALKLVQLKQTGKITESRPTPAPTAT
PTAPSPLQATAGPSYPRPVIPPVSTPGQPNAVGTFTEIPASNIRRVIAKRLTESKSTVPH
AYATADCDLGAVLKVRQDLVKDDIKVSVNDFIIKAAAVTLKQMPDVNVSWDGEGPKQLPF
IDISVAVATDKGLLTPIIKDAAAKGIQEIADSVKALSKKARDGKLLPEEYQGGSFSISNL
GMFGIDEFTAVINPPQACILAVGRFRPVLKLTEDEEGNAKLQQRQLITVTMSSDSRVVDD
ELATRFLKSFKANLENPIRLA

And the following peptides:

HSLDASQGTATGPR
STVPHAYATADCDLGAVLK
VVDDELATR

Can R report the start and end position of each peptide within the protein of interest and write them to a new file?

Is it also possible to get the FASTA sequence directly from UniProt?

I need to do this for many proteins and peptides, so I can't do it manually.

Thanks a lot!

R sequence • 5.5k views

Hello,

I have 30,000 peptide sequences from a human brain sample. I want to compare these peptides with the peptide sequences in the PRIDE database to see whether each of my peptides is novel or not. I am facing two problems.

First, I have to download the PRIDE dataset to a Linux server, because my own computer cannot handle downloads of this size. Second, I don't know how to do the comparison in R. Please give me some suggestions.

Shanzida.

9.1 years ago

The Biostrings package has many facilities for pattern/string matching. The Multiple Alignments and Pairwise Sequence Alignments vignettes would be useful to read through as a starting point.
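
For exact peptide matches, the lookup can also be sketched in base R alone, which may be enough before moving on to Biostrings (whose `matchPattern()` does the same kind of search on sequence objects). The sketch below uses the protein and peptides from the question; `find_peptides()` and the output file name are made up for illustration, only the first occurrence of each peptide is reported, and the coverage figure is simply the fraction of residues covered by at least one peptide.

```r
# Protein sequence from the question (UniProt O00330), FASTA lines joined into
# one string so that peptides spanning a line break can still be found.
protein <- paste0(
  "MAASWRLGCDPRLLRYLVGFPGRRSVGLVKGALGWSVSRGANWRWFHSTQWLRGDPIKIL",
  "MPSLSPTMEEGNIVKWLKKEGEAVSAGDALCEIETDKAVVTLDASDDGILAKIVVEEGSK",
  "NIRLGSLIGLIVEEGEDWKHVEIPKDVGPPPPVSKPSEPRPSPEPQISIPVKKEHIPGTL",
  "RFRLSPAARNILEKHSLDASQGTATGPRGIFTKEDALKLVQLKQTGKITESRPTPAPTAT",
  "PTAPSPLQATAGPSYPRPVIPPVSTPGQPNAVGTFTEIPASNIRRVIAKRLTESKSTVPH",
  "AYATADCDLGAVLKVRQDLVKDDIKVSVNDFIIKAAAVTLKQMPDVNVSWDGEGPKQLPF",
  "IDISVAVATDKGLLTPIIKDAAAKGIQEIADSVKALSKKARDGKLLPEEYQGGSFSISNL",
  "GMFGIDEFTAVINPPQACILAVGRFRPVLKLTEDEEGNAKLQQRQLITVTMSSDSRVVDD",
  "ELATRFLKSFKANLENPIRLA"
)
peptides <- c("HSLDASQGTATGPR", "STVPHAYATADCDLGAVLK", "VVDDELATR")

# Hypothetical helper: 1-based start/end of the first exact match of each
# peptide in the protein; NA if the peptide does not occur.
find_peptides <- function(peptides, protein) {
  start <- vapply(
    peptides,
    function(p) as.integer(regexpr(p, protein, fixed = TRUE)[1]),
    integer(1)
  )
  start[start == -1L] <- NA_integer_
  data.frame(
    peptide = peptides,
    start = unname(start),
    end = unname(start) + nchar(peptides) - 1L,
    stringsAsFactors = FALSE
  )
}

hits <- find_peptides(peptides, protein)

# Sequence coverage: fraction of residues covered by at least one peptide.
covered <- logical(nchar(protein))
for (i in which(!is.na(hits$start))) {
  covered[hits$start[i]:hits$end[i]] <- TRUE
}
coverage <- mean(covered)

# Write the positions to a tab-separated file (path is an assumption;
# change it to wherever the results should go).
out_file <- file.path(tempdir(), "peptide_positions.tsv")
write.table(hits, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

For many proteins, the same helper could be applied per protein with `lapply()` and the resulting data frames combined with `rbind()`.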

As for getting sequences straight from UniProt, the answer to that is also yes. The UniProt website has a section on accessing its resources programmatically, which you should read through.

There are many ways to interact with a web service in R. I recently used the UniProt web service to do ID cross-referencing, and used the httr package to do so.

Since you want to fetch FASTA/peptide sequences, your queries won't look like this, but the code snippet below, which maps a series of Entrez Gene IDs to UniProt accession numbers, should help to get you started:

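library(httr)

# Entrez Gene IDs to map to UniProt accessions
entrez <- c('51692', '1478', '26986')

# Form fields expected by the mapping service
params <- list(
               from='P_ENTREZGENEID',
               to='ACC',
               format='tab',
               query=paste(entrez, collapse=' '))

# Note: this is the mapping URL as it was when this answer was written;
# UniProt has since reorganised its programmatic access, so it may no longer respond.
response <- POST('http://www.uniprot.org/mapping/',
                 body=params,
                 encode='multipart')

# Parse the tab-separated response into a data frame
result <- read.table(textConnection(content(response, 'text')),
                     stringsAsFactors=FALSE,
                     header=TRUE)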

and result is

   From         To
1 51692     G5E9W3
2 51692     Q9UKF6
3  1478     P33240
4 26986 A0A024R9C1
5 26986     P11940
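
For the FASTA case asked about in the question, a rough sketch along the same lines might look like the following. It assumes that UniProt serves a single entry as FASTA at `https://rest.uniprot.org/uniprotkb/<accession>.fasta`; check the UniProt programmatic-access documentation before relying on that. The function names are made up for illustration, and the parser only handles a single-record FASTA text.

```r
# Hypothetical helper: build the (assumed) FASTA URL for one UniProt accession.
uniprot_fasta_url <- function(accession) {
  sprintf("https://rest.uniprot.org/uniprotkb/%s.fasta", accession)
}

# Hypothetical helper: split a single-record FASTA text into its header line
# and the sequence joined into one string.
parse_fasta <- function(text) {
  lines <- strsplit(text, "\r?\n")[[1]]
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  list(
    header = sub("^>", "", lines[is_header][1]),
    sequence = paste(lines[!is_header], collapse = "")
  )
}

# Download and parse one entry; requires the httr package and network access.
fetch_uniprot_sequence <- function(accession) {
  response <- httr::GET(uniprot_fasta_url(accession))
  httr::stop_for_status(response)
  parse_fasta(httr::content(response, "text", encoding = "UTF-8"))
}

# Example call (not run here because it needs network access):
# pdhx <- fetch_uniprot_sequence("O00330")
# nchar(pdhx$sequence)
```

The returned `sequence` string can then be searched for peptides with ordinary string matching or with Biostrings.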

Thank you so much for your answer.
