Question: Match peptides to protein sequences and tell the position in R
Asked 5.6 years ago by Marie (Germany):

Hello,

I was wondering if there is a convenient way to work out the sequence coverage of a given protein by a list of peptides.

For example I have this protein:

>sp|O00330|ODPX_HUMAN Pyruvate dehydrogenase protein X component, mitochondrial OS=Homo sapiens GN=PDHX PE=1 SV=3
MAASWRLGCDPRLLRYLVGFPGRRSVGLVKGALGWSVSRGANWRWFHSTQWLRGDPIKIL
MPSLSPTMEEGNIVKWLKKEGEAVSAGDALCEIETDKAVVTLDASDDGILAKIVVEEGSK
NIRLGSLIGLIVEEGEDWKHVEIPKDVGPPPPVSKPSEPRPSPEPQISIPVKKEHIPGTL
RFRLSPAARNILEKHSLDASQGTATGPRGIFTKEDALKLVQLKQTGKITESRPTPAPTAT
PTAPSPLQATAGPSYPRPVIPPVSTPGQPNAVGTFTEIPASNIRRVIAKRLTESKSTVPH
AYATADCDLGAVLKVRQDLVKDDIKVSVNDFIIKAAAVTLKQMPDVNVSWDGEGPKQLPF
IDISVAVATDKGLLTPIIKDAAAKGIQEIADSVKALSKKARDGKLLPEEYQGGSFSISNL
GMFGIDEFTAVINPPQACILAVGRFRPVLKLTEDEEGNAKLQQRQLITVTMSSDSRVVDD
ELATRFLKSFKANLENPIRLA

And the following peptides:

HSLDASQGTATGPR
STVPHAYATADCDLGAVLK
VVDDELATR

Is it possible to have R report the start and end of each peptide in the protein of interest and write them to a new file?

Is it also possible to get the FASTA sequence directly from UniProt?

I need to do this for many proteins and peptides, so I can't do it manually.

Thanks a lot!


Hello,

I have 30,000 peptide sequences from a human brain sample. I want to compare these peptides with the peptide sequences in the PRIDE database to see whether any of mine are novel. I am facing two problems.

First, I have to download the PRIDE dataset to a Linux server because my computer cannot handle downloads of this size. Second, how can I compare the two sets of peptides in R? Please give me some suggestions.

Shanzida.

Comment written 15 months ago by jahanshanzida.

Answer written 5.6 years ago by Steve Lianoglou (US):

The Biostrings package has many facilities for pattern/string matching. The Multiple Alignments and Pairwise Sequence Alignments vignettes would be useful to read through as a starting point.
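
For exact peptide matches, plain base R string matching may already be enough. The sketch below is an illustration only, not the Biostrings approach described above: it uses `gregexpr()` with `fixed = TRUE` on the protein and peptides from the question. The helper name `locate_peptides` and the output file name `peptide_positions.tsv` are made up for this example, and it assumes each peptide must match the protein exactly (no mismatches, no I/L ambiguity).

```r
# Protein sequence from the question (O00330), joined from its FASTA lines
protein <- paste0(
  "MAASWRLGCDPRLLRYLVGFPGRRSVGLVKGALGWSVSRGANWRWFHSTQWLRGDPIKIL",
  "MPSLSPTMEEGNIVKWLKKEGEAVSAGDALCEIETDKAVVTLDASDDGILAKIVVEEGSK",
  "NIRLGSLIGLIVEEGEDWKHVEIPKDVGPPPPVSKPSEPRPSPEPQISIPVKKEHIPGTL",
  "RFRLSPAARNILEKHSLDASQGTATGPRGIFTKEDALKLVQLKQTGKITESRPTPAPTAT",
  "PTAPSPLQATAGPSYPRPVIPPVSTPGQPNAVGTFTEIPASNIRRVIAKRLTESKSTVPH",
  "AYATADCDLGAVLKVRQDLVKDDIKVSVNDFIIKAAAVTLKQMPDVNVSWDGEGPKQLPF",
  "IDISVAVATDKGLLTPIIKDAAAKGIQEIADSVKALSKKARDGKLLPEEYQGGSFSISNL",
  "GMFGIDEFTAVINPPQACILAVGRFRPVLKLTEDEEGNAKLQQRQLITVTMSSDSRVVDD",
  "ELATRFLKSFKANLENPIRLA"
)
peptides <- c("HSLDASQGTATGPR", "STVPHAYATADCDLGAVLK", "VVDDELATR")

# Hypothetical helper: one row per exact (non-overlapping) match,
# or a single NA row when a peptide is not found in the protein.
locate_peptides <- function(peptides, protein) {
  hits <- lapply(peptides, function(pep) {
    m <- gregexpr(pep, protein, fixed = TRUE)[[1]]
    if (m[1] == -1) {
      return(data.frame(peptide = pep, start = NA_integer_,
                        end = NA_integer_, stringsAsFactors = FALSE))
    }
    starts <- as.integer(m)
    data.frame(peptide = pep, start = starts,
               end = starts + nchar(pep) - 1L, stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

positions <- locate_peptides(peptides, protein)

# Sequence coverage: fraction of residues covered by at least one peptide
covered <- rep(FALSE, nchar(protein))
for (i in which(!is.na(positions$start))) {
  covered[positions$start[i]:positions$end[i]] <- TRUE
}
coverage <- mean(covered)

# Write the positions to a tab-separated file (file name is an assumption)
write.table(positions, file = "peptide_positions.tsv",
            sep = "\t", quote = FALSE, row.names = FALSE)
```

For the three peptides in the question this should report positions 195-208, 296-314 and 477-485, which is 42 of 501 residues covered. For many proteins, the same helper could be applied in a loop over sequences, with a protein identifier added as an extra column.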

As for getting sequences straight from UniProt, the answer to that is also yes. The UniProt website has a section on accessing its resources programmatically, which you should read through.
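
As a rough sketch of what fetching a FASTA record could look like, the code below uses only base R. It assumes the UniProt REST URL pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, which is not mentioned in this thread and should be checked against UniProt's current programmatic-access documentation. The function names are hypothetical.

```r
# Build the FASTA URL for one accession (URL pattern is an assumption)
uniprot_fasta_url <- function(accession) {
  paste0("https://rest.uniprot.org/uniprotkb/", accession, ".fasta")
}

# Turn the lines of a single-record FASTA file into header + sequence
parse_fasta_lines <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop empty lines
  header <- lines[1]
  stopifnot(startsWith(header, ">"))     # a FASTA record starts with ">"
  list(header = sub("^>", "", header),
       sequence = paste(lines[-1], collapse = ""))
}

# Download and parse one record; requires network access
fetch_uniprot_sequence <- function(accession) {
  parse_fasta_lines(readLines(uniprot_fasta_url(accession), warn = FALSE))
}

# Example call, not run here because it needs network access:
# odpx <- fetch_uniprot_sequence("O00330")
# nchar(odpx$sequence)
```

Keeping the parsing separate from the download makes it possible to reuse `parse_fasta_lines()` on FASTA files that are already on disk, for example via `readLines("my_protein.fasta")`.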

There are many ways to interact with a web service in R. I recently used UniProt's web service for ID cross-referencing, via the httr package.

Since you want to fetch FASTA/peptide sequences, your queries won't look like this, but the code snippet below, which maps a series of Entrez gene IDs to UniProt accession numbers, should help to get you started:

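library(httr)

# Entrez gene IDs to map to UniProt accessions
entrez <- c('51692', '1478', '26986')

# Form fields for the mapping service: source ID type, target ID type,
# output format, and the space-separated list of IDs
params <- list(
               from='P_ENTREZGENEID',
               to='ACC', 
               format='tab', 
               query=paste(entrez, collapse=' '))

# Mapping endpoint as used when this answer was written; the current
# UniProt site may expose ID mapping under a different URL
response <- POST('http://www.uniprot.org/mapping/', 
                 body=params, 
                 encode='multipart')

# Parse the tab-separated response body into a data.frame
result <- read.table(textConnection(content(response, 'text')),
                     stringsAsFactors=FALSE,
                     header=TRUE)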

and result is

   From         To
1 51692     G5E9W3
2 51692     Q9UKF6
3  1478     P33240
4 26986 A0A024R9C1
5 26986     P11940

Thank you so much for your answer.

Comment written 15 months ago by jahanshanzida.