Hi all,
I have the following two datasets:
ECM Proteomics Dataset
ECM Isoform Dataset
I would like to merge the TGE amino acid sequence and the peptide sequence, but these two datasets do not share a unique identifier for each row.
How can I create a unique identifier for each row in each dataset so that I can correctly match each amino acid sequence to the right peptide sequence?
For the ECM Proteomics Dataset, I did the following in pandas, which created a unique ID based on all of these columns:
ECM['id'] = ECM.groupby(['Gene.Symbol', 'Division', 'Category', 'PI',
                         'Protein.Name..name.of.reference.protein.',
                         'Protein.description',
                         'Sequence..TGE.amino.acid.seq.']).ngroup()
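A caveat worth knowing about, shown with a small pandas sketch on made-up gene lists: ngroup() numbers the distinct key combinations 0 to n-1 in sorted key order within whichever DataFrame it is called on. Running the same groupby on the other dataset will therefore not reproduce these IDs whenever the two datasets contain different sets of keys.

```python
import pandas as pd

# Two toy tables that share some, but not all, gene symbols (hypothetical data).
a = pd.DataFrame({"Gene.Symbol": ["A2M", "COL1A1", "FN1"]})
b = pd.DataFrame({"Gene.Symbol": ["COL1A1", "FN1"]})

# ngroup() numbers each distinct key 0..n-1 in sorted key order,
# so the number a key receives depends on which other keys are present.
a["id"] = a.groupby("Gene.Symbol").ngroup()
b["id"] = b.groupby("Gene.Symbol").ngroup()

print(a)
print(b)
```

Here COL1A1 gets id 1 in the first table but id 0 in the second.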
How can I assign the same unique IDs to the corresponding rows in the other dataset?
Examples in R or Python would be greatly appreciated.
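One way to get matching IDs, sketched in pandas under the assumption that the key columns (here only Gene.Symbol) exist in both datasets: number the distinct keys once, from the union of both tables, then attach that lookup table to each dataset with a left merge. The table contents and the Peptide/Sequence column names are hypothetical. Note that with Gene.Symbol as the only key, the ID identifies a gene rather than a single row.

```python
import pandas as pd

# Hypothetical stand-ins for the two datasets; only "Gene.Symbol" is assumed shared.
ecm = pd.DataFrame({
    "Gene.Symbol": ["A2M", "A2M", "FN1"],
    "Peptide": ["PEPA", "PEPB", "PEPC"],
})
isoforms = pd.DataFrame({
    "Gene.Symbol": ["FN1", "A2M"],
    "Sequence": ["SEQ_FN1", "SEQ_A2M"],
})

# Build one ID per distinct key from the union of both tables, so the same key
# always receives the same number regardless of which dataset it came from.
key_cols = ["Gene.Symbol"]
keys = (pd.concat([ecm[key_cols], isoforms[key_cols]])
        .drop_duplicates()
        .sort_values(key_cols)
        .reset_index(drop=True))
keys["id"] = range(len(keys))

# Attach the shared ID to each dataset with a left merge on the key columns.
ecm = ecm.merge(keys, on=key_cols, how="left")
isoforms = isoforms.merge(keys, on=key_cols, how="left")
print(ecm)
print(isoforms)
```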
Kind Regards,
Ishack
Please show us what you've tried in R/Python and we can help you get over any obstacles. We won't be able to do your work for you, though.
Hi Ram,
Sorry about the incomplete post.
I'm sorry, I don't know pandas. Maybe someone else who knows pandas can help you out. In the meantime, I'd recommend editing your post and adding the content from your comment there.
OK, could you please show me how to do it in R?
Can anyone please help?
ishackm: you will gain much respect by going away for a few days and trying this on your own. Asking questions like "Can anyone please help?" seems somewhat desperate (?). Fair is fair; we have all been where you are right now.
Edit: Although I say 'on your own', there is more than enough material on the World Wide Web for you to search and, in the process, teach yourself.
I do apologise, Kevin. It's just that I have been trying to solve this problem for 3 days now and got nowhere, which is why I asked this question.
All is fine. Do not worry.
So is there any library in R that can help me do this?
I want to merge every amino acid sequence with all the possible peptides for that particular gene.
Hey dude / dudette. I was working. If I had to do this in R, I would first find the reference data that I need outside R, import it into R, and then do the processing there.
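In pandas, pairing every isoform sequence with every peptide of the same gene is a many-to-many merge on Gene.Symbol, so no extra row ID is needed. A minimal sketch with made-up sequences; the Isoform.Sequence and Peptide.Sequence column names are hypothetical stand-ins for the real columns.

```python
import pandas as pd

# Hypothetical toy data (made-up sequences): one isoform per row, one peptide per row.
isoforms = pd.DataFrame({
    "Gene.Symbol": ["A2M", "FN1"],
    "Isoform.Sequence": ["SEQ_A2M_ISOFORM", "SEQ_FN1_ISOFORM"],
})
peptides = pd.DataFrame({
    "Gene.Symbol": ["A2M", "A2M", "A2M", "FN1"],
    "Peptide.Sequence": ["PEP_A", "PEP_B", "PEP_C", "PEP_D"],
})

# An inner merge on the gene symbol pairs every isoform of a gene with every
# peptide of that gene (a many-to-many join within each gene).
paired = isoforms.merge(peptides, on="Gene.Symbol", how="inner")
print(paired)
```

With several isoforms per gene, each isoform would be repeated once per peptide of that gene.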
These also look promising:
In fact, there seem to be 'Pep' this and 'pep' that... lots of programs. That's Pep-tastic!
I'm not sure I get it. Can you give an example of two lines (one from each file) that you would merge, and tell us which information you would want to merge them on? I am guessing the protein ID will be important?
For example, this is one isoform (ECM TGE):
I would like to merge the same isoform with all the A2M peptides:
So the result would look like this:
I would like something like this for every gene, please,
i.e. the same amino acid sequence paired with each peptide.
How can I do that please?
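If "compared" means checking whether each peptide occurs in the isoform (an assumption), a sketch could add the peptide's start position in the isoform to the merged table using plain Python str.find. The sequences and column names below are made up.

```python
import pandas as pd

# Made-up sequences for illustration only; column names are hypothetical.
isoforms = pd.DataFrame({
    "Gene.Symbol": ["A2M"],
    "Isoform.Sequence": ["MAAKLLVPTTRGE"],
})
peptides = pd.DataFrame({
    "Gene.Symbol": ["A2M", "A2M", "A2M"],
    "Peptide.Sequence": ["KLLVP", "TTRGE", "WWWW"],
})

# Same amino acid sequence repeated on each row, one peptide per row.
paired = isoforms.merge(peptides, on="Gene.Symbol", how="inner")

# Assumed comparison: 0-based start of the peptide in the isoform (-1 if absent).
paired["Peptide.Start"] = [
    seq.find(pep)
    for seq, pep in zip(paired["Isoform.Sequence"], paired["Peptide.Sequence"])
]
paired["Peptide.Found"] = paired["Peptide.Start"] >= 0
print(paired)
```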
Both datasets have Gene.Symbol. Can't you use that to extract columns from the other dataset? My guess is that the two datasets have different numbers of rows per Gene.Symbol, so you may not be able to combine them row by row, but at least you will be able to separate the information per gene.
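To see how the two datasets line up per gene before merging, a small pandas sketch on toy data counts the rows per Gene.Symbol in each table and splits each table into per-gene sub-tables; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables; only the shared "Gene.Symbol" column matters here.
ecm = pd.DataFrame({"Gene.Symbol": ["A2M", "A2M", "A2M", "FN1"]})
isoforms = pd.DataFrame({"Gene.Symbol": ["A2M", "FN1", "FN1"]})

# Rows per gene in each dataset; the counts need not match.
counts = pd.DataFrame({
    "ecm_rows": ecm["Gene.Symbol"].value_counts(),
    "isoform_rows": isoforms["Gene.Symbol"].value_counts(),
}).fillna(0).astype(int)
print(counts)

# Split each dataset into one sub-table per gene for side-by-side inspection.
ecm_by_gene = {gene: df for gene, df in ecm.groupby("Gene.Symbol")}
iso_by_gene = {gene: df for gene, df in isoforms.groupby("Gene.Symbol")}
```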