Creation of UniProt Pseudo-metagenome
6.0 years ago
Faust • 0

Hi,

I have been using the complete bacterial UniProt/TrEMBL database for metaproteomics. We have now been able to carry out 16S rRNA sequencing on our samples, giving an exact idea of which species are in there. My question is, how does one select and combine sequences from UniProt? Is there any way of searching for all the sequences within a particular genus and then combining the results? I would like to create a pseudo-metagenome based on the 16S data.

rna-seq genome UniProt protein • 1.7k views

Please consider adding tags "uniprot" and "protein" to your post.


"how does one select and combine sequences from UniProt?"

All sequences for a genus or specific genes?


All sequences for a genus. On UniProt I am able to select all sequences for ONE genus, but I want to combine all the sequences from the different genera flagged up by 16S. So, for instance, combining all Citrobacter sequences in a database with all Enterococcus sequences, and so on.

6.0 years ago
GenoMax 141k

You could download FASTA-format sequence files from the web search interface. Example query for Citrobacter. Click on Download (make sure "download all" is selected) and then save the FASTA file.

Alternatively, you could download the entire UniProt data and extract the sequences you need from it.


Thanks. Please bear with me, as I am quite new to metaproteomics and very much a novice. If using the first approach, is there a way to combine the FASTA files produced for each genus search into one file? Or how do you extract the desired sequences from the complete UniProt database? I apologise if these are obvious questions.


You can concatenate the data files together by doing cat citrobacter.fa enterococcus.fa third_genus.fa ..... > total.fa.
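If you would rather do the same thing from a script, here is a minimal Python sketch of the concatenation. The function name concat_fasta is made up for illustration. The one thing it adds over plain cat is that it inserts a newline if a file does not end with one, so that the last sequence of one file cannot run into the first header of the next.

```python
def concat_fasta(inputs, output):
    """Concatenate FASTA files into one file and return the number of records written.

    Hypothetical helper for illustration; it does not validate the FASTA content.
    """
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last = line
            # Guard against a file whose final line lacks a trailing newline.
            if not last.endswith("\n"):
                out.write("\n")
    return n_records


# Example call, using the file names from the cat command above:
# concat_fasta(["citrobacter.fa", "enterococcus.fa"], "total.fa")
```

The returned record count is a quick sanity check against the entry counts shown on the UniProt results pages.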

Extracting sequences from the full database will be a little more involved.
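As a rough idea of what that could look like, here is a Python sketch that streams through a UniProt FASTA file and keeps only records whose header names one of the wanted genera. It relies on the OS= (organism name) field that UniProt puts in its FASTA headers, and it assumes the genus is the first word of that name. That assumption fails for entries such as "uncultured ..." or "Candidatus ...", so treat it as a starting point only. The function names and the file names in the usage comment are made up for illustration.

```python
import re

# Organism name: everything after "OS=" up to the next two-letter field (e.g. OX=) or end of line.
OS_FIELD = re.compile(r"\bOS=(.+?)(?:\s+[A-Z]{2}=|$)")


def header_genus(header):
    """Return the first word of the OS= organism name, or None if there is no OS= field."""
    match = OS_FIELD.search(header)
    return match.group(1).split()[0] if match else None


def filter_fasta_by_genus(lines, genera):
    """Yield the lines of every FASTA record whose header genus is in `genera`."""
    wanted = set(genera)
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; decide once whether to keep all of it.
            keep = header_genus(line) in wanted
        if keep:
            yield line


# Usage with hypothetical file names:
# with open("uniprot_bacteria.fasta") as src, open("pseudo_metagenome.fasta", "w") as dst:
#     dst.writelines(filter_fasta_by_genus(src, {"Citrobacter", "Enterococcus"}))
```

Because the file is read line by line, memory use stays small even for the full TrEMBL download. Matching on taxonomy identifiers instead of names would be more robust, but it needs the taxonomy lineage, which is not in the FASTA header.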


Thank you, this is very helpful!

5.9 years ago

Alternatively, you can build a single advanced query that combines the genera with the boolean operator "OR", e.g.

taxonomy:"Enterococcus [1350]" OR taxonomy:"Citrobacter [544]"

http://www.uniprot.org/uniprot/?query=taxonomy%3A%22Enterococcus+%5B1350%5D%22+OR+taxonomy%3A%22Citrobacter+%5B544%5D%22&sort=score
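With many genera from the 16S results, typing such a query by hand gets tedious. The sketch below builds the query string and the search URL from a list of (genus, taxon ID) pairs. The function names are made up for illustration, and the query syntax and URL layout are simply copied from the example above, so they are an assumption about the UniProt website at the time of this thread and may need adjusting for the current site.

```python
from urllib.parse import quote_plus


def build_taxonomy_query(genera):
    """Join (name, taxon_id) pairs into one OR query, using the syntax shown in this thread."""
    return " OR ".join(f'taxonomy:"{name} [{taxid}]"' for name, taxid in genera)


def build_search_url(genera, base="http://www.uniprot.org/uniprot/"):
    """URL-encode the query and append it to the search address used in this thread."""
    return f"{base}?query={quote_plus(build_taxonomy_query(genera))}&sort=score"


genera = [("Enterococcus", 1350), ("Citrobacter", 544)]
print(build_taxonomy_query(genera))
# taxonomy:"Enterococcus [1350]" OR taxonomy:"Citrobacter [544]"
print(build_search_url(genera))
```

For the two genera shown, the second line printed is the same URL as the one given above.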
