Question: Creation of UniProt Pseudo-metagenome
Faust wrote:

Hi,

I have been using the complete bacterial UniProt/TrEMBL database for metaproteomics. We have now been able to carry out 16S rRNA sequencing on our samples, which gives a good idea of which taxa are present. My question is: how does one select and combine sequences from UniProt? Is there a way to search for all the sequences within a particular genus and then combine the results? I would like to create a pseudo-metagenome based on the 16S data.

rna-seq uniprot protein genome • 188 views
modified 4 weeks ago by Elisabeth Gasteiger • written 7 weeks ago by Faust

Please consider adding tags "uniprot" and "protein" to your post.

written 7 weeks ago by Elisabeth Gasteiger

how does one select and combine sequences from UniProt?

All sequences for a genus or specific genes?

written 7 weeks ago by genomax

All sequences for a genus. On UniProt I can select all sequences for ONE genus, but I want to combine the sequences from all the different genera flagged by 16S. For instance, combining all Citrobacter sequences in one database with all Enterococcus sequences, and so on.

modified 7 weeks ago • written 7 weeks ago by Faust
genomax wrote:

You could download FASTA-format sequence files from the web search interface (example query for Citrobacter). Click on Download, make sure download all is selected, and then save the FASTA file.

Alternatively, you could download the entire UniProt data set and extract the sequences you need from it.

modified 7 weeks ago • written 7 weeks ago by genomax

Thanks. Please bear with me, as I am quite new to metaproteomics. If I use the first approach, is there a way to combine the FASTA files produced by each genus search into one file? Or how do you extract the desired sequences from the complete UniProt database? I apologise if these are obvious questions.

modified 7 weeks ago • written 7 weeks ago by Faust

You can concatenate the data files by running: cat citrobacter.fa enterococcus.fa third_genus.fa ..... > total.fa
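
If the per-genus downloads might overlap (for example, if the same search is saved twice), a small script can do the same concatenation while dropping repeated records. This is only a sketch: merge_fasta is a hypothetical helper, the file names are the placeholders used above, and treating the first whitespace-delimited token of each header as the record ID is an assumption.

```python
def merge_fasta(inputs, output):
    """Concatenate FASTA files, keeping only the first copy of each record ID.

    The record ID is assumed to be the first token of the header line,
    e.g. "tr|ACCESSION|ENTRY_NAME" in a UniProt download.
    Returns the number of records written.
    """
    seen = set()
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            keep = False  # nothing is copied until a header line is seen
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        parts = line[1:].split()
                        record_id = parts[0] if parts else ""
                        keep = record_id not in seen
                        if keep:
                            seen.add(record_id)
                            written += 1
                    if keep:
                        # Guard against a file whose last line lacks a newline.
                        out.write(line if line.endswith("\n") else line + "\n")
    return written
```

Calling merge_fasta(["citrobacter.fa", "enterococcus.fa"], "total.fa") would then write the combined database and return the number of records kept.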

Extracting sequences from the full database will be a little more involved.
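
One possible way to do that extraction is to stream through the full FASTA file and keep only records whose organism name starts with a genus of interest. The sketch below assumes UniProt-style FASTA headers, where the organism appears as OS=... followed by further two-letter fields such as OX=. The function names are hypothetical, and taking the first word of the organism name as the genus is a simplification that will miss entries such as "uncultured ..." or "Candidatus ...".

```python
import re

# Capture the organism name: everything after "OS=" up to the next
# " XX=" field (e.g. " OX=", " GN=", " PE=") or the end of the header.
OS_FIELD = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")


def genus_of(header):
    """Return the first word of the OS= field, or None if it is absent."""
    match = OS_FIELD.search(header)
    if not match:
        return None
    return match.group(1).split()[0]


def filter_by_genus(lines, genera):
    """Yield the FASTA lines of every record whose genus is in `genera`."""
    wanted = set(genera)
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = genus_of(line.rstrip("\n")) in wanted
        if keep:
            yield line
```

Because filter_by_genus accepts any iterable of lines, it can be fed an open file handle (or one from gzip.open(path, "rt") for a compressed download) and its output written line by line to a new FASTA file, so the full database never has to fit in memory.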

modified 7 weeks ago • written 7 weeks ago by genomax

Thank you, this is very helpful!

written 7 weeks ago by Faust
Elisabeth Gasteiger wrote:

Alternatively, you can build an advanced query using the Boolean operator OR, e.g.

taxonomy:"Enterococcus [1350]" OR taxonomy:"Citrobacter [544]"

http://www.uniprot.org/uniprot/?query=taxonomy%3A%22Enterococcus+%5B1350%5D%22+OR+taxonomy%3A%22Citrobacter+%5B544%5D%22&sort=score
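
With many genera from the 16S results, typing such a query by hand gets tedious, so it could be assembled programmatically. The sketch below only reproduces the query and URL form quoted above; taxonomy_query and search_url are hypothetical helper names, and the site's URL scheme and query syntax may have changed since this was written, so check the result in a browser before relying on it.

```python
from urllib.parse import urlencode

# URL form as quoted in this thread; it may differ on the current site.
BASE_URL = "http://www.uniprot.org/uniprot/"


def taxonomy_query(genera):
    """Join (name, taxonomy ID) pairs into one OR query string."""
    return " OR ".join(
        f'taxonomy:"{name} [{tax_id}]"' for name, tax_id in genera
    )


def search_url(genera):
    """Return the percent-encoded search URL for the combined query."""
    params = {"query": taxonomy_query(genera), "sort": "score"}
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    genera = [("Enterococcus", 1350), ("Citrobacter", 544)]
    print(taxonomy_query(genera))
    print(search_url(genera))
```

The taxonomy IDs (1350 and 544) are the ones given in the example query; IDs for other genera would need to be looked up in the UniProt taxonomy.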

written 4 weeks ago by Elisabeth Gasteiger