Any ideas for extracting bacteria-related proteins (or protein clusters) from the UniRef90 database?
9 weeks ago
boaty ▴ 170

Hi guys,

I want to run blastp to align some microbial DNA sequences (FASTQ) against the UniRef90 protein catalog, in order to answer some questions about protein abundance in my data. The problem is that the UniRef90 FASTA file downloaded from the UniProt website contains amino acid sequences from all kinds of organisms, both prokaryotes and eukaryotes, but I only want to focus on bacteria. UniRef90 already provides a TaxID for the lowest common taxon of each cluster. This could be a genus or a family, but the full lineage is not listed, so the header alone does not tell me whether a cluster is bacterial.

So, does anyone have an idea how to do this extraction?

Thanks in advance

uniref uniProt • 416 views

Try using taxonkit list to get all TaxIDs belonging to Bacteria (taxid: 2), then seqkit grep to filter the FASTA file by TaxID.
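A minimal Python sketch of the same filtering idea, for anyone who wants to see the logic spelled out. It assumes the UniRef90 headers carry a `TaxID=<number>` field and that the bacterial TaxIDs were already written one per line to a file (for example with something like `taxonkit list --ids 2 --indent ""`; check the flags against your taxonkit version). The function names and file names are made up for illustration; this is a stand-in for the seqkit grep step, not a replacement for it.

```python
import re
from typing import Iterable, Iterator, Set

# Assumed header layout: ">UniRef90_XXXX Cluster name n=.. Tax=.. TaxID=1234 RepID=.."
TAXID_RE = re.compile(r"\bTaxID=(\d+)")


def load_taxids(lines: Iterable[str]) -> Set[str]:
    """Read one TaxID per line (hypothetical file from the taxonkit step)."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta_by_taxid(fasta_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the lines of every record whose header TaxID is in `keep`."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            match = TAXID_RE.search(line)
            # Records without a TaxID field are dropped.
            keeping = match is not None and match.group(1) in keep
        if keeping:
            yield line


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own paths.
    with open("bacteria.taxid.txt") as ids:
        bacterial = load_taxids(ids)
    with open("uniref90.fasta") as src, open("uniref90.bacteria.fasta", "w") as out:
        out.writelines(filter_fasta_by_taxid(src, bacterial))
```

Because the file is streamed line by line, only the TaxID set has to fit in memory, not the FASTA file.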


Thanks, I will try this.

9 weeks ago
Mensur Dlakic ★ 14k

I don't think this is necessarily a great idea, but here goes. First, download all bacterial sequences from the UniProt taxonomic divisions (the sprot and trembl .dat files) and convert them to FASTA with esl-reformat, which ships with HMMER. Then use either cd-hit or MMseqs2 to cluster them at 90% identity. That should not take too long at 90%, but you will need a computer with a healthy amount of memory and disk space.

Another way, also not great, would be to download all bacterial proteins as above and extract their IDs. Next, download UniRef90, format it with makeblastdb, and extract the bacterial proteins with blastdbcmd using the IDs from the previous step. I think the first approach will be faster.
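The ID-extraction step of the second approach could look roughly like the sketch below. It assumes the .dat files use the UniProt flat-file layout, where accessions sit on lines starting with `AC   ` and are separated by semicolons, and that a UniRef90 cluster ID is `UniRef90_` followed by the accession of its representative. The function names and file names are made up for illustration. Note that a bacterial protein that is a non-representative member of a cluster will not match by ID alone, so treat the result as a starting point.

```python
from typing import Iterable, Iterator, List


def iter_accessions(dat_lines: Iterable[str]) -> Iterator[str]:
    """Yield every accession found on the AC lines of a UniProt .dat stream."""
    for line in dat_lines:
        if line.startswith("AC   "):
            for acc in line[5:].split(";"):
                acc = acc.strip()
                if acc:
                    yield acc


def uniref90_ids(dat_lines: Iterable[str]) -> List[str]:
    """Turn accessions into candidate UniRef90 cluster IDs (assumed naming)."""
    return [f"UniRef90_{acc}" for acc in iter_accessions(dat_lines)]


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own paths.
    with open("uniprot_sprot_bacteria.dat") as src, open("bacteria_ids.txt", "w") as out:
        for cluster_id in uniref90_ids(src):
            out.write(cluster_id + "\n")
```

The resulting list is the kind of file you would hand to blastdbcmd as a batch of entries, provided the database was built so that sequence IDs are parsed and indexed.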


Thanks, I will try the second one. I am not sure that an in-house 90% identity file will be the same as UniRef90.
