I want to run blastp to align some microbial DNA sequences (fastq) against the UniRef90 protein catalog, in order to answer some questions about protein abundance in my data. The problem is that the UniRef90 FASTA file downloaded from the UniProt website contains amino acid sequences from all kinds of organisms, both prokaryotes and eukaryotes, but I only want to focus on bacteria. UniRef90 already provides a TaxID for the lowest common taxon of each cluster (this could be a genus, a family, etc.), but it does not list the full taxonomic lineage, so I cannot tell directly whether an entry is bacterial.
So, is there any advice on how to do this extraction?
Thanks in advance
Try using taxonkit list to get all TaxIds belonging to bacteria (taxid: 2) and seqkit grep to filter the FASTA file by TaxId.
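In case it helps, here is a rough Python sketch of the filtering step, as an alternative to seqkit grep rather than a replacement for it. It assumes you have already written the bacterial TaxIds to a file with one id per line (for example with something like `taxonkit list --ids 2 --indent "" > bacteria.taxids.txt`; the file name is just a placeholder), and that the FASTA headers carry the usual UniRef `TaxID=<number>` field. The function names and the demo records below are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# UniRef headers look like:
# >UniRef90_XXXX Cluster name n=3 Tax=Some taxon TaxID=562 RepID=...
TAXID_RE = re.compile(r"\bTaxID=(\d+)")


def load_taxids(lines: Iterable[str]) -> Set[str]:
    """Read one TaxId per line (blank lines ignored) into a set of strings."""
    return {line.strip() for line in lines if line.strip()}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_taxid(lines: Iterable[str], keep: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose header TaxID is in the `keep` set."""
    for header, seq in read_fasta(lines):
        match = TAXID_RE.search(header)
        if match and match.group(1) in keep:
            yield header, seq


# Made-up demo records: TaxID 562 is Escherichia coli, 9606 is Homo sapiens.
DEMO_FASTA = """\
>UniRef90_EXAMPLE1 Hypothetical protein n=2 Tax=Escherichia coli TaxID=562 RepID=EXAMPLE1_ECOLI
MKTAYIAKQR
QISFVKSHFS
>UniRef90_EXAMPLE2 Hypothetical protein n=1 Tax=Homo sapiens TaxID=9606 RepID=EXAMPLE2_HUMAN
MEEPQSDPSV
"""

if __name__ == "__main__":
    # Tiny stand-in for the full list of bacterial TaxIds from taxonkit.
    bacteria = load_taxids(["2", "562"])
    for header, seq in filter_by_taxid(DEMO_FASTA.splitlines(), bacteria):
        print(f">{header}\n{seq}")
```

For the real file you would pass an open handle instead of the demo string, e.g. `gzip.open("uniref90.fasta.gz", "rt")`, and load the ids with `load_taxids(open("bacteria.taxids.txt"))`. Keeping the ids in a set makes each lookup cheap even though the bacterial subtree contains a very large number of TaxIds.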
Thanks, I will try this
Were you able to find a working solution, @boaty? I am interested in the exact same question.