I am having a hard time solving a relatively simple problem (I am a novice).
I want to screen a collection of >8000 prokaryote genomes for a specific gene, in order to make a presence/absence table for a GWAS study. However, I am not sure how to go about this. I have the sequence of the gene in FASTA format, as well as all the genomes. Could I perhaps use BLAST+ and create a local database of the genomes? I have the genomes as fna files directly from GenBank, but I also have annotated versions from running Prokka.
Thank you in advance!
You can search the gene against the genomes with BLAST+: blastn searches a nucleotide query (your gene sequence) against a nucleotide database built from the genomes. However, searching the gene against the predicted genes is also a good strategy, and would be more precise than searching against whole genomes.
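To get from BLAST hits to a presence/absence table, something like the following rough sketch might help. It assumes you have already built a database with `makeblastdb -in all_genomes.fna -dbtype nucl -out genomes_db` and run `blastn -query gene.fasta -db genomes_db -outfmt "6 qseqid sseqid pident length qlen" -out hits.tsv` (the file names are placeholders). It also assumes every subject ID starts with a genome name followed by `|`, which is a made-up convention here, and the 90% identity / 80% query coverage cutoffs are arbitrary examples that you would need to tune for your gene.

```python
import csv
import io

def presence_absence(blast_rows, genomes, min_identity=90.0, min_coverage=0.8):
    """Return {genome: 1 or 0} from tab-separated BLAST rows.

    Expected columns: qseqid, sseqid, pident, length, qlen.
    Subject IDs are assumed to look like 'genome|contig' (hypothetical convention).
    """
    present = set()
    for row in csv.reader(blast_rows, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid = row[1]
        pident = float(row[2])
        length = int(row[3])
        qlen = int(row[4])
        genome = sseqid.split("|", 1)[0]
        # Alignment length includes gaps, so this coverage is only approximate.
        if pident >= min_identity and length / qlen >= min_coverage:
            present.add(genome)
    # Genomes with no qualifying hit are reported as absent (0).
    return {g: int(g in present) for g in genomes}

# Small in-memory example standing in for hits.tsv
example_hits = io.StringIO(
    "geneX\tgenomeA|contig1\t98.5\t950\t1000\n"
    "geneX\tgenomeB|contig7\t75.0\t990\t1000\n"
    "geneX\tgenomeC|contig2\t99.0\t300\t1000\n"
)
table = presence_absence(example_hits, ["genomeA", "genomeB", "genomeC", "genomeD"])
print(table)
```

For real data you would pass `open("hits.tsv")` instead of the in-memory example, plus the full list of genome names, so that genomes without any hit still show up as 0 in the table.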
You can use the .fasta file generated by Prokka, which contains the predicted gene sequences for an individual organism.
NOTE: When searching your gene of interest against predicted genes, you may run into a problem with unique identifiers. Before building the gene database, change the gene headers so that each one identifies its source genome; that way you can trace every hit back to the organism it came from.
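The header renaming could be done along these lines. This is only a sketch: the function name, the `|` separator and the genome ID are illustrative choices, not anything Prokka or BLAST require. As far as I know, Prokka writes the nucleotide sequences of predicted genes to a `.ffn` file, so that is probably the file you want to feed in for each genome.

```python
def prefix_fasta_headers(lines, genome_id, sep="|"):
    """Yield FASTA lines with genome_id prepended to every header line.

    '>gene1 product' becomes '>genomeA|gene1 product'; sequence lines pass through.
    """
    for line in lines:
        if line.startswith(">"):
            yield f">{genome_id}{sep}{line[1:]}"
        else:
            yield line

# Small in-memory example standing in for one genome's gene FASTA file
example = [">gene_00001 hypothetical protein\n", "ATGAAACGT\n", ">gene_00002 DnaA\n", "ATGTCT\n"]
renamed = list(prefix_fasta_headers(example, "genomeA"))
print("".join(renamed), end="")
```

On real files you would loop over the genomes, open each gene file, and write the renamed lines into one combined FASTA before running `makeblastdb` on it. Splitting the subject ID of each hit on the separator then gives you the source genome.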