Hello,
I have 50,000 genomes annotated with Prokka and one gene of interest. Is there a fast way to create an alignment of my gene of interest across all 50,000 genomes?
My first instinct is to use PIRATE. Are there quicker alternatives?
Is there a way to grab the gene of interest from the .gff file of each of the 50,000 Prokka-annotated genomes and then align all of them? Any tools, scripts, or approaches for this?
Thanks.
Since Prokka was used, you should already have all gene sequences in the FASTA file with the suffix .ffn. If the gene is annotated correctly, you can use e.g. FAST's fasgrep to extract it.
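If you would rather script it, here is a minimal Python sketch of the same idea: walk a directory tree, scan every .ffn file, and write each record whose header matches a pattern into one combined FASTA. The function names (read_fasta, extract_gene), the directory layout, and the example pattern are all my own assumptions, not anything Prokka or FAST provides. It also assumes the .ffn headers contain text that identifies your gene (typically the locus tag followed by the product description), so check a few headers first.

```python
import re
from pathlib import Path


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_gene(ffn_dir, pattern, out_path):
    """Write every record whose header matches `pattern` to `out_path`.

    Assumption: all .ffn files live somewhere below `ffn_dir`, and the
    file name (without suffix) is a usable genome label.
    Returns the number of records written.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    written = 0
    with open(out_path, "w") as out:
        for ffn in sorted(Path(ffn_dir).rglob("*.ffn")):
            with open(ffn) as handle:
                for header, seq in read_fasta(handle):
                    if regex.search(header):
                        # Prefix the genome label so records stay traceable.
                        out.write(f">{ffn.stem}|{header}\n{seq}\n")
                        written += 1
    return written
```

A call would look something like extract_gene("prokka_out", "gyrase subunit B", "gene_of_interest.fasta"), where both paths and the pattern are placeholders. Genomes with zero or several hits are worth checking by hand, since a loose pattern can pick up paralogs or miss a differently worded product.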
As GenoMax pointed out, an MSA with 50k sequences is tedious; you can use e.g. MMseqs2 to cluster the sequences.
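As a much simpler stand-in for clustering (this is not what MMseqs2 does, just exact-duplicate collapsing), you can group identical sequences before aligning. With one gene across 50k genomes, many copies are often identical, so this alone may shrink the input considerably. The function name collapse_identical and the (header, sequence) input format are assumptions for illustration.

```python
from collections import defaultdict


def collapse_identical(records):
    """Group headers by identical sequence (case-insensitive).

    `records` is any iterable of (header, sequence) pairs.
    Returns a dict mapping each unique upper-cased sequence to the
    list of headers that carry it, in input order.
    """
    groups = defaultdict(list)
    for header, seq in records:
        groups[seq.upper()].append(header)
    return dict(groups)
```

You would then align one representative per group and keep the header lists to map the result back to all genomes.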