Hello, I am interested in building a workflow to collect influenza A and B viral sequences, align them, and prepare a curated database that I can later use to design primers ensuring broad inclusivity.

My questions are:

1. Are there recommended pipelines or published resources that describe how to do this specifically for influenza?
2. What should I take into account when curating such a database (e.g., redundancy, incomplete sequences, metadata filtering)?
3. Which tools or software are most commonly used by the community for sequence download, alignment, and primer design in this context?

If there are existing references, databases, or community best practices already documented, I would really appreciate pointers. Thank you!
Have you looked at https://nextstrain.org/ and, specifically, the seasonal influenza workflow at https://github.com/nextstrain/seasonal-flu?
The Nextstrain software needed to run the analysis yourself is available, with installation instructions here: https://docs.nextstrain.org/en/latest/install.html
NCBI Virus has already precomputed some of this, and you may be able to use the results directly: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=taxid:197911&VirusLineage_ss=taxid:197912&VirusLineage_ss=taxid:197913&VirusLineage_ss=taxid:1511083
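On the curation question (redundancy, incomplete sequences), here is a minimal sketch of the kind of filtering you might apply to a downloaded FASTA file before alignment. It is not taken from Nextstrain or NCBI; the function names `read_fasta` and `curate`, the file layout, and the thresholds are all assumptions for illustration. Because influenza A and B genomes are segmented, you would most likely run this per segment with a segment-specific minimum length.

```python
import hashlib
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def curate(records, min_len, max_ambiguous_frac=0.01):
    """Drop short, ambiguous, and exact-duplicate sequences.

    min_len and max_ambiguous_frac are placeholder thresholds (assumptions);
    choose them per segment for your own data.
    """
    seen = set()
    kept = []
    for header, seq in records:
        # Incomplete sequences: skip anything empty or below the length cutoff.
        if not seq or len(seq) < min_len:
            continue
        # Low-quality sequences: skip if too many non-ACGT characters.
        ambiguous = sum(1 for base in seq if base not in "ACGT")
        if ambiguous / len(seq) > max_ambiguous_frac:
            continue
        # Redundancy: keep only the first record of each identical sequence.
        digest = hashlib.sha256(seq.encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append((header, seq))
    return kept


if __name__ == "__main__":
    # Toy input standing in for a real download; headers are made up.
    toy = StringIO(">a\nACGTACGT\n>b\nACGTACGT\n>c\nACG\n>d\nACGTNNNN\n")
    for header, seq in curate(read_fasta(toy), min_len=8, max_ambiguous_frac=0.1):
        print(f">{header}\n{seq}")
```

Exact-match deduplication is the simplest choice; for primer inclusivity you may prefer to keep duplicates' metadata (host, year, subtype) somewhere, or to cluster at an identity threshold with a dedicated tool instead, so that rare variants are not swamped by heavily sampled lineages.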