Hello everyone. This may be a question with a very simple answer, and it has probably been answered here before, but I would really appreciate any help.
I am trying to design a new set of degenerate primers to amplify a gene (pstS) from bacterial metagenomes. Although I have never worked with metagenomic samples before, I understand the steps involved:
- Find and download nucleotide sequences of the gene from GenBank for different bacterial species;
- Perform a multiple alignment of these nucleotide sequences and/or their protein translations (CLUSTAL, MUSCLE, T-COFFEE etc.);
- Identify conserved regions;
- Select primer sequences, either manually or with a specialized program (CODEHOP, Primaclade, HYDEN).
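For the conserved-region step, a rough nucleotide-level sketch of what these tools look for: collapse each alignment column into an IUPAC degenerate symbol, then score every primer-length window by its degeneracy (the number of distinct oligos the degenerate primer encodes, i.e. the product of the number of bases at each position). This assumes the alignment is already done and loaded as equal-length strings with `-` for gaps; the function names, the 20-nt default width and the rule of skipping any window containing a gap are my own assumptions. It is a plain consensus scan, not CODEHOP's algorithm, which works from protein alignments.

```python
# Hypothetical helpers for scanning an existing nucleotide alignment
# (equal-length strings, '-' for gaps) for low-degeneracy primer sites.

# IUPAC nucleotide code for each set of observed bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def column_bases(alignment, i):
    """Set of A/C/G/T bases seen in column i (gaps and ambiguity codes are ignored)."""
    return frozenset(seq[i].upper() for seq in alignment if seq[i].upper() in "ACGT")


def degenerate_consensus(alignment):
    """One IUPAC symbol per column, or '-' if the column contains no A/C/G/T."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return "".join(IUPAC.get(column_bases(alignment, i), "-") for i in range(length))


def window_degeneracy(alignment, start, width):
    """Number of distinct oligos a degenerate primer over this window encodes.

    Returns None if any sequence has a gap (or no A/C/G/T base) in the window,
    since such a site is not usable for a primer matching every sequence.
    """
    total = 1
    for i in range(start, start + width):
        bases = column_bases(alignment, i)
        if not bases or any(seq[i] == "-" for seq in alignment):
            return None
        total *= len(bases)
    return total


def best_windows(alignment, width=20, top=5):
    """(degeneracy, start) pairs for the least degenerate gap-free windows."""
    scored = []
    for start in range(len(alignment[0]) - width + 1):
        deg = window_degeneracy(alignment, start, width)
        if deg is not None:
            scored.append((deg, start))
    return sorted(scored)[:top]


aln = ["ATGCGTACGTTAGC", "ATGCATACGTTAGC", "ATGCGTACCTTAGC"]
print(degenerate_consensus(aln))          # ATGCRTACSTTAGC
print(best_windows(aln, width=4, top=2))  # [(1, 0), (1, 9)]
```

How much degeneracy is acceptable is a judgment call; the scan only ranks candidate sites, and melting temperature, GC content and 3' end conservation still have to be checked separately.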
So my question is simple: how do I batch-download sequences of a single gene from GenBank? When I search Entrez Nucleotide, it returns every record that contains pstS, including complete genomes, plasmids and so on, and I have no idea how to filter those out. I am comfortable using BioPerl/Biopython or any other programmatic way of retrieving data from GenBank, but I suspect there is a simple method that I am missing.
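One possible programmatic route, as a rough sketch: query NCBI's E-utilities (ESearch, then EFetch) and narrow the Entrez search with field tags, namely `[Gene Name]`, `[Organism]` and a `[Sequence Length]` range, so that complete genomes and large plasmids fall outside the length window. It uses only the Python standard library; Biopython's `Bio.Entrez` module wraps the same services. The length bounds, function names, `tool` value and e-mail placeholder are assumptions. Short records that carry pstS together with neighbouring genes will still pass the filter, and records where the gene is not annotated as pstS will be missed.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
TOOL = "psts_primer_sketch"  # hypothetical tool name reported to NCBI


def build_query(gene, taxon="Bacteria", min_len=500, max_len=5000):
    """Entrez Nucleotide query; the length range (an arbitrary assumption) excludes genomes and plasmids."""
    return f"{gene}[Gene Name] AND {taxon}[Organism] AND {min_len}:{max_len}[Sequence Length]"


def esearch_ids(query, email, retmax=1000):
    """Return up to `retmax` Nucleotide UIDs matching the query (ESearch, JSON output)."""
    params = urllib.parse.urlencode({
        "db": "nucleotide", "term": query, "retmax": retmax,
        "retmode": "json", "tool": TOOL, "email": email,
    })
    with urllib.request.urlopen(EUTILS + "esearch.fcgi?" + params) as resp:
        return json.load(resp)["esearchresult"]["idlist"]


def efetch_fasta(ids, email, batch=200):
    """Download the records as FASTA text in batches via EFetch (HTTP POST)."""
    chunks = []
    for i in range(0, len(ids), batch):
        data = urllib.parse.urlencode({
            "db": "nucleotide", "id": ",".join(ids[i:i + batch]),
            "rettype": "fasta", "retmode": "text",
            "tool": TOOL, "email": email,
        }).encode()
        with urllib.request.urlopen(EUTILS + "efetch.fcgi", data=data) as resp:
            chunks.append(resp.read().decode())
        time.sleep(0.4)  # stay below NCBI's 3 requests/second limit for clients without an API key
    return "".join(chunks)


def download(gene, email, path):
    """Search, fetch and write all matching records to a FASTA file; returns the record count."""
    ids = esearch_ids(build_query(gene), email)
    with open(path, "w") as fh:
        fh.write(efetch_fasta(ids, email))
    return len(ids)


# Example (needs network access; replace the placeholder address):
# download("pstS", "you@example.org", "pstS_nucleotide.fasta")
```

If the search matches more records than `retmax`, raise it (ESearch returns at most 10,000 UIDs per call) or page through the results with `retstart`.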
Thank you in advance. I am really struggling with this seemingly simple step, and that bothers me.