Metagenomic Degenerate Primer Design - How to download multiple gene sequences from GenBank?
8.4 years ago
Tim ▴ 130

Hello, everyone. My question may have a very simple answer, and it has probably been answered here before, but I would really appreciate any help.

I am trying to design a new set of degenerate primers to amplify a gene (pstS) from bacterial metagenomes. Although I have never worked with metagenomic samples before, I understand the steps involved:

1. Find and download nucleotide sequences of my gene from different bacterial species (GenBank);
2. Perform a multiple alignment of these nucleotide sequences and/or their protein translations (CLUSTAL, MUSCLE, T-COFFEE etc.);
3. Identify conserved regions;
4. Select primer sequences, either manually or using a specialized program (CODEHOP, Primaclade, HYDEN).
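Steps 3 and 4 can be sketched in a few lines of plain Python. This is only an illustration, not what CODEHOP, Primaclade or HYDEN do internally: it collapses each alignment column into an IUPAC ambiguity code and lists gap-free windows whose degeneracy stays under a cap. The function names, the 20-nt window and the cap of 64 are assumptions chosen for the example.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases seen in a column.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned):
    """Collapse equal-length aligned sequences into one IUPAC string.

    Columns containing a gap or any non-ACGT symbol are reported as '-'.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        bases = frozenset(base.upper() for base in column)
        consensus.append(IUPAC.get(bases, "-"))
    return "".join(consensus)


def degeneracy(primer):
    """Number of distinct plain sequences a degenerate primer stands for."""
    sizes = {code: len(bases) for bases, code in IUPAC.items()}
    total = 1
    for code in primer:
        total *= sizes[code]
    return total


def candidate_windows(consensus, length=20, max_degeneracy=64):
    """Yield (start, window, degeneracy) for gap-free windows under the cap."""
    for start in range(len(consensus) - length + 1):
        window = consensus[start:start + length]
        if "-" in window:
            continue
        d = degeneracy(window)
        if d <= max_degeneracy:
            yield start, window, d
```

For the toy alignment `["ATGACC", "ATGGCC", "ATGACT"]`, `degenerate_consensus` gives `ATGRCY`, which stands for 4 sequences. Real primer selection also has to consider melting temperature, 3' end stability and codon structure, which this sketch ignores.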

So my question is simple: how do I batch download gene sequences from GenBank? If I use Entrez Nucleotide, it gives me every sequence that contains pstS, including whole genomes, plasmids and so on, and I have no idea how to filter those out. I am not afraid to use BioPerl/Biopython or any other way of collecting data from GenBank programmatically, but I am worried that there is a simpler method that I am missing.
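One possible way to keep whole genomes and plasmids out of the results is to restrict the Entrez query by sequence length. The sketch below only builds the query string and the NCBI E-utilities URLs with the standard library (Bio.Entrez in Biopython wraps the same `esearch` and `efetch` endpoints). The helper names and the 500 to 3000 bp range are assumptions for illustration; the range should be adjusted to the expected size of the gene.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_term(gene, organism="Bacteria", min_len=500, max_len=3000):
    """Entrez Nucleotide query limited by gene name, taxon and record length.

    The length limits are an assumption: they are meant to drop complete
    genomes and plasmids while keeping single-gene records.
    """
    return (
        f"{gene}[Gene Name] AND {organism}[Organism] "
        f"AND {min_len}:{max_len}[Sequence Length]"
    )


def esearch_url(term, retmax=500):
    """URL that returns the IDs of the records matching the query."""
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(ids, rettype="fasta"):
    """URL that downloads the given records as plain-text FASTA."""
    params = {
        "db": "nuccore",
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

The string returned by `build_term("pstS")` can also be pasted directly into the Entrez Nucleotide search box. A length filter discards pstS copies that are annotated only inside genome records; those could be recovered separately by fetching the genome with `rettype="fasta_cds_na"` and keeping the coding sequences annotated as pstS.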

Thank you in advance, I am really struggling with this simple step and that makes me uncomfortable.

8.4 years ago

If you have a locally installed nt or nr database, you can use blastdbcmd from the BLAST+ suite to extract the sequences. For example, the following link shows how to extract 16S rRNA sequences:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/oneliners.html?#BLASTDBCMD
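As a rough sketch of the same idea for pstS (not necessarily the approach shown on the linked page): list accession and title for every database entry with `blastdbcmd -db nt -entry all -outfmt "%a %t"`, keep the accessions whose titles mention the gene, and pass them back through `-entry_batch` to get FASTA. The helper names, the excluded title words and the file names below are assumptions for illustration.

```python
import subprocess


def pick_accessions(listing_lines, keyword="pstS",
                    exclude=("complete genome", "plasmid", "chromosome")):
    """Select accessions from 'accession title' lines by title content.

    Keeps entries whose title mentions the keyword and none of the
    excluded words; matching is case-insensitive.
    """
    picked = []
    for line in listing_lines:
        accession, _, title = line.strip().partition(" ")
        lowered = title.lower()
        if keyword.lower() in lowered and not any(w in lowered for w in exclude):
            picked.append(accession)
    return picked


def blastdbcmd_args(db, entry_batch, out_fasta):
    """Argument list that dumps the listed entries as FASTA (%f)."""
    return ["blastdbcmd", "-db", db, "-entry_batch", entry_batch,
            "-outfmt", "%f", "-out", out_fasta]


def extract(db, entry_batch, out_fasta):
    """Run blastdbcmd; requires BLAST+ on PATH and a local database."""
    subprocess.run(blastdbcmd_args(db, entry_batch, out_fasta), check=True)
```

Title matching is crude: it misses records in which pstS is annotated but not named in the title, so the resulting set should be checked by hand before alignment.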

To download sequences from UniProt (Swiss-Prot, TrEMBL), you can use the extract_fasta_swissprot.py script from here.

Also read section 9 of the Biopython Cookbook.

Best Wishes,
Umer


Thank you, Umer, I will try your suggestions.