How to go from locus tag to FASTA sequence using Biopython / specify db for blast query
7.5 years ago

I am trying to take a list of S. cerevisiae gene locus tags [YDR251W, YDR342C, YPR022C, ...] and run a blastx search for homologs in some related species (e.g. S. paradoxus, K. lactis). To automate this, I wrote a Biopython script that looks up the NCBI Gene ID for each locus tag [851838, 851943, 856133, ...]. While testing my code, I realized that running the blast searches manually with those Gene IDs wasn't even returning the correct results. Although the Gene IDs I have match the locus tags, when I put an ID into blast it usually seems to pull a record from the EST database instead of the Gene database.

There are a couple of ways I could go about fixing this, but I'm not sure which approach is most straightforward (I am very new to the world of bioinformatics).

1. Is there a way in a blast search to specify which database the query ID should be pulled from? I would later figure out how to do that within a Biopython script.

2. Is there a way to get Biopython to take a locus tag and retrieve the corresponding gene FASTA nucleotide sequence? Searching the Gene database for my locus tags and clicking "Go to nucleotide: FASTA" gives me the sequence I want to use in my blastx searches, but I don't know how to code that using Biopython.
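For option 2, one possible route is sketched below; it is a rough outline under several assumptions, not a tested pipeline. It searches the Gene database for the locus tag, reads the gene's chromosome accession and coordinates from the Gene summary, and fetches that slice from the nucleotide database as FASTA. Passing the FASTA text (rather than a bare numeric ID) to blast should also sidestep the problem of the ID being resolved against the wrong database. The helper names (`gene_search_term`, `efetch_range`, `fetch_gene_fasta`, `blastx_against`) are made up for illustration. The `GenomicInfo` / `ChrAccVer` / `ChrStart` / `ChrStop` keys and the 0-based coordinates are what I understand the Gene summary to return, and taking the first search hit is a simplification, so check the output against the "Go to nucleotide: FASTA" result for a few genes.

```python
def gene_search_term(locus_tag, organism="Saccharomyces cerevisiae"):
    """Build an Entrez Gene query restricted to one organism (hypothetical helper)."""
    return f"{locus_tag} AND {organism}[Organism]"


def efetch_range(chr_start, chr_stop):
    """Turn Gene-summary coordinates into efetch arguments.

    Assumption: the summary reports 0-based positions, and start > stop
    means the gene lies on the minus strand. efetch expects 1-based
    positions and strand 1 (plus) or 2 (minus).
    """
    start, stop = int(chr_start), int(chr_stop)
    strand = 1 if start <= stop else 2
    low, high = sorted((start, stop))
    return {"seq_start": low + 1, "seq_stop": high + 1, "strand": strand}


def fetch_gene_fasta(locus_tag, email):
    """Return the genomic FASTA text for one locus tag (needs network access)."""
    # Imported here so the pure helpers above work without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    handle = Entrez.esearch(db="gene", term=gene_search_term(locus_tag))
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        raise LookupError(f"no Gene record found for {locus_tag}")

    # Simplification: trust the first hit; a stricter script would verify it.
    handle = Entrez.esummary(db="gene", id=ids[0])
    summary = Entrez.read(handle)
    handle.close()
    info = summary["DocumentSummarySet"]["DocumentSummary"][0]["GenomicInfo"][0]

    handle = Entrez.efetch(
        db="nucleotide",
        id=info["ChrAccVer"],
        rettype="fasta",
        retmode="text",
        **efetch_range(info["ChrStart"], info["ChrStop"]),
    )
    fasta = handle.read()
    handle.close()
    return fasta


def blastx_against(fasta, organism):
    """Run blastx on the FASTA text, limiting hits to one organism (needs network)."""
    from Bio.Blast import NCBIWWW  # assumes a Biopython release that ships NCBIWWW

    return NCBIWWW.qblast(
        "blastx", "nr", fasta, entrez_query=f"{organism}[Organism]"
    )
```

Usage would look something like `fasta = fetch_gene_fasta("YDR251W", "you@example.org")` followed by `result = blastx_against(fasta, "Saccharomyces paradoxus")`, with the address replaced by your own; the returned handle holds the BLAST XML output.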

blast • 2.2k views