Hello,
I am trying to collect the protein-coding region (CDS) of a specific gene for a list of birds from the NCBI nucleotide database using Entrez. I get one CDS for most birds, but one species returns a ton of unrelated genes, and others that do have the gene listed in the NCBI nucleotide database return nothing. Could anyone look at my code and see where the problem lies?
from Bio import Entrez
from Bio import SeqIO
import os
Entrez.email = "email@blahblah.com"
species_list = ["Gallus gallus", "Numida meleagris","Phasianus colchicus",
"Meleagris gallopavo", "Coturnix japonica", "Anas platyrhynchos", "Numida meleagris",
"Columba livia", "Anser cygnoides domesticus", "Rhea americana", "Apteryx rowi",
"Nothoprocta perdicaria", "Struthio camelus australis", "Dromaius novaehollandiae",
"Calidris pugnax", "Aquila chrysaetos chrysaetos", "Athene cunicularia", "Strix occidentalis caurina",
"Melopsittacus undulatus", "Strigops habroptila", "Lepidothrix coronata", "Corvus moneduloides",
"Hirundo rustica", "Taeniopygia guttata", "Lonchura striata domestica", "Motacilla alba alba",
"Serinus canaria", "Camarhynchus parvulus", "Parus major", "Cyanistes caeruleus", "Ficedula albicollis",
"Catharus ustulatus", "Sterna hirundo"]
length = len(species_list)
f = open("DMC1.txt", "x")
for i in range(length):
    species = species_list[i]
    handle1 = Entrez.esearch(db="gene", term=species + "[Orgn] AND dmc1[Gene]", idtype="acc", retmax=3)
    record = Entrez.read(handle1)
    idlist = record["IdList"]
    handle2 = Entrez.efetch(db="nuccore", id=idlist, rettype="fasta_cds_na", retmax=3, retmode="text")
    print(species, file=open("DMC1.txt", "a"))
    print(handle2.read(), file=open("DMC1.txt", "a"))
Is the gene called dmc1 in all species? If not, that can lead to odd results. Genomes for some species may not be as complete as chicken's, so that could be the other reason.

Hi, thanks for your response! It is called DMC1 in all of the birds, which is why I am confused about all the junk it is outputting. I am hoping to streamline this so I can look at more genes in the same species list.
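One thing that may be worth checking, offered as a guess and not a confirmed diagnosis: the search runs against db="gene", so the IdList contains Gene UIDs, but those numbers are then handed to efetch on db="nuccore", where the same integers can point at unrelated nucleotide records. Below is a minimal sketch that maps the Gene UIDs to nucleotide records with elink before fetching. The helper names (build_term, unique_in_order, fetch_cds) are made up for illustration, and the link name gene_nuccore_refseqrna is assumed to be the appropriate Gene-to-RefSeq-mRNA link; species without RefSeq annotation would come back empty under that assumption.

```python
def build_term(species, gene):
    """Build an Entrez query restricted to one organism and one gene symbol."""
    return f'"{species}"[Orgn] AND {gene}[Gene]'


def unique_in_order(items):
    """Drop repeated entries (e.g. a species listed twice), keeping first-seen order."""
    return list(dict.fromkeys(items))


def fetch_cds(species, gene, email, retmax=3):
    """Return CDS FASTA text for `gene` in `species`, or "" if nothing is found.

    Hypothetical helper: needs Biopython and network access to NCBI.
    """
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email

    # Step 1: search the Gene database. The IdList holds Gene UIDs,
    # which are not valid identifiers for the nuccore database.
    handle = Entrez.esearch(db="gene", term=build_term(species, gene), retmax=retmax)
    gene_ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not gene_ids:
        return ""

    # Step 2: follow links from Gene to nuccore.
    # Assumption: "gene_nuccore_refseqrna" selects the RefSeq mRNA records.
    handle = Entrez.elink(
        dbfrom="gene",
        db="nuccore",
        id=",".join(gene_ids),
        linkname="gene_nuccore_refseqrna",
    )
    linksets = Entrez.read(handle)
    handle.close()
    nuc_ids = [
        link["Id"]
        for linkset in linksets
        for linksetdb in linkset["LinkSetDb"]
        for link in linksetdb["Link"]
    ]
    if not nuc_ids:
        return ""

    # Step 3: fetch the coding sequences of the linked nucleotide records.
    handle = Entrez.efetch(
        db="nuccore", id=",".join(nuc_ids), rettype="fasta_cds_na", retmode="text"
    )
    text = handle.read()
    handle.close()
    return text
```

To reuse this for other genes, one could loop over unique_in_order(species_list), call fetch_cds(species, "dmc1", "email@blahblah.com") for each species, and write the species name plus the returned text to a single file opened once in write mode. Swapping "dmc1" for another gene symbol would be the only change needed per gene.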
Using EntrezDirect I see the following: