Question: Entrez search to CSV file
21 months ago, pmramos95 wrote:

Hi everyone, this is my first post, and I need help with a project for a bioinformatics class I am in. What I am trying to do is pull influenza A H3N2 DNA sequences with Entrez.esearch and determine which sequences contain the recognition sites for the restriction enzymes EcoRI, BamHI, and HindIII.

Here is my code so far:

import re

from Bio import Entrez, SeqIO

Entrez.email = ""  # contact e-mail address that NCBI asks for

# let's do a search for influenza H3N2 viruses from Texas
handle = Entrez.esearch(db="nucleotide",  # database to search
                        term="influenza a virus texas h3n2 2017 hemagglutinin complete cds",  # search term
                        retmax=200)  # number of results that are returned
record = Entrez.read(handle)
handle.close()

gi_list = record["IdList"]  # list of genbank identifiers found

handle = Entrez.efetch(db="nucleotide", id=gi_list, rettype="gb", retmode="text")
data = handle.read()
handle.close()  # close the handle

# Write data to a file
with open("", "w") as outfile:  # path of the GenBank file to write
    outfile.write(data)

# Open the genbank file with flu sequences
in_handle = open("", "r")  # same path as the file written above
records = SeqIO.parse(in_handle, "genbank")

# Find restriction enzyme sequences for EcoRI, BamHI, and HindIII

def count_re(records, csv_file):

    re_dict = {}

    # Loop over the GenBank records
    for record in records:

        match_eco = re.search("GAATTC", str(record.seq))
        match_bam = re.search("GGATCC", str(record.seq))
        match_hin = re.search("AAGCTT", str(record.seq))

        if match_eco:
            eco_count = 1
        else:
            eco_count = 0

        if match_bam:
            bam_count = 1
        else:
            bam_count = 0

        if match_hin:
            hin_count = 1
        else:
            hin_count = 0

        # remember the three results under the record's ID
        re_dict[record.id] = (eco_count, bam_count, hin_count)

    # open file for writing
    with open(csv_file, "w") as file:

        file.write("ID,EcoRI,BamHI,HindIII\n")

        # loop over sorted keys
        for seq_id in sorted(re_dict):
            # new line with values separated by ","
            line = str(seq_id) + ',' + ','.join(str(count) for count in re_dict[seq_id]) + '\n'

            # write to the file
            file.write(line)

count_re(records, "project3_re_1.csv")
in_handle.close()
modified 21 months ago by genomax • written 21 months ago by pmramos95

Is there a specific question here? Is this program not working?

Since this is a class assignment (thank you for stating that up front) you should be specific about what you need help with.

modified 21 months ago • written 21 months ago by genomax

I just need help getting the information into a CSV file. I would like the file to have a header and rows along the lines of:

ID  | EcoRI | BamHI | HindIII
111 | yes   | no    | no
112 | no    | yes   | yes

I hope this helps. Sorry about the formatting; it's my first time posting to a forum like this, and I'm very inexperienced in programming.

modified 21 months ago by genomax • written 21 months ago by pmramos95
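
For what it's worth, one way to get a table like the one requested is Python's built-in csv module. The following is only a minimal sketch, not a drop-in fix for the script in the question: the names `SITES`, `write_site_table`, and `demo` are made up for illustration, and the IDs and sequences are invented so the example runs without contacting NCBI. In the real script, the dictionary would presumably be built from the parsed GenBank records (for example from `record.id` and `str(record.seq)`).

```python
import csv
import io

# Recognition sites for the three enzymes named in the question.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def write_site_table(sequences, out):
    """Write one CSV row per sequence: its ID, then yes/no for each enzyme.

    `sequences` maps an ID to a DNA string; `out` is any writable text file.
    (Hypothetical helper, not part of Biopython.)
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID"] + list(SITES))
    for seq_id in sorted(sequences):
        seq = sequences[seq_id].upper()  # tolerate lower-case input
        row = [seq_id] + ["yes" if site in seq else "no" for site in SITES.values()]
        writer.writerow(row)


# Invented IDs and sequences, for illustration only.
demo = {
    "111": "ATGGAATTCAAA",
    "112": "ccggatccttaagcttg",
}
buffer = io.StringIO()  # stands in for a real file opened with open(path, "w", newline="")
write_site_table(demo, buffer)
print(buffer.getvalue())
```

All three recognition sites are palindromic (each reads the same on the reverse-complement strand), so a plain substring test on one strand is enough here. A file written this way opens directly in Excel with the ID and one yes/no column per enzyme.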