Question: Entrez search to CSV file
11 months ago, pmramos95 wrote:

Hi everyone, this is my first post and I need help with a project for a bioinformatics class I am in. What I am trying to do is pull influenza A H3N2 DNA sequences with Entrez.esearch and determine which sequences contain the recognition sites of the restriction enzymes EcoRI, BamHI, and HindIII.

Here is my code so far:

    from Bio import Entrez, SeqIO

    Entrez.email = "email.here"

    # let's do a search for influenza H3N2 viruses from Texas
    handle = Entrez.esearch(db="nucleotide",  # database to search
                            term="influenza a virus texas h3n2 2017 hemagglutinin complete cds",  # search term
                            retmax=200  # number of results that are returned
                            )
    record = Entrez.read(handle)
    handle.close()

    gi_list = record["IdList"]  # list of GenBank identifiers found

    handle = Entrez.efetch(db="nucleotide", id=gi_list, rettype="gb", retmode="text")
    data = handle.read()
    handle.close()  # close the handle

    # Write data to a file
    with open("influenza_HA.gb", "w") as outfile:
        outfile.write(data)





    import re

    # Recognition sites for EcoRI, BamHI, and HindIII
    SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


    def count_re(records, csv_file):
        """Write one CSV row per record: 1 if the site is present, else 0."""
        re_dict = {}

        # Loop over the records and note which sites each sequence contains
        for record in records:
            seq = str(record.seq).upper()
            re_dict[record.id] = [1 if re.search(site, seq) else 0
                                  for site in SITES.values()]

        # Open the file for writing
        with open(csv_file, "w") as file:
            file.write("ID," + ",".join(SITES) + "\n")

            # Loop over sorted record IDs
            for rec_id in sorted(re_dict):
                # New line with values separated by ","
                counts = ",".join(str(c) for c in re_dict[rec_id])
                file.write(rec_id + "," + counts + "\n")


    # Open the GenBank file with flu sequences and write the CSV
    with open("influenza_HA.gb", "r") as in_handle:
        count_re(SeqIO.parse(in_handle, "genbank"), "project3_re_1.csv")

Is there a specific question here? Is this program not working?

Since this is a class assignment (thank you for stating that up front), you should be specific about what you need help with.

written 11 months ago by genomax

I just need help getting the information into a CSV file. I would like the file, when opened in Excel, to look something like this:

ID  | EcoRI | BamHI | HindIII
111 | yes   | no    | no
112 | no    | yes   | yes
etc.

I hope this helps. Sorry about the formatting; it's my first time posting to a forum like this and I'm very inexperienced in programming.

written 11 months ago by pmramos95
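
One way to get a table like the one described above is Python's built-in csv module. What follows is only a minimal sketch, not a definitive solution: the `SITES` dictionary, the `site_table` and `write_site_csv` helpers, and the example IDs and sequences are all made up for illustration. In the real script the (ID, sequence) pairs would presumably come from the GenBank records, e.g. `(record.id, str(record.seq))`.

```python
import csv

# Recognition sequences for the three enzymes (all are palindromic,
# so searching one strand is enough).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def site_table(sequences):
    """Return rows of [ID, yes/no, yes/no, yes/no] for (ID, sequence) pairs."""
    rows = []
    for seq_id, seq in sequences:
        seq = seq.upper()
        rows.append([seq_id] + ["yes" if site in seq else "no"
                                for site in SITES.values()])
    return rows


def write_site_csv(sequences, csv_file):
    """Write a header line plus one row per sequence to csv_file."""
    with open(csv_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ID"] + list(SITES))
        writer.writerows(site_table(sequences))


# Hypothetical input; with Biopython this could instead be built as
# ((rec.id, str(rec.seq)) for rec in SeqIO.parse("influenza_HA.gb", "genbank"))
example = [("111", "ATGGAATTCAAA"), ("112", "ccggatccaagcttgg")]
write_site_csv(example, "project3_re_1.csv")
```

Letting `csv.writer` build each line avoids joining strings by hand, and Excel opens the resulting file with one column per enzyme.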