Entrez search to CSV file
6.0 years ago
pmramos95 • 0

Hi everyone, this is my first post, and I need help with a project for a bioinformatics class I am taking. I am trying to pull influenza A H3N2 DNA sequences with Entrez.esearch and determine which sequences contain the recognition sites for the restriction enzymes EcoRI, BamHI, and HindIII.
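
Detecting the sites only needs a substring search. Below is a minimal sketch in plain Python, assuming the sequences are already available as strings (for example via `str(record.seq)`); the helper names `find_sites` and `site_summary` are hypothetical. Because GAATTC, GGATCC, and AAGCTT are each their own reverse complement, scanning one strand also finds the sites on the opposite strand. Ambiguity codes such as N are not handled specially.

```python
# Recognition sequences (5'->3') of the three enzymes named in the question
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) match of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # resume one base later to allow overlaps
    return positions


def site_summary(seq):
    """Map each enzyme name to the list of positions where its site occurs."""
    return {name: find_sites(seq, site) for name, site in SITES.items()}


if __name__ == "__main__":
    demo = "ttGAATTCaaGGATCCggGAATTC"  # made-up sequence for illustration
    print(site_summary(demo))  # {'EcoRI': [2, 18], 'BamHI': [10], 'HindIII': []}
```

Returning positions instead of a single yes/no keeps the option of counting sites later; `bool(positions)` gives the yes/no answer for the table.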

Here is my code so far:

    import re
    from Bio import Entrez, SeqIO

    Entrez.email = "email.here"  # NCBI asks for a real e-mail address here

    # let's do a search for influenza A H3N2 viruses from Texas
    handle = Entrez.esearch(db="nucleotide",  # database to search
                            term="influenza a virus texas h3n2 2017 hemagglutinin complete cds",  # search term
                            retmax=200  # number of results that are returned
                            )
    record = Entrez.read(handle)
    handle.close()

    id_list = record["IdList"]  # list of identifiers found

    handle = Entrez.efetch(db="nucleotide", id=id_list, rettype="gb", retmode="text")
    data = handle.read()
    handle.close()  # close the handle

    # Write data to a file
    with open("influenza_HA.gb", "w") as outfile:
        outfile.write(data)

    # Recognition sites for EcoRI, BamHI, and HindIII (also the CSV column order)
    SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


    def count_re(genbank_file, csv_file):
        """Write one CSV row per record: its ID, then yes/no for each site."""
        re_dict = {}  # {record ID: ["yes"/"no" for each enzyme in SITES]}

        # Open the GenBank file with the flu sequences and loop over the records
        with open(genbank_file) as in_handle:
            for record in SeqIO.parse(in_handle, "genbank"):
                seq = str(record.seq).upper()
                re_dict[record.id] = ["yes" if re.search(site, seq) else "no"
                                      for site in SITES.values()]

        # open the CSV file for writing
        with open(csv_file, "w") as out:
            out.write("ID," + ",".join(SITES) + "\n")  # header row

            # loop over sorted IDs; don't name this variable `re`, or it
            # shadows the re module inside the whole function
            for rec_id in sorted(re_dict):
                # new line with values separated by ","
                out.write(rec_id + "," + ",".join(re_dict[rec_id]) + "\n")


    count_re("influenza_HA.gb", "project3_re_1.csv")
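
A pitfall worth knowing about with this kind of code: if a function assigns to the name `re` anywhere in its body (for example `for re in re_list:`), Python treats `re` as a local variable throughout that function, so a `re.search(...)` call earlier in the same function raises `UnboundLocalError` instead of using the imported module. The minimal reproduction below uses hypothetical functions `broken` and `fixed`; renaming the loop variable is the entire fix.

```python
import re


def broken(seqs):
    """Reproduces the pitfall: `re` is assigned below, so it is local in the whole body."""
    hits = []
    for s in seqs:
        hits.append(bool(re.search("GAATTC", s)))  # raises UnboundLocalError
    for re in hits:  # this assignment is what makes `re` a local name
        pass
    return hits


def fixed(seqs):
    """Same logic, but the loop variable no longer shadows the re module."""
    hits = []
    for s in seqs:
        hits.append(bool(re.search("GAATTC", s)))
    for hit in hits:
        pass
    return hits
```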

Is there a specific question here? Is this program not working?

Since this is a class assignment (thank you for stating that up front), you should be specific about what you need help with.

I just need help getting the information into a CSV file. I would like the file (opened in Excel) to look something like this:

| ID  | EcoRI | BamHI | HindIII |
|-----|-------|-------|---------|
| 111 | yes   | no    | no      |
| 112 | no    | yes   | yes     |

etc.

I hope this helps. Sorry about the formatting; it's my first time posting on a forum like this, and I'm very inexperienced in programming.
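
Since the goal is a table that opens in Excel, the standard-library `csv` module can produce it directly. The sketch below assumes the sequences have already been collected into a dictionary mapping record ID to sequence string; `write_site_table` and the demo IDs are hypothetical. It writes the header `ID,EcoRI,BamHI,HindIII` followed by one yes/no row per record, sorted by ID.

```python
import csv
import io

# Recognition sequences; dict order fixes the column order of the table
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def write_site_table(sequences, out):
    """Write an ID column plus one yes/no column per enzyme to the file object `out`.

    `sequences` maps a record ID to its nucleotide sequence string.
    """
    writer = csv.writer(out)
    writer.writerow(["ID", *SITES])  # header: ID,EcoRI,BamHI,HindIII
    for rec_id in sorted(sequences):
        seq = sequences[rec_id].upper()
        row = ["yes" if site in seq else "no" for site in SITES.values()]
        writer.writerow([rec_id] + row)


if __name__ == "__main__":
    # hypothetical IDs and sequences, for illustration only
    demo = {"111": "ccGAATTCgg", "112": "GGATCCttAAGCTT"}
    buf = io.StringIO()
    write_site_table(demo, buf)
    print(buf.getvalue())
```

To write a real file, the `csv` documentation recommends opening it with `newline=""`, e.g. `with open("project3_re_1.csv", "w", newline="") as f: write_site_table(seqs, f)`. With Biopython, the dictionary could be built as `{rec.id: str(rec.seq) for rec in SeqIO.parse("influenza_HA.gb", "genbank")}`.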
