Question

Entrez search to CSV file

6.0 years ago

pmramos95 • 0

Hi everyone, this is my first post, and I need help with a project for a bioinformatics class. I am trying to pull influenza A H3N2 DNA sequences with Entrez.esearch and determine which sequences contain the recognition sites for the restriction enzymes EcoRI, BamHI, and HindIII.

Here is my code so far:

    from Bio import Entrez, SeqIO

    Entrez.email = "email.here"

    # let's do a search for influenza A H3N2 viruses from Texas
    handle = Entrez.esearch(db="nucleotide",  # database to search
                            term="influenza a virus texas h3n2 2017 hemagglutinin complete cds",  # search term
                            retmax=200  # maximum number of results returned
                            )
    record = Entrez.read(handle)
    handle.close()

    gi_list = record["IdList"]  # list of GenBank identifiers found

    # Fetch the matching records in GenBank format
    handle = Entrez.efetch(db="nucleotide", id=gi_list, rettype="gb", retmode="text")
    data = handle.read()
    handle.close()  # close the handle

    # Write data to a file
    with open("influenza_HA.gb", "w") as outfile:
        outfile.write(data)





    import re

    from Bio import SeqIO


    # Find restriction enzyme recognition sites for EcoRI, BamHI, and HindIII
    def count_re(records, csv_file):
        """Write one CSV row per record: 1 if the site is present, 0 if not."""
        re_dict = {}

        # Loop over the parsed GenBank records
        for record in records:
            seq = str(record.seq).upper()

            match_eco = re.search(r"GAATTC", seq)
            match_bam = re.search(r"GGATCC", seq)
            match_hin = re.search(r"AAGCTT", seq)

            eco_count = 1 if match_eco else 0
            bam_count = 1 if match_bam else 0
            hin_count = 1 if match_hin else 0

            # Store the result for this record, keyed by its ID
            re_dict[record.id] = (eco_count, bam_count, hin_count)

        # open file for writing
        with open(csv_file, "w") as file:
            file.write("ID,EcoRI,BamHI,HindIII\n")

            # loop over sorted record IDs (do not reuse the name "re" here,
            # because that would shadow the re module)
            for seq_id in sorted(re_dict):
                # new line with values separated by ","
                counts = ",".join(str(count) for count in re_dict[seq_id])
                file.write(seq_id + "," + counts + "\n")


    # Open the GenBank file with flu sequences and write the CSV
    with open("influenza_HA.gb", "r") as in_handle:
        count_re(SeqIO.parse(in_handle, "genbank"), "project3_re_1.csv")

nucleotide Entrez CSV • 1.8k views

ADD COMMENT • link updated 6.0 years ago by GenoMax 141k • written 6.0 years ago by pmramos95 • 0

Is there a specific question here? Is this program not working?

Since this is a class assignment (thank you for stating that up front) you should be specific about what you need help with.

ADD REPLY • link 6.0 years ago by GenoMax 141k

I just need help getting the information into a CSV file. I would like the header and rows of the file, when opened in Excel, to look something like this:

ID  | EcoRI | BamHI | HindIII
111 | yes   | no    | no
112 | no    | yes   | yes
etc.

I hope this helps. Sorry about the formatting; it's my first time posting to a forum like this, and I'm very inexperienced in programming.

ADD REPLY • link updated 6.0 years ago by GenoMax 141k • written 6.0 years ago by pmramos95 • 0
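
The yes/no table layout requested above could be produced along the following lines. This is only a minimal sketch, not a reviewed solution: it uses Python's standard `csv` module and plain substring checks rather than anything from Biopython. The names `SITES`, `site_rows`, and `write_site_table` are hypothetical, and the input is assumed to be a plain dictionary mapping sequence IDs to DNA strings.

```python
import csv

# Recognition sites for the three enzymes named in the question.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def site_rows(sequences):
    """Yield one row per sequence: its ID, then "yes"/"no" for each site.

    `sequences` is an assumed mapping of sequence ID -> DNA string.
    """
    for seq_id in sorted(sequences):
        dna = sequences[seq_id].upper()
        yield [seq_id] + ["yes" if site in dna else "no" for site in SITES.values()]


def write_site_table(sequences, csv_path):
    """Write a header row plus one yes/no row per sequence to csv_path."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ID"] + list(SITES))
        writer.writerows(site_rows(sequences))
```

With records parsed by `SeqIO.parse`, the dictionary could be built as `{rec.id: str(rec.seq) for rec in records}` and passed to `write_site_table` together with the output filename. The check only looks at the strand as stored, which is enough here because all three recognition sites are palindromic.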