Question: Jupyter Python cell will not run to completion: how to fix the hash(id(seq)) issue
bcheda0 wrote:

I'm working on a project for my bioinformatics course for which there is no reference guide, because it takes a de novo approach. Whenever I run my code on the 3732 test sequences it is supposed to go through, the program gets through about 20-25 sequences before quitting and outputting this error:

/anaconda3/lib/python3.6/site-packages/Bio/Seq.py:152: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour. "the new string hashing behaviour.", BiopythonWarning)

I have tried everything I could think of to fix this. Essentially, once this code has found all the palindromes in the test sequences and removed duplicates, it sorts the list and searches for those palindromes in specified genes. It is then supposed to output a hit table showing, for each palindrome and sequence, whether that palindrome exists and where. I am very new to Python, so any help would be appreciated. My code is as follows:

from Bio import Entrez, SeqIO
from Bio.Seq import Seq
import time
from nbconvert.preprocessors import ExecutePreprocessor
ep = ExecutePreprocessor(timeout=6000, kernel_name='python3')
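def reverse_complement(string):
    # despite the name, this returns only the complement; callers reverse it with [::-1]
    # raises KeyError on any base other than A, C, G or T (for example N)
    seq_dict = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join([seq_dict[base] for base in (string)])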
def reverse_palindromes(string, record):
    PrimaryStringresults = []
    lengthofinput = len(string)
    # complement the whole sequence once, instead of once per window
    complemented_string = reverse_complement(string)
    for character_position_in_input in range(lengthofinput):
        for Possible_Palidrome_Length in range(4, 12, 2):

            if character_position_in_input + Possible_Palidrome_Length > lengthofinput:
                continue
            input_string = string[character_position_in_input:character_position_in_input+Possible_Palidrome_Length]

            complement = complemented_string[character_position_in_input:character_position_in_input+Possible_Palidrome_Length]
            # Original strand:   5'- G  A  A  T  T  C -3'
            # Complement strand: 3'- C  T  T  A  A  G -5'

            # complement read in reverse: G  A  A  T  T  C
            # if all the positions match then it is a palindrome
            if input_string == complement[::-1]:
                PrimaryStringresults.append(input_string)

    return PrimaryStringresults
def main():
    Entrez.email = "bcheda@arcadia.edu"


    SearchTerm = "H5N1 AND PB1"
    handle = Entrez.esearch(db="nucleotide", term = SearchTerm)


    record = Entrez.read(handle)
    handle.close()
    print("searching <nucleotide> for term: ", SearchTerm )

    print("Number of hits", record["Count"])
    print(" ")
    res_ids=record["IdList"]
    Results = []  # collect palindromes from every sequence, not only the last one

    for r_id in res_ids:
        newhandle = Entrez.esummary(db="nucleotide", id=r_id)
        fullrecord = Entrez.read(newhandle)
        print("************************************************************")
        print("The ID for the sequence is: ",fullrecord[0]["Id"])
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="gb", retmode="text")
        record = SeqIO.read(handle, "genbank")
        print("The id value for the sequence is: ", record.id)
        print("The length of the sequence is: ", len(record))

        large_dataset = (str(record.seq))
        results = reverse_palindromes(large_dataset, record.id)
        Results.extend(results)
        print("************************************************************\n\n")
        handle.close()
        newhandle.close()
    return Results
def RemoveDuplicates(List):
    # a set drops the duplicates; sorted() puts the result in a stable order
    new_list = sorted(set(List))
    return new_list

def PB1(Palindromes):

    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND PB1")
    record = Entrez.read(handle)
    handle.close()

    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    PB1results = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        PB1_data = str(record.seq)
        PB1results.extend(ProteinPalindromeChecker(PB1_data, Palindromes, record.id))
        handle.close()
    return PB1results
def PB2(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND PB2")

    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    PB2results = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        PB2_data = str(record.seq)
        PB2results.extend(ProteinPalindromeChecker(PB2_data, Palindromes, record.id))
        handle.close()
    return PB2results
def PA(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND PA")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    PAresults = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        PA_data = str(record.seq)
        PAresults.extend(ProteinPalindromeChecker(PA_data, Palindromes, record.id))
        handle.close()
    return PAresults
def HA(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND HA")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    HAresults = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        HA_data = str(record.seq)
        HAresults.extend(ProteinPalindromeChecker(HA_data, Palindromes, record.id))
        handle.close()
    return HAresults
def NP(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND NP")

    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    NPresults = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        NP_data = str(record.seq)
        NPresults.extend(ProteinPalindromeChecker(NP_data, Palindromes, record.id))
        handle.close()
    return NPresults
def NA(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND NA")

    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    NAresults = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        NA_data = str(record.seq)
        NAresults.extend(ProteinPalindromeChecker(NA_data, Palindromes, record.id))
        handle.close()
    return NAresults
def M1(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND M1")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    M1results = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        M1_data = str(record.seq)
        M1results.extend(ProteinPalindromeChecker(M1_data, Palindromes, record.id))
        handle.close()
    return M1results
def M2(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND M2")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    M2results = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        M2_data = str(record.seq)
        M2results.extend(ProteinPalindromeChecker(M2_data, Palindromes, record.id))
        handle.close()
    return M2results
def NS1(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND NS1")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    NS1results = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        NS1_data = str(record.seq)
        NS1results.extend(ProteinPalindromeChecker(NS1_data, Palindromes, record.id))
        handle.close()
    return NS1results
def NEP(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND NEP")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    NEPresults = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        NEP_data = str(record.seq)
        NEPresults.extend(ProteinPalindromeChecker(NEP_data, Palindromes, record.id))
        handle.close()
    return NEPresults
def PB1F2(Palindromes):
    handle = Entrez.esearch(db="nucleotide", term = "influenza A virus AND PB1-F2")
    #print("searching ", db, " for the term ", term)
    record = Entrez.read(handle)
    handle.close()
    print("Number of hits", record["Count"])
    print(" ")
    res_ids = record["IdList"]
    PB1F2results = []
    for r_id in res_ids:
        handle=Entrez.efetch(db="nucleotide", id=r_id, rettype="fasta", retmode="text")
        record = SeqIO.read(handle, "fasta")
        PB1F2_data = str(record.seq)
        PB1F2results.extend(ProteinPalindromeChecker(PB1F2_data, Palindromes, record.id))
        handle.close()
    return PB1F2results
def ProteinPalindromeChecker(sequence, Palindromes, record):
    Stringresults = []
    # calculates the total character length of the DNA input
    lengthofinput = len(sequence)
    # creates the complement strand once, instead of once per window
    complemented_string = reverse_complement(sequence)
    for character_position_in_input in range(lengthofinput):
        for Possible_Palidrome_Length in range(4, 12, 2):

            if character_position_in_input + Possible_Palidrome_Length > lengthofinput:
                continue
            # takes the window of the input DNA at this position and length
            input_string = sequence[character_position_in_input:character_position_in_input+Possible_Palidrome_Length]
            # takes the same window from the complement strand
            complement = complemented_string[character_position_in_input:character_position_in_input+Possible_Palidrome_Length]
            # Original strand:   5'- G  A  A  T  T  C -3'
            # Complement strand: 3'- C  T  T  A  A  G -5'
            # record a hit when the window is one of the known palindromes
            # (the original compared the window to the whole list with ==, which never matches)
            if str(input_string) in Palindromes:
                Stringresults.append(("Sequence ID: ",record,"start: ",'{:6}'.format(character_position_in_input + 1),"   Stop: ",'{:6}'.format(character_position_in_input + Possible_Palidrome_Length + 1), "   length: ", '{:3}'.format(Possible_Palidrome_Length), "\nForward: ", input_string, "\nReverse: ", complement))
    return Stringresults
def hittable(Palindromes):
    PB1results = PB1(Palindromes)
    PB2results = PB2(Palindromes)
    PAresults = PA(Palindromes)
    HAresults = HA(Palindromes)
    NPresults = NP(Palindromes)
    NAresults = NA(Palindromes)
    M1results = M1(Palindromes)
    M2results = M2(Palindromes)
    NS1results = NS1(Palindromes)
    NEPresults = NEP(Palindromes)
    PB1F2results =PB1F2(Palindromes)
    headings = ['Palindromes', 'PB1results', 'PB2results', 'PAresults', 'HAresults', 'NPresults', 'NAresults', 'M1results', 'M2results', 'NS1results', 'NEPresults', 'PB1F2results']
    data = [headings] + list(zip(Palindromes, PB1results, PB2results, PAresults, HAresults, NPresults, NAresults, M1results, M2results, NS1results, NEPresults, PB1F2results))
    for i, d in enumerate(data):
        line = '|'.join(str(x).ljust(12) for x in d)
        print(line)
        # underline the heading row only
        if i == 0:
            print('-' * len(line))
if __name__ == '__main__':
    Palindromes = main()
    Palindromes = RemoveDuplicates(Palindromes)
    Palindromes = hittable(Palindromes)
modified 19 months ago by Biostar ♦♦ 20 • written 20 months ago by bcheda0
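
For comparison, the palindrome search described in the question can be sketched on plain Python strings, with no Seq objects involved. This is only a minimal sketch, not the poster's code: the names `COMPLEMENT_TABLE`, `revcomp` and `find_reverse_palindromes` are made up for illustration, the window lengths 4 to 10 mirror `range(4, 12, 2)` above, and skipping windows that contain anything other than A, C, G or T is an assumption about how ambiguous bases should be treated.

```python
# Translation table for the four unambiguous DNA bases (illustrative name).
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


def find_reverse_palindromes(seq, lengths=(4, 6, 8, 10)):
    """Return the sorted, de-duplicated reverse palindromes found in seq."""
    seq = str(seq).upper()
    found = set()  # a set removes duplicates as they are found
    for start in range(len(seq)):
        for length in lengths:
            window = seq[start:start + length]
            if len(window) < length:
                break  # longer windows will not fit either
            if not set(window) <= set("ACGT"):
                continue  # assumption: ignore windows with ambiguous bases
            if window == revcomp(window):
                found.add(window)
    return sorted(found)


print(find_reverse_palindromes("TTGAATTCAA"))
# prints ['AATT', 'GAATTC', 'TGAATTCA', 'TTGAATTCAA']
```

Working on `str(record.seq)` throughout also means nothing in this step ever hashes or compares a Seq object.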

/anaconda3/lib/python3.6/site-packages/Bio/Seq.py:152: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour. "the new string hashing behaviour.", BiopythonWarning)

This is not an error, but a warning. This shouldn't crash your code (unless there are parts you didn't show us).
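
To illustrate the difference, here is a small sketch that uses only the standard `warnings` module. `FakeSeq` is a hypothetical stand-in for a Seq object (it is not Biopython's implementation) that warns when it is hashed, in the way the message above describes. It shows that execution carries on after a warning, and that keying a dictionary by `str(my_seq)`, as the message itself suggests, does not trigger it.

```python
import warnings


class FakeSeq:
    """Hypothetical stand-in for a Seq object that warns when hashed."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def __eq__(self, other):
        return str(self) == str(other)

    def __hash__(self):
        warnings.warn("Seq objects now use string comparison", UserWarning)
        return hash(self.data)


seq = FakeSeq("GAATTC")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    by_object = {seq: "hit"}       # hashing the object triggers the warning
    by_string = {str(seq): "hit"}  # hashing the plain string does not
    still_running = True           # execution continues past the warning

print(len(caught), still_running)
# prints: 1 True
```

If the message is just noise in a real run, it can probably be silenced with `warnings.simplefilter("ignore", BiopythonWarning)` after `from Bio import BiopythonWarning`, although converting to `str` first addresses the cause rather than hiding it.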

In addition, please use sensible tags for your question. I have no clue why you selected blast, but I'll change that to biopython. Choosing the right tags will make sure that the people who can help you with this question can also find it.

Tagging Peter

modified 20 months ago • written 20 months ago by WouterDeCoster40k

I've played around with the code again, but it still seems to go all the way through
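
One thing that may be worth ruling out: `Entrez.esearch` returns at most 20 IDs unless `retmax` is set (that is the E-utilities default), so `record["IdList"]` can hold only 20 IDs even when `record["Count"]` reports 3732. That would be consistent with the loop ending after roughly 20 sequences, though it is a guess rather than a confirmed diagnosis. Below is a minimal sketch of paging through all the hits; `batch_offsets` is a hypothetical helper and the batch size of 500 is an arbitrary choice.

```python
def batch_offsets(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` hits."""
    for retstart in range(0, total, batch_size):
        yield retstart, min(batch_size, total - retstart)


# Each pair would be passed on to the search call, for example:
#   Entrez.esearch(db="nucleotide", term=SearchTerm,
#                  retstart=retstart, retmax=retmax)
pages = list(batch_offsets(3732))
print(len(pages), pages[0], pages[-1])
# prints: 8 (0, 500) (3500, 232)
```

Collecting the `IdList` from every page before the fetch loop starts should give the loop all 3732 IDs to work through.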

written 20 months ago by bcheda0