Question

Not getting the same results with biopython and NCBI

Asked by Harper, 3.1 years ago

I am new to Python. I need to find the ORFs in some FASTA sequences, and I am using SeqIO from Biopython. I followed the steps in the Biopython tutorial (using table = 11 and min_pro_len = 100) to get the ORFs in my FASTA file. I then compared the results with those from NCBI's Open Reading Frame Finder (ORFfinder) for the same sequence.

However, the results are different. Does anyone know why? Thanks!

Tags: biopython

Please post the code you are using by editing your original post. Text descriptions are insufficient to follow/diagnose the problem.

Reply by GenoMax, 3.1 years ago

Most likely, differences in algorithm options/parameters. You would need to run the NCBI tool with exactly equivalent parameters to expect identical output. If the NCBI tool does any more sophisticated screening (I don't know whether it does), it may also suggest different candidates.
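
To make the parameter point concrete, here is a minimal pure-Python sketch (no Biopython needed) of a start-to-stop ORF scan with two options of the kind the NCBI tool offers: a minimum ORF length in nucleotides and the set of accepted start codons. The function `find_orfs` and its conventions are illustrative assumptions, not ORFfinder's actual algorithm. It uses only the stop codons TAA/TAG/TGA (shared by tables 1 and 11), takes the first start codon after each stop (nested ORFs ignored), counts the stop codon in the length, drops ORFs that run off the end without a stop, and reports 1-based coordinates on the forward strand.

```python
# Stop codons shared by the standard code (table 1) and the bacterial code (table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_nt, start_codons=("ATG",)):
    """Hypothetical helper: return (start, end, strand, frame) for start-to-stop ORFs.

    Coordinates are 1-based and inclusive on the forward strand; the length
    (end - start + 1) includes the stop codon. Frames are counted from the
    5' end of each strand, as in the tutorial loop. Only the first start
    codon after each stop is used, so nested ORFs are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in ((+1, seq), (-1, reverse_complement(seq))):
        for frame in range(3):
            start = None  # 0-based index of the currently open start codon
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in start_codons:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # exclusive end, includes the stop codon
                    if end - start >= min_nt:
                        if strand == 1:
                            orfs.append((start + 1, end, strand, frame))
                        else:
                            # Map reverse-complement indices back to forward-strand coordinates
                            orfs.append((n - end + 1, n - start, strand, frame))
                    start = None
    return orfs
```

Changing only `min_nt` or `start_codons` on the same sequence can add or remove hits, which is one way two ORF finders can disagree. A setting roughly comparable to the 100-aa cutoff in the tutorial loop would be `find_orfs(str(record.seq), min_nt=303)` (100 codons plus the stop), although the two approaches still differ because this one requires a start codon.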

If you are trying to do 'proper' gene finding, you would be better off using GeneMark or Glimmer (for bacteria) though.

Reply by Joe, 3.1 years ago

Here is the code; thanks again:

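from Bio import SeqIO

record = SeqIO.read("TestBacteria.fasta", "fasta")
table = 11          # NCBI genetic code 11 (Bacterial, Archaeal and Plant Plastid)
min_pro_len = 100   # minimum length in amino acids

# Scan the forward strand and its reverse complement
for strand, nuc in [(+1, record.seq), (-1, record.seq.reverse_complement())]:
    for frame in range(3):
        # Trim to a whole number of codons so translate() does not warn about a partial codon
        length = 3 * ((len(record) - frame) // 3)
        # Split each translated frame at stop codons ("*"); pieces need not begin with Met
        for pro in nuc[frame:frame + length].translate(table).split("*"):
            if len(pro) >= min_pro_len:
                print("%s...%s - length %i, strand %i, frame %i"
                      % (pro[:30], pro[-3:], len(pro), strand, frame))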
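
One likely contributor to the mismatch is visible in this loop: `split("*")` returns everything between stop codons, so a hit need not start with Met. Also, `min_pro_len` counts amino acids, whereas the NCBI tool's minimum length is, as far as I know, set in nucleotides. Below is a minimal sketch of one way to get ATG-initiated ORFs instead, assuming that is what the NCBI run reported: trim each stop-to-stop segment to its first Met. The helper `met_initiated_orfs` is hypothetical and works on a plain string, so it runs without Biopython. Note that with table 11, alternative start codons such as GTG or TTG translate as V or L inside a segment, so this only catches ATG starts.

```python
def met_initiated_orfs(protein, min_len=100):
    """Hypothetical helper: keep each stop-to-stop segment from its first Met onward.

    `protein` is a plain string, e.g. str(nuc[frame:frame + length].translate(table)).
    Like the tutorial loop, the final segment is kept even though no stop codon
    closes it.
    """
    orfs = []
    for segment in protein.split("*"):
        first_met = segment.find("M")
        if first_met == -1:
            continue  # no ATG-encoded Met, so no ATG-initiated ORF in this segment
        orf = segment[first_met:]
        if len(orf) >= min_len:
            orfs.append(orf)
    return orfs
```

In the loop above, the inner line would become `for pro in met_initiated_orfs(str(nuc[frame:frame + length].translate(table)), min_pro_len):`, with the same `print`; the `len(pro)` check is then redundant.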

Reply by Harper, 3.1 years ago