Not getting the same results with biopython and NCBI
3.1 years ago
Harper • 0

I am new to Python and need to find the ORFs in some FASTA sequences, for which I am using SeqIO in Biopython. I followed the steps in the tutorial (with table = 11 and min_pro_len = 100) to get the ORFs from the given FASTA file, then compared them with the results for the same sequence from the NCBI Open Reading Frame Finder.

However, the results are different. Does anyone know why? Thanks!

biopython • 597 views

Please post the code you are using by editing your original post. Text descriptions are insufficient to follow/diagnose the problem.


Differences in algorithm options/parameters. You would need to ensure you are running the NCBI tool with exactly equivalent parameters to expect output to be completely identical. If the NCBI tool is doing any sort of sophisticated screening (I don't know if it is or not), you may find it suggests different candidates.
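To make the point about non-equivalent parameters concrete, here is a minimal pure-Python sketch (no Biopython needed) of two common ORF definitions applied to the same frame: "stop-to-stop" segments, which is what splitting a translation on "*" yields, versus segments that must begin at an ATG. The function names and the toy sequence are made up for illustration, and whether a given tool requires a start codon (and which ones it accepts) is an assumption to check against that tool's own settings.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons in NCBI translation table 11


def codons(seq, frame):
    """Yield the complete codons of seq, starting at offset `frame`."""
    for i in range(frame, len(seq) - 2, 3):
        yield seq[i:i + 3]


def stop_to_stop(seq, frame=0, min_codons=1):
    """Stop-free stretches, analogous to translate(...).split("*")."""
    segments, current = [], []
    for codon in codons(seq, frame):
        if codon in STOP_CODONS:
            segments.append(current)
            current = []
        else:
            current.append(codon)
    segments.append(current)  # trailing stretch not closed by a stop
    return ["".join(s) for s in segments if len(s) >= min_codons]


def atg_initiated(seq, frame=0, min_codons=1):
    """Same stretches, trimmed to their first ATG; stretches without ATG are dropped."""
    orfs = []
    for segment in stop_to_stop(seq, frame):
        triplets = [segment[i:i + 3] for i in range(0, len(segment), 3)]
        if "ATG" in triplets:
            trimmed = triplets[triplets.index("ATG"):]
            if len(trimmed) >= min_codons:
                orfs.append("".join(trimmed))
    return orfs


toy = "CCCGGGATGAAATTTTAACCCGGGAAATAG"  # hypothetical toy sequence
print(stop_to_stop(toy, min_codons=3))   # two stretches pass the length filter
print(atg_initiated(toy, min_codons=3))  # only one of them contains an ATG
```

Even on this toy input the two definitions disagree on both the number and the length of the hits. With a real genome and a threshold of 100 residues, a stretch can pass under one definition and fail under the other. It is also worth checking that the length thresholds are in the same units (amino acids versus nucleotides) in both tools.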

If you are trying to do 'proper' gene finding, you would be better off using GeneMark or Glimmer (for bacteria) though.


Here is the code, thanks again:

from Bio import SeqIO

record = SeqIO.read("TestBacteria.fasta", "fasta")
table = 11         # NCBI translation table 11
min_pro_len = 100  # minimum length, in amino acids

# Scan both strands, three reading frames each
for strand, nuc in [(+1, record.seq), (-1, record.seq.reverse_complement())]:
    for frame in range(3):
        # Trim the frame to a whole number of codons
        length = 3 * ((len(record) - frame) // 3)
        # Split the translation at stop codons ("*")
        for pro in nuc[frame:frame + length].translate(table).split("*"):
            if len(pro) >= min_pro_len:
                print("%s...%s - length %i, strand %i, frame %i"
                      % (pro[:30], pro[-3:], len(pro), strand, frame))
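
The loop above prints only the peptide, its length, the strand and the frame, which makes it hard to line hits up against another tool's output. As a rough sketch, nucleotide coordinates can be recovered from the position of each stretch within the translated frame. The helper name `segments_with_coords` and the demo string are hypothetical, and the coordinates are 0-based, end-exclusive offsets on the strand that was translated, excluding the stop codon.

```python
def segments_with_coords(translated, frame, min_pro_len):
    """Return (start, end, peptide) for each stop-free stretch.

    translated: protein string for one frame, with stops written as "*".
    start/end: 0-based, end-exclusive nucleotide offsets on the translated strand.
    """
    hits = []
    aa_pos = 0  # index of the current stretch's first residue in `translated`
    for peptide in translated.split("*"):
        if len(peptide) >= min_pro_len:
            start = frame + 3 * aa_pos
            end = start + 3 * len(peptide)
            hits.append((start, end, peptide))
        aa_pos += len(peptide) + 1  # +1 steps over the stop symbol
    return hits


demo = "MKV*AA*MPLKQ"  # hypothetical translated frame
print(segments_with_coords(demo, frame=1, min_pro_len=3))
```

For the reverse strand these offsets refer to the reverse-complemented sequence. On a sequence of length n, a hit at (start, end) there corresponds to (n - end, n - start) on the original strand.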
