Filter contigs by size: Different output between QUAST report and Python output
4.5 years ago
Dave Th ▴ 60

Hi all,

I'm trying to split my contig dataset into different files by length threshold, such as 500 bp, 1 kb, 2 kb, and so on. I'm using the code below to produce my output.

from Bio import SeqIO

def contigs_filter_by_length(fasta_input, size, fasta_output):
    # Collect every record whose sequence is at least `size` bases long
    long_contigs = []
    for record in SeqIO.parse(fasta_input, "fasta"):
        if len(record.seq) >= size:
            long_contigs.append(record)
    print("Found %i contigs" % len(long_contigs))
    # Write the retained records to a new FASTA file
    SeqIO.write(long_contigs, fasta_output, "fasta")

The problem is that when I cross-checked the QUAST report for my input file against the output of the code, the two did not agree. QUAST reported 119787 contigs >= 500 bp, while the FASTA output from the code contained 122046 contigs >= 500 bp (a difference of 2259).

Is there anything wrong in my code that leads to this difference?

sequence assembly • 1.5k views

I don't see anything wrong in your code. Have you compared the results directly? You could look for contigs that your Python code reports but QUAST does not, to see what causes the difference.
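One way to start that comparison is to list every contig with its length and count how many pass each threshold, so the numbers can be checked against the QUAST report one cutoff at a time. The sketch below is only an illustration: it uses a small plain-Python FASTA reader instead of Biopython, and the names `read_fasta_lengths` and `count_at_least` are made up for this example.

```python
import io

def read_fasta_lengths(handle):
    """Return a list of (contig_id, length) pairs from a FASTA file handle."""
    records = []
    current_id = None
    current_len = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if current_id is not None:
                records.append((current_id, current_len))
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            current_len = 0
        elif current_id is not None:
            current_len += len(line)
    if current_id is not None:
        records.append((current_id, current_len))
    return records

def count_at_least(records, size):
    """Count contigs whose length is >= size."""
    return sum(1 for _, length in records if length >= size)

# Tiny in-memory example; a real run would pass open("contigs.fasta") instead
example = io.StringIO(">c1 first\nACGT\nACGT\n>c2\nAC\n>c3\nACGTAC\n")
records = read_fasta_lengths(example)
for threshold in (2, 5, 8):
    print(threshold, count_at_least(records, threshold))
```

Because the reader keeps a list rather than a dictionary, repeated contig IDs are not silently merged, which is one of the things worth checking when two tools disagree on a count.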


I think this might be the key.

QUAST may be doing some additional filtering, such as removing 'junk' sequences that are obvious misassembly artefacts, or deduplicating contigs.

I'm not 100% certain, but that would be my first guess.
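If you want to test that guess on your own data, something like the sketch below might help. It does not describe what QUAST actually does; it only counts two things that could plausibly explain a gap: exact duplicate sequences, and contigs that pass the length cutoff only because of N bases. The function name `summarize` and the report keys are hypothetical.

```python
from collections import Counter

def summarize(sequences, size):
    """sequences: iterable of (contig_id, sequence_string) pairs."""
    # Keep only contigs that pass the plain length filter
    kept = [(cid, seq.upper()) for cid, seq in sequences if len(seq) >= size]
    # Count how many kept contigs are exact copies of an earlier one
    seq_counts = Counter(seq for _, seq in kept)
    extra_copies = sum(n - 1 for n in seq_counts.values())
    # Contigs that would fall below the cutoff if N bases were not counted
    short_without_n = [cid for cid, seq in kept
                       if len(seq) - seq.count("N") < size]
    return {
        "kept": len(kept),
        "exact_duplicates": extra_copies,
        "below_size_without_N": short_without_n,
    }

demo = [("c1", "ACGTAC"), ("c2", "ACGTAC"), ("c3", "ACNNNN"), ("c4", "AC")]
report = summarize(demo, 5)
print(report)
```

If either number comes close to the 2259 extra contigs, that would point in this direction; if both are near zero, the cause is probably something else.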


What does "SeqIO.parse" stand for? (I'm trying to understand the command.) I'm also trying to filter contigs, so this code could help me.


That is the standard SeqIO interface included in Biopython (LINK).


Hello Dave, I'm trying to use your code to filter some contigs, but I got an indentation error message:

  File "contig_length_filter.py", line 2
    long_contigs = []
    ^
IndentationError: expected an indented block

So I suppose I need to put something inside the brackets?

Regards :)


IndentationError: expected an indented block?


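The code as originally posted had incorrect indentation levels for Python: the function body must be indented under the def line. Copying it verbatim produces this error, and the brackets have nothing to do with it.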
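To show what the error is about, here is a small sketch that asks Python to compile two versions of the same two-line function, one with the body flush left and one with it indented. The helper name `compiles` is made up for this example.

```python
def compiles(source):
    """Return True if Python can compile the source, False on an indentation error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except IndentationError:
        return False

# Body not indented under the def line: this is what triggers the error
bad_source = "def f():\nlong_contigs = []\n"
# Same code with the body indented by four spaces
good_source = "def f():\n    long_contigs = []\n"

print("flush-left body compiles:", compiles(bad_source))
print("indented body compiles:", compiles(good_source))
```

So the fix is to indent every line that belongs to the function, and to indent the lines inside the for loop and the if statement one level further each.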
