Filter contigs by size: Different output between QUAST report and Python output
17 months ago
Dave Th ▴ 20

Hi all,

I'm trying to split my contig dataset into different files by minimum length, such as 500 bp, 1 kb, 2 kb... I'm using the code below to produce my output.

def contigs_filter_by_length(fasta_input, size, fasta_output):
long_contigs =  [] #Create an empty list
for record in SeqIO.parse(fasta_input,"fasta"):
if len(record.seq) >= size:
long_contigs.append(record)
print("Found %i contigs" %len(long_contigs))
SeqIO.write(long_contigs,fasta_output,"fasta")


The problem is that when I cross-checked the QUAST report for my input file against the output of the code, there was a large difference between them. QUAST reported 119787 contigs >= 500 bp, while the FASTA output from the code contained 122046 contigs >= 500 bp.

Is there anything wrong in my code that leads to this difference?

sequence assembly • 538 views

I don't see anything wrong in your code. Have you compared the results directly? You could look for contigs that are reported by your Python code but not by QUAST, to see what causes the difference.
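One way to start the comparison suggested above is to tabulate the length of every contig and count how many pass each threshold, so the numbers can be checked row by row against the QUAST report. The following is only a minimal sketch: it uses a plain-Python FASTA reader as a stand-in for Biopython's SeqIO, and the names `read_fasta_lengths` and `count_at_least` are hypothetical helpers, not part of QUAST or Biopython.

```python
def read_fasta_lengths(lines):
    """Return a list of (contig_id, length) tuples from FASTA lines.

    A list is used instead of a dict so that records with duplicate
    IDs are still counted separately.
    """
    records = []  # each entry is [contig_id, running_length]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            # The ID is taken as the first word of the header line.
            records.append([fields[0] if fields else "", 0])
        elif records:
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]


def count_at_least(records, thresholds):
    """Count the records whose length is >= each threshold."""
    return {t: sum(1 for _, n in records if n >= t) for t in thresholds}


# Tiny made-up example: c1 is 8 bases, c2 is 2 bases, c3 is 6 bases.
example = [">c1 demo", "ACGT", "ACGT", ">c2", "AC", ">c3", "ACGTAC"]
lengths = read_fasta_lengths(example)
print(count_at_least(lengths, [2, 6, 8]))  # {2: 3, 6: 2, 8: 1}
```

For a real assembly, the same function can be given an open file handle, for example `with open("contigs.fasta") as fh: lengths = read_fasta_lengths(fh)`, where `contigs.fasta` is a placeholder file name. Listing the contigs whose length sits right at a threshold is a quick way to see whether the two tools treat the boundary differently.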


I think this might be the key.

QUAST may be doing some additional filtering, for example removing 'junk' sequences that are obvious misassembly artefacts, or deduplicating contigs.

I'm not 100% certain, but that would be my immediate guess.


What does "SeqIO.parse" stand for? (I'm trying to understand the command.) I'm also trying to filter contigs, so this code could help me.


That is the standard SeqIO interface included in Biopython (LINK).


Hello Dave, I'm trying to use your code to filter some contigs, but I got an indentation error message:

  File "contig_length_filter.py", line 2
    long_contigs = []
    ^
IndentationError: expected an indented block

So I suppose I need to put something inside the square brackets?

Regards :)


IndentationError: expected an indented block ?


The code in the first post has incorrect indentation levels for Python, so you should not copy it verbatim.
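For anyone hitting the same IndentationError, here is a sketch of how the function from the first post could look with its indentation restored. It assumes Biopython is installed; the `from Bio import SeqIO` line, the docstring, and the `return` statement are additions that were not in the original post.

```python
from Bio import SeqIO  # Biopython's sequence input/output interface


def contigs_filter_by_length(fasta_input, size, fasta_output):
    """Write every record with at least `size` bases to fasta_output."""
    long_contigs = []  # records that pass the length cut-off
    for record in SeqIO.parse(fasta_input, "fasta"):
        if len(record.seq) >= size:
            long_contigs.append(record)
    print("Found %i contigs" % len(long_contigs))
    SeqIO.write(long_contigs, fasta_output, "fasta")
    return len(long_contigs)  # added here so callers can reuse the count
```

A call might look like `contigs_filter_by_length("contigs.fasta", 500, "contigs_500bp.fasta")`, where both file names are placeholders. Everything inside the `def` is indented by four spaces, and the bodies of the `for` and `if` statements are each indented one level further, which is what the error message was asking for.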