Question: blastn on Illumina reads for contamination detection, before or after quality filtering?
Asked 10 weeks ago by robert.murphy:

Should you run blastn on Illumina reads to check for contamination before or after you have quality-filtered them by Phred score?

What effect would each order be expected to have?


You should not blast(n) any reads, ever ...

No, seriously now: BLAST should not be your go-to tool for NGS data. There is far better and more efficient software for BLAST-like tasks on NGS data.

Since you are asking about contamination, have a look into tools like Kraken (and/or search for "NGS data contamination" or similar).

As for before or after quality filtering: it will not make much difference. If real contamination is present, Q-filtering will not remove it.
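To illustrate why Q-filtering does not remove contamination, here is a minimal sketch of a mean-quality filter. It assumes Phred+33 encoded quality strings and an arbitrary threshold of Q30; the reads and their "target"/"contaminant" labels are hypothetical. The filter only ever looks at the quality string, so a well-sequenced contaminant read passes just like a good target read.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_qc(qual, min_mean=30):
    """Keep a read if its mean base quality reaches min_mean (threshold is arbitrary)."""
    return mean_phred(qual) >= min_mean


# Hypothetical reads: (name, sequence, quality). The names only label the origin;
# the filter itself never looks at the sequence or the organism.
reads = [
    ("target_lowq", "ACGTACGTAC", "##########"),        # Q2 at every base
    ("contaminant_highq", "TTGACCGTAA", "IIIIIIIIII"),  # Q40 at every base
    ("target_highq", "GGCATCGATC", "IIIIIFFFFF"),       # Q40 / Q37
]

kept = [name for name, _seq, qual in reads if passes_qc(qual)]
print(kept)  # ['contaminant_highq', 'target_highq']
```

Quality filtering removes poorly sequenced reads regardless of where they came from; it does not distinguish organisms.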

(written 10 weeks ago by lieven.sterck)

Thank you for this.

Why should one not blastn reads?

(written 10 weeks ago by robert.murphy)

Efficiency/speed, for one. Also, BLAST is less sensitive on such short sequences than dedicated read mappers. (Perhaps less of an issue here, but BLAST also has no notion of "paired-end", which is an important concept in NGS data.)
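Because BLAST scores every mate as an independent query, any use of the pairing has to be reconstructed afterwards. The sketch below assumes standard 12-column tabular output (`-outfmt 6`) and mate IDs ending in `/1` and `/2`; the subject names `ecoli` and `phiX` are hypothetical. It keeps the best hit per read and calls a pair only when both mates hit the same subject.

```python
from collections import defaultdict


def best_hits(outfmt6_lines):
    """Return the best-scoring subject for each query in BLAST tabular (-outfmt 6) output."""
    best = {}
    for line in outfmt6_lines:
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid, bitscore = fields[0], fields[1], float(fields[11])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {q: s for q, (s, _score) in best.items()}


def pair_calls(hits):
    """Group mates by read name; report a subject only if both mates agree.

    Assumes mate IDs end in '/1' and '/2' (hypothetical naming; adjust to your data).
    """
    mates = defaultdict(dict)
    for qid, subject in hits.items():
        name, _, mate = qid.rpartition("/")
        mates[name][mate] = subject
    calls = {}
    for name, m in mates.items():
        s1, s2 = m.get("1"), m.get("2")
        # BLAST scored each mate on its own; concordance is only checked here.
        calls[name] = s1 if s1 is not None and s1 == s2 else None
    return calls


# Hypothetical BLAST output (qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore); r2/2 has no hit at all.
blast_out = [
    "r1/1\tecoli\t99.3\t150\t1\t0\t1\t150\t100\t249\t1e-70\t270",
    "r1/2\tecoli\t98.0\t150\t3\t0\t1\t150\t400\t251\t1e-68\t260",
    "r2/1\tecoli\t90.0\t60\t6\t0\t1\t60\t10\t69\t1e-10\t80",
    "r2/1\tphiX\t100.0\t150\t0\t0\t1\t150\t1\t150\t1e-75\t277",
]
print(pair_calls(best_hits(blast_out)))  # {'r1': 'ecoli', 'r2': None}
```

A read mapper that understands pairs does this kind of concordance check (plus insert-size checks) internally.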

(written 10 weeks ago by lieven.sterck)

So should you merge paired-end reads when using blastn this way, or just use the forward reads? I will give Kraken a look :)

(written 10 weeks ago by robert.murphy)

Remember to use -task blastn-short when you run the BLAST searches. BLAST will also pick up adapter sequences as hits, so if you want to do this, you should merge the reads and then scan/trim them for adapters before the BLAST searches.
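The merge and scan/trim steps can be sketched roughly as follows. This is a simplified illustration, not how dedicated merging or trimming tools work: `merge_pair` requires an exact overlap and assumes the insert is at least as long as each read, and `trim_adapter` defaults to the common Illumina TruSeq adapter prefix `AGATCGGAAGAGC` (an assumption; use the adapters from your own library prep). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10):
    """Merge mates whose ends overlap exactly; return None if no overlap is found.

    Assumes no read-through into the adapter and ignores sequencing errors;
    real mergers tolerate mismatches and use base qualities.
    """
    r2rc = revcomp(r2)
    # Try the longest possible overlap first.
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        if r1[-olen:] == r2rc[:olen]:
            return r1 + r2rc[olen:]
    return None


def trim_adapter(seq, adapter="AGATCGGAAGAGC", min_match=5):
    """Cut a full adapter match, or a partial adapter at the 3' end, from seq."""
    hit = seq.find(adapter)
    if hit != -1:
        return seq[:hit]
    # Longest suffix of seq that equals a prefix of the adapter.
    for k in range(min(len(adapter), len(seq)), min_match - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


# Hypothetical 22 bp fragment sequenced as two 16 bp mates overlapping by 10 bp.
fragment = "ACGTACGTAAGGCCTTGATCCA"
r1 = fragment[:16]
r2 = revcomp(fragment[6:])
print(merge_pair(r1, r2) == fragment)            # True
print(trim_adapter("ACGTACGTAGATCGGAAGAGC"))     # ACGTACGT
```

The merged, trimmed sequences would then be written to FASTA and used as the blastn query.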

(written 10 weeks ago by GenoMax)

Blastn is very slow; you could instead use BWA or Bowtie2 to map the reads against the genomes of known likely contaminants.
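Once the reads are mapped (for example with `bwa mem` against a contaminant reference), the amount of contamination can be summarised from the SAM FLAG field. This sketch assumes plain-text SAM input and a reference made only of contaminant genomes; the SAM records shown are hypothetical. It counts primary records only, skipping secondary (0x100) and supplementary (0x800) alignments, and treats records without the unmapped bit (0x4) as contaminant hits.

```python
def contaminant_fraction(sam_lines):
    """Fraction of primary read records that aligned to the contaminant reference."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        total += 1
        if not flag & 0x4:  # 0x4 = segment unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical SAM records against a contaminant reference named "contam".
sam = [
    "@SQ\tSN:contam\tLN:5000",
    "r1\t0\tcontam\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tcontam\t900\t0\t10M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tcontam\t300\t60\t10M\t*\t0\t0\tGTACGTACGT\tIIIIIIIIII",
]
print(round(contaminant_fraction(sam), 3))  # 0.667
```

In practice `samtools flagstat` reports similar numbers; the point here is only which FLAG bits matter.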

Removing contamination AFTER QC is better, because QC is the faster step: running the slower process on a smaller dataset takes less time.
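Both points in this thread can be checked with a toy pipeline, assuming each step simply keeps or drops whole reads: the final set of reads is the same in either order, but the slow screening step runs on fewer reads when QC comes first. The reads, their precomputed properties, and the function names below are hypothetical stand-ins for real data and tools.

```python
def run_filters(reads, first, second):
    """Apply two per-read filters in order; return kept IDs and how often each ran."""
    calls = {"first": 0, "second": 0}
    kept = []
    for read in reads:
        calls["first"] += 1
        if not first(read):
            continue  # the second filter never sees this read
        calls["second"] += 1
        if second(read):
            kept.append(read["id"])
    return kept, calls


# Hypothetical reads with precomputed properties, standing in for real data.
reads = [
    {"id": "a", "mean_q": 35, "contaminant": False},
    {"id": "b", "mean_q": 12, "contaminant": False},
    {"id": "c", "mean_q": 38, "contaminant": True},
    {"id": "d", "mean_q": 8, "contaminant": True},
]


def quality_ok(read):
    """Cheap step: mean-quality threshold (value is arbitrary)."""
    return read["mean_q"] >= 30


def not_contaminant(read):
    """Stand-in for the slow step (BLAST or mapping against contaminant genomes)."""
    return not read["contaminant"]


kept_qc_first, calls_qc_first = run_filters(reads, quality_ok, not_contaminant)
kept_screen_first, calls_screen_first = run_filters(reads, not_contaminant, quality_ok)

print(kept_qc_first, kept_screen_first)                 # ['a'] ['a']
print(calls_qc_first["second"], calls_screen_first["first"])  # 2 4
```

If QC also trims reads rather than only dropping them, the screening results can change somewhat, because shorter reads give shorter alignments.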

(written 10 weeks ago by shenwei356)

You should have no contamination in well prepared libraries. Do you know for certain there is contamination?

(written 10 weeks ago by GenoMax)

I found no contamination in the short reads, but when blasting the assembled contigs I found 2 that mapped to the wrong species. The culture was pure, but the DNA was extracted by the sequencing company. Is it likely that the long reads are contaminated but the short reads are not? (It is a hybrid assembly.)

(written 9 weeks ago by robert.murphy)

blasting the assembled contigs I found 2 mapping to the incorrect species

That does not seem like strong evidence of contamination. Since BLAST does local alignments, you may have got those alignments by chance. You would want to investigate carefully before drawing a conclusion.
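One simple check when investigating such hits is how much of each contig the alignments actually cover: a few short local HSPs on a long contig are much weaker evidence than a hit spanning the whole contig. This sketch assumes BLAST was run with a custom tabular format, `-outfmt "6 qseqid sseqid qstart qend qlen"`, and that the input lines are already restricted to hits against the suspect species; the contig names and coordinates are hypothetical. It merges overlapping HSP intervals per contig and reports the covered fraction.

```python
from collections import defaultdict


def query_coverage(lines):
    """Fraction of each query (contig) covered by the union of its HSPs.

    Assumes -outfmt "6 qseqid sseqid qstart qend qlen".
    """
    intervals = defaultdict(list)
    qlen = {}
    for line in lines:
        qseqid, _sseqid, qstart, qend, length = line.rstrip("\n").split("\t")
        start, end = sorted((int(qstart), int(qend)))
        intervals[qseqid].append((start, end))
        qlen[qseqid] = int(length)
    coverage = {}
    for q, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_s, cur_e = ivs[0]
        for s, e in ivs[1:]:
            if s <= cur_e + 1:  # overlapping or adjacent: extend the current block
                cur_e = max(cur_e, e)
            else:  # gap: close the current block and start a new one
                covered += cur_e - cur_s + 1
                cur_s, cur_e = s, e
        covered += cur_e - cur_s + 1
        coverage[q] = covered / qlen[q]
    return coverage


# Hypothetical hits of two contigs against a suspect species "sp1".
hits = [
    "contig_7\tsp1\t1\t500\t100000",
    "contig_7\tsp1\t400\t900\t100000",
    "contig_7\tsp1\t5001\t5100\t100000",
    "contig_9\tsp1\t1\t20000\t20000",
]
print(query_coverage(hits))  # {'contig_7': 0.01, 'contig_9': 1.0}
```

Here contig_7 is only 1% covered by its hits, which on its own says little, whereas contig_9 aligns end to end and would deserve a closer look (coverage depth, GC content, which reads it was assembled from).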

(written 9 weeks ago by GenoMax)

So, because of how the BLAST algorithm works, it is not the best tool for contamination detection unless combined with other information?

(written 9 weeks ago by robert.murphy)