Question

Trimming nanopore reads based on quality score

0

6.5 years ago

KVC_bioinfo ▴ 590

Hello all,

I am trying to trim nanopore reads based on read quality. According to Oxford Nanopore Technologies, reads with a quality score below 9 are considered bad.

I have found this tool (NanoFilt) to do it. Has anyone used it before, or is there another way to do it?

nanopore trim • 7.6k views

ADD COMMENT • link updated 6.5 years ago by WouterDeCoster 47k • written 6.5 years ago by KVC_bioinfo ▴ 590

0

I have started using the tool I mentioned above:

python /path/to/NanoFilt.py /path/to/fastq/trim.fastq -q 9 > trim_qu.fastq

I constantly get:

usage: NanoFilt.py [-h] [-v] [-l LENGTH] [-q QUALITY] [--minGC MINGC]
                   [--maxGC MAXGC] [--headcrop HEADCROP] [--tailcrop TAILCROP]
                   [-s SUMMARY] [--readtype {1D,2D,1D2}]

ADD REPLY • link updated 6.5 years ago by GenoMax 141k • written 6.5 years ago by KVC_bioinfo ▴ 590

1

Try moving -q 9 before the file name. Some programs are sensitive to the order of their options.

ADD REPLY • link 6.5 years ago by GenoMax 141k

0

Yes, I tried that. It gives the same error.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

2

Can you try this?

cat /path/to/fastq/trim.fastq | python /path/to/NanoFilt.py -q 9 > trim_qu.fastq

ADD REPLY • link 6.5 years ago by GenoMax 141k

score 4 · Accepted Answer · 2017-11-10

4

6.5 years ago

WouterDeCoster 47k

Hm, I'm not sure whether you mean trimming or filtering. What you are doing is filtering: removing reads below the quality cutoff. Trimming nucleotides from the read ends is also possible using NanoFilt.
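To make the distinction concrete, here is a minimal Python sketch. It is not NanoFilt's actual code: the function names are made up for illustration, and it assumes Phred+33 quality strings and a mean quality averaged over error probabilities rather than over the raw scores.

```python
import math


def mean_quality(qual, offset=33):
    """Mean Phred score of a FASTQ quality string.

    Scores are converted to error probabilities, averaged, and converted
    back, so a few very bad bases pull the mean down more than a plain
    arithmetic mean of the scores would. Assumes a non-empty string.
    """
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(qual, min_q):
    """Filtering: the whole read is either kept or dropped."""
    return mean_quality(qual) >= min_q


def crop(seq, qual, head=0, tail=0):
    """Trimming: remove `head` bases from the start and `tail` from the end."""
    end = max(head, len(seq) - tail)  # never let the slice wrap around
    return seq[head:end], qual[head:end]
```

With a cutoff of 9, a read whose bases are all Q10 passes, while a read that is half Q0 and half Q20 does not, even though its arithmetic mean score is 10.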

Regarding the command you are trying to use, I would suggest having a look at the documentation on GitHub, PyPI, or the blog post, or running NanoFilt --help.

I added examples on how to use NanoFilt to all of these. If you have suggestions on how to improve the documentation I would like to hear those, but right now it looks like you haven't read it.

Anyway, GenoMax is right: NanoFilt reads from stdin and writes to stdout. This makes it compatible with any compression type and allows you to sandwich it between, for example, decompression and an aligner. If installed correctly, there is no need to add ".py" or the full path to the script.
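As a rough illustration of that stdin-to-stdout design (a sketch only, not NanoFilt's implementation), a filter can be written against generic file handles. The names `read_fastq`, `filter_stream` and `main`, and the length-based predicate, are hypothetical, and the parser assumes plain 4-line FASTQ records.

```python
import sys


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    Stops at end of input or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_stream(src, dst, keep):
    """Write records for which keep(header, seq, qual) is true; return the count."""
    written = 0
    for header, seq, qual in read_fastq(src):
        if keep(header, seq, qual):
            dst.write(f"{header}\n{seq}\n+\n{qual}\n")
            written += 1
    return written


def main():
    # Hypothetical predicate: keep reads of at least 1000 bases.
    filter_stream(sys.stdin, sys.stdout, lambda h, s, q: len(s) >= 1000)
```

Because the function only sees handles, a script calling `main()` never needs to know whether its input was decompressed upstream or where its output goes next. Any other predicate, such as a mean-quality check, can be passed as `keep`.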

I would suggest, if possible, using the accompanying Albacore summary file for filtering. That will speed things up significantly. Right now, calculating the average read quality is pretty slow, but I will look into this...
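The speed-up comes from the basecaller having already recorded a mean quality per read, so filtering turns into a lookup instead of a per-base calculation. A minimal Python sketch of that idea follows. It is not NanoFilt's code: the function names are hypothetical, and it assumes a tab-separated summary with `read_id` and `mean_qscore_template` columns, which you should check against the header of your own file.

```python
import csv


def passing_read_ids(summary_handle, min_q,
                     id_col="read_id", q_col="mean_qscore_template"):
    """Return the set of read IDs whose recorded mean quality is >= min_q.

    The column names are assumptions about the summary file layout.
    """
    reader = csv.DictReader(summary_handle, delimiter="\t")
    return {row[id_col] for row in reader if float(row[q_col]) >= min_q}


def read_id(header):
    """Read ID: first whitespace-delimited token of a FASTQ header, minus '@'."""
    return header[1:].split()[0]


def in_summary(header, ids):
    """True if the read named in this FASTQ header passed the summary cutoff."""
    return read_id(header) in ids
```

The set is built once from the summary, after which each read costs a single membership test on its ID.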

I don't think I agree that all reads with a score below 9 are garbage. This definitely depends on your application.

ADD COMMENT • link 6.5 years ago by WouterDeCoster 47k

0

NanoFilt works great for me. Thanks for the suggestion.

About the quality score of 9: I read that in a review paper on nanopore sequencing.

ADD REPLY • link 6.4 years ago by KVC_bioinfo ▴ 590

0

@WouterDeCoster

In which situations or applications should reads with a score < 9 be kept? And how do you determine the threshold for quality filtering?

Thanks.

ADD REPLY • link 5.8 years ago by elzedleeu ▴ 20

0

@Wouter is on vacation. If you have a reliable reference you can align to, then you can decide whether you want to keep reads with an average score below 9. If you don't have a reference, you are better off leaving those reads where they belong: in the "failed" category.

ADD REPLY • link 5.8 years ago by GenoMax 141k

0

There are definitely different scenarios, for example:

De novo assembly: you don't want terrible reads
SNP identification: you don't want too many low quality bases
Structural variant detection: a low quality read spanning a breakpoint is fine
Quantitative transcriptomics: if it maps, it is fine
Metagenomics identification: if it is sufficiently identical to a hit in the database, it is probably fine
...

Determining the threshold is harder - I don't think I can provide an answer.

ADD REPLY • link 5.7 years ago by WouterDeCoster 47k