I need to apply custom filtering to a FASTQ file as follows:

  1. Remove reads having at most 2 bases under quality score 20.

  2. Remove reads with unique sequence having read count less than 10.

I looked for tools to do this but could not find any that support this type of filtering. I also wrote a Python script that uses the HTSeq package to read and process the FASTQ file, but it is extremely slow: it takes a day to process one file, and I have 30 samples. Is there a fast way to do this type of custom filtering on a FASTQ file?


Remove reads with unique sequence having read count less than 10.

Could you please explain why you want to do this?

I am not sure of the exact reason, and this is also new to me. I am trying to reproduce a Nature paper that describes a pipeline for small RNA-seq data to identify mature and isomiR miRNAs. To reproduce the exact results, I am doing exactly what is written in the paper.
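
For what it is worth, both filters can probably be done with plain Python in two streaming passes over the file, without HTSeq. The following is only a minimal sketch under several assumptions that are not stated in the question: the FASTQ uses exactly four lines per record, qualities are Phred+33 encoded, a read is kept when it has at most 2 bases below Q20, and sequence counts are taken after the quality filter. The function names (`read_fastq`, `passes_quality`, `filter_fastq`) are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples; assumes 4 lines per record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def passes_quality(qual, min_q=20, max_low=2, offset=33):
    """True if at most `max_low` bases are below `min_q` (assumes Phred+33)."""
    threshold = min_q + offset
    low = 0
    for ch in qual:
        if ord(ch) < threshold:
            low += 1
            if low > max_low:
                return False  # stop early once the read has failed
    return True


def filter_fastq(in_path, out_path, min_q=20, max_low=2, min_count=10):
    """Two passes: count sequences, then write reads passing both filters."""
    opener = gzip.open if in_path.endswith(".gz") else open

    # Pass 1: count each distinct sequence among quality-passing reads.
    # Assumption: counting happens after the quality filter.
    counts = Counter()
    with opener(in_path, "rt") as fh:
        for _, seq, _, qual in read_fastq(fh):
            if passes_quality(qual, min_q, max_low):
                counts[seq] += 1

    # Pass 2: keep reads whose sequence occurs at least `min_count` times.
    kept = 0
    with opener(in_path, "rt") as fh, open(out_path, "w") as out:
        for header, seq, plus, qual in read_fastq(fh):
            if counts[seq] >= min_count and passes_quality(qual, min_q, max_low):
                out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                kept += 1
    return kept
```

Calling `filter_fastq("sample.fastq.gz", "sample.filtered.fastq")` (hypothetical file names) returns the number of reads written. Memory use grows with the number of distinct sequences, not the number of reads, which is usually manageable for small RNA data. Since the 30 samples are independent, they could also be processed in parallel, one process per file.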

