Hi there,
I have some reads that were generated from vanishingly low RNA input. I did first/second-strand synthesis and an Illumina DNA Prep library preparation. I sent the samples for sequencing and, I assume because there was such low input, there is a huge number of reads that are just 150 bp of A, or 150 bp of T, etc. The end goal is to use kraken/metabuli for metagenomic classification of the reads, likely after filtering out human reads, so I'd rather save some compute and not waste time classifying total garbage when in some samples it accounts for 40% of the reads.
I'm wondering if there is an existing tool or best practice to filter out these reads that are very obviously trash. The read quality is very high, so quality filtering won't catch them. I suppose I could write a quick script that just trashes anything with an unreasonably high fraction of a single nucleotide type, but I'm hoping this is something others have come across and dealt with in a more systematic way.
Thanks!
Sean