Filtering of RNA fastq files prior to alignment
3.6 years ago
ognjen011 ▴ 200

I have a pair of RNA fastq files that are heavily contaminated with rRNA sequences, to the extent that almost half of the reads originate from a 10 kb sequence of the reference. I want to remove these reads, and I know I can do it at the BAM level in one of many ways (like samtools view), but I am curious whether there is an efficient way of removing those reads at the fastq stage. The idea is to speed up the alignment.

Is there a good way to remove unwanted reads prior to alignment, in order to speed up the processing? Maybe a trimming tool or something similar?

RNA-Seq alignment • 1.5k views
3.6 years ago

The most efficient way would be alignment-free kmer-matching. To do that, you can remove all reads containing ribosomal kmers with BBDuk (where ref is the ribosomal sequence of this organism):

bbduk.sh in1=r1.fq in2=r2.fq out1=clean1.fq out2=clean2.fq ref=ribo.fa k=31 mm=f
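For readers wondering what alignment-free kmer matching means in practice, here is a minimal Python sketch of the idea, not BBDuk's actual implementation. The function names (`build_ref_kmers`, `pair_matches`) are made up for illustration, and the sketch assumes exact kmer matches on both strands and that a pair is discarded when either mate shares a kmer with the reference.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_ref_kmers(ref_seq, k):
    """Index the reference on both strands so read orientation does not matter."""
    return kmers(ref_seq, k) | kmers(revcomp(ref_seq), k)


def pair_matches(read1, read2, ref_kmers, k):
    """True if either mate shares at least one exact kmer with the reference."""
    return any(not kmers(r, k).isdisjoint(ref_kmers) for r in (read1, read2))


# Toy usage: a short stand-in for the ribosomal reference and k=5
# (the command above uses k=31 on real data).
ref_kmers = build_ref_kmers("ACGTTGCAAGGCTA", 5)
keep_pair = not pair_matches("GGGGGGGG", "CCCCCCCC", ref_kmers, 5)
```

Because each read only needs set lookups instead of a full alignment, this kind of filter is much cheaper per read than mapping and then filtering the BAM.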

You can alternatively gather a set of kmers that are present in ribosomal sequence but not in the rest of the genome and use that, but it's more complicated and probably not necessary.
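The ribosome-specific kmer set mentioned above amounts to a set difference. The following toy Python sketch shows only that step. It assumes both sequences fit in memory and that `genome_seq` is the genome with the rRNA loci already masked or removed (otherwise every ribosomal kmer would be subtracted away). The function names are hypothetical, and a real genome would need a dedicated kmer-counting tool rather than Python sets.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of kmers, each stored as the smaller of itself and its reverse complement."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMP)[::-1]
        out.add(min(kmer, rc))
    return out


def ribo_specific_kmers(ribo_seq, genome_seq, k):
    """Kmers found in the ribosomal sequence but not in the rest of the genome."""
    return canonical_kmers(ribo_seq, k) - canonical_kmers(genome_seq, k)


# Toy usage with k=4; real data would use a much larger k such as 31.
specific = ribo_specific_kmers("ACGTTGCA", "TTTTGCAAAA", 4)
```

Storing canonical kmers means a kmer and its reverse complement count as the same entry, so strand does not affect the result.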


Thank you! This is exactly what I need, but is there a free-to-use alternative? BBDuk seems to be strictly commercial software.


Sorry, I seem to be looking at the wrong package. Thanks for the answer!


To make this clear to future readers, BBDuk is open-source, available here, and free for all uses.
