I have a large metagenomic RNA-seq dataset that I am trying to assemble to find viral sequences, but it is too large for my hardware (52 GB of RAM). When I BLAST reads, I can see a lot of bacterial contamination from many different species. I want to filter out all bacterial reads so that I can assemble the rest. Ideas?
Download all bacterial genomes from RefSeq and align the reads to them with Bowtie (this will take a long time). Also, since when have the compressed RefSeq bacterial .fna files reached 72 GB combined?! The last all.bacteria.gz file in the RefSeq archive, from 2015, is only 2.7 GB...
Somehow condense all the bacterial genomes into a non-redundant set, then align to that?
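Both ideas end with an alignment step, after which the reads to keep for assembly are the ones that did not align to anything bacterial. Bowtie2 can write those directly with its `--un` option (unpaired reads), but if you already have SAM output, the unmapped flag bit (0x4) identifies them. Below is a minimal sketch assuming single-end reads and uncompressed SAM text with qualities present; the helper names `unmapped_reads` and `write_fastq` are hypothetical, not part of any tool.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

UNMAPPED = 0x4                  # SAM FLAG bit: segment unmapped
SECONDARY_OR_SUPPLEMENTARY = 0x900  # 0x100 | 0x800

def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each unmapped primary SAM record.

    Assumes single-end reads; '@' header lines are skipped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # empty or malformed line
        qname, flag = fields[0], int(fields[1])
        # Skip secondary/supplementary records so each read is emitted once.
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue
        if flag & UNMAPPED:
            yield qname, fields[9], fields[10]  # SEQ and QUAL columns

def write_fastq(records: Iterable[Tuple[str, str, str]], out: TextIO) -> None:
    """Write (name, seq, qual) tuples as 4-line FASTQ records."""
    for name, seq, qual in records:
        out.write(f"@{name}\n{seq}\n+\n{qual}\n")

if __name__ == "__main__":
    sam = (
        "@HD\tVN:1.6\n"
        "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"      # unmapped: kept
        "r2\t0\tchr1\t100\t42\t4M\t*\t0\t0\tTTTT\tIIII\n"  # mapped: dropped
    )
    buf = io.StringIO()
    write_fastq(unmapped_reads(io.StringIO(sam)), buf)
    print(buf.getvalue(), end="")
```

Because both functions stream line by line, memory use stays flat regardless of how large the SAM file is; the output FASTQ can go straight to the assembler.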
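The crudest way to condense the reference into a non-redundant set is to drop sequences that are exactly identical to one already kept, treating a sequence and its reverse complement as the same (the aligner searches both strands anyway). This is only a sketch under that assumption: it does not collapse near-identical strains, which would need clustering by percent identity. The helper names `read_fasta`, `canonical_digest`, and `dedupe_exact` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Set, Tuple

# Complement table for A/C/G/T/N only; other IUPAC codes are left unchanged,
# so sequences containing them may not match their reverse complement.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def canonical_digest(seq: str) -> str:
    """SHA-1 of the lexicographically smaller of seq and its reverse complement."""
    seq = seq.upper()
    rc = seq.translate(_COMPLEMENT)[::-1]
    return hashlib.sha1(min(seq, rc).encode()).hexdigest()

def dedupe_exact(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep the first record of each group of identical (or reverse-complement) sequences."""
    seen: Set[str] = set()
    for header, seq in records:
        key = canonical_digest(seq)
        if key in seen:
            continue  # redundant: already have this sequence or its reverse complement
        seen.add(key)
        yield header, seq
```

Only one digest per unique sequence is kept in memory, so the genome files can be streamed record by record, for example from `gzip.open(path, "rt")`, and the surviving records written to a new FASTA for index building.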