Suggestions for which genomes to use when removing microbial contamination with BBsplit
4.6 years ago
Dave Carlson ★ 1.7k

Hi Biostars, I'm currently working on a few pipelines to process and perform various analyses on human RNA-seq data. One of the steps in all of the pipelines is removal of contaminating microbial reads from the input fastq files. Based on recommendations here and elsewhere, I'm using the BBSplit program from BBMap.

My question is in regard to which potential sources of contamination I should be mapping to. Currently, I've downloaded essentially all the microbial RefSeq assemblies (bacterial, archaeal, protozoan, viral, fungal) and concatenated them together into a single "contaminants" fasta file.

However, mapping against all of these genomes takes a couple of hours per sample and, more importantly, uses a prohibitively large amount of memory (> 500 GB). I'd like to pare down the number of microbial assemblies I use in this analysis, but I'm not sure where to start.
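
For reference, the kind of subsetting I have in mind could be sketched roughly as below. This is only an illustration, assuming the FASTA headers in the concatenated file contain the organism name; the function name `filter_fasta`, the record names, and the keyword list are all hypothetical and not part of BBMap.

```python
import io


def filter_fasta(lines, keep_keywords):
    """Yield the lines of FASTA records whose header contains any keyword.

    Matching is a plain case-insensitive substring test on the header line,
    which assumes (hypothetically) that headers carry the organism name.
    """
    keywords = [k.lower() for k in keep_keywords]
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide whether to keep it.
            header = line.lower()
            keep = any(k in header for k in keywords)
        if keep:
            yield line


if __name__ == "__main__":
    # Tiny made-up FASTA standing in for the large "contaminants" file.
    demo = io.StringIO(
        ">seq1 Escherichia coli chromosome\nACGT\nGGCC\n"
        ">seq2 Some other organism\nTTTT\n"
        ">seq3 Mycoplasma hyorhinis\nAAAA\n"
    )
    kept = filter_fasta(demo, ["Escherichia coli", "Mycoplasma"])
    print("".join(kept), end="")
```

On the real data, the same generator could read from the open contaminants file and write to a smaller one; which keywords belong in the list is exactly what I'm unsure about.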

Are there any "standard" sets of genomes that people typically use when decontaminating fastq data? Alternatively, if anybody has performed this sort of analysis before and has suggestions for which species I should (or shouldn't!) include, I'd love to get your advice.

Thanks!

Dave

genome BBmap • 1.1k views

Is contamination a real concern? If your analysis pipeline includes mapping to the human genome, most or all contaminants would be filtered out at this stage.
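
One rough way to check this on your own data is to look at how many reads fail to map to human at all. The sketch below is a minimal example, assuming an uncompressed SAM text file from whichever aligner you use; the function name `unmapped_fraction` and the demo records are hypothetical. Note that unmapped reads are not necessarily microbial (adapters and low-quality reads end up there too), so this only gives an upper bound.

```python
def unmapped_fraction(sam_lines):
    """Return the fraction of primary SAM records flagged as unmapped.

    Uses the standard SAM FLAG bits: 0x4 (segment unmapped),
    0x100 (secondary alignment) and 0x800 (supplementary alignment).
    """
    total = 0
    unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary/supplementary so reads are not double-counted
        total += 1
        if flag & 0x4:
            unmapped += 1
    return unmapped / total if total else 0.0


if __name__ == "__main__":
    # Made-up records: two mapped, one unmapped, one secondary alignment.
    demo = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
        "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    print(f"{unmapped_fraction(demo):.3f}")
```

If that fraction is small, a separate decontamination step probably changes little downstream.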


That's a good question. Contamination is not of special concern (this isn't ancient DNA!). I mostly just wanted to be thorough. But yes, all pipelines will involve either mapping to the human reference genome or transcriptome.
