Question: Suggestions for which genomes to use when removing microbial contamination with BBsplit
7 months ago by Dave Carlson (Stony Brook University, NY), who wrote:

Hi Biostars, I'm currently working on a few pipelines to process and perform various analyses on human RNA-seq data. One of the steps in all of the pipelines is removal of contaminating microbial reads from the input fastq files. Based on recommendations here and elsewhere, I'm using the BBSplit program from BBMap.

My question is in regard to which potential sources of contamination I should be mapping to. Currently, I've downloaded essentially all the microbial RefSeq assemblies (bacterial, archaeal, protozoan, viral, fungal) and concatenated them together into a single "contaminants" fasta file.

However, using all of these genomes makes the analysis take a couple of hours per sample and, more importantly, uses a prohibitively large amount of memory (> 500 GB). I'd like to pare down the number of microbial assemblies I use in this analysis, but I'm not sure where to start.
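
For illustration only, here is a minimal sketch of how a concatenated "contaminants" FASTA could be reduced once a list of organisms has been chosen. It assumes that each record's header line contains the organism name (as RefSeq headers usually do) and that simple case-insensitive substring matching is good enough. The function names `subset_fasta` and `write_subset` and the example keyword are hypothetical, and the keyword is a placeholder, not a recommendation of which species to keep.

```python
from typing import Iterable, Iterator, Sequence


def subset_fasta(lines: Iterable[str], keep_keywords: Sequence[str]) -> Iterator[str]:
    """Yield only the FASTA records whose header contains one of keep_keywords.

    Matching is a case-insensitive substring test on the header line
    (an assumption; stricter matching on accession or taxon ID may be safer).
    """
    lowered = [k.lower() for k in keep_keywords]
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; decide whether to keep it.
            header = line.lower()
            keep = any(k in header for k in lowered)
        if keep:
            yield line


def write_subset(in_path: str, out_path: str, keep_keywords: Sequence[str]) -> None:
    """Stream in_path to out_path, keeping only the matching records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(subset_fasta(src, keep_keywords))


# Hypothetical usage (file names and keyword are placeholders):
# write_subset("contaminants.fa", "contaminants_subset.fa", ["Escherichia coli"])
```

Because the file is streamed line by line, the filtering step itself needs very little memory regardless of how large the combined FASTA is; the memory saving for the mapping step comes from indexing the smaller output file.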

Are there any "standard" sets of genomes that people typically use when decontaminating fastq data? Alternatively, if anybody has performed this sort of analysis before and has suggestions for which species I should (or shouldn't!) include, I'd love to get your advice.



bbmap genome • 225 views

Is contamination a real concern? If your analysis pipeline includes mapping to the human genome, most or all contaminants would be filtered out at this stage.

written 7 months ago by h.mon

That's a good question. Contamination is not of special concern (this isn't ancient DNA!). I mostly just wanted to be thorough. But yes, all pipelines will involve either mapping to the human reference genome or transcriptome.

written 7 months ago by Dave Carlson