Question

non human contaminants in sequencing

7.9 years ago

rob.costa1234 ▴ 310

I want to detect any non-human contaminants in my sequencing data (RNA/DNA). Is there a quick tool that can provide a rough estimate without actually aligning to the genome? I think DeconSeq is not working.

Thanks

sequencing • 2.6k views

updated 7.9 years ago by Antonio R. Franco ★ 5.1k • written 7.9 years ago by rob.costa1234 ▴ 310

Hi, you might want to look at BBDuk (from the BBMap suite). Here is a helpful SEQanswers thread on its use cases: http://seqanswers.com/forums/showthread.php?t=42776

It uses k-mer-based filtering to pull out possible contaminants.
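
To make the k-mer idea concrete, here is a toy Python sketch. It is not BBDuk's implementation: the function names, the `min_hits` threshold, and the sequences are made up for illustration, and it ignores reverse complements and mismatches, which a real tool has to handle.

```python
def kmers(seq, k):
    """Yield every k-length substring of seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(references, k):
    """Collect all k-mers from an iterable of reference sequences."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def split_reads(reads, contaminant_kmers, k, min_hits=1):
    """Partition reads into (clean, flagged) by number of shared k-mers."""
    clean, flagged = [], []
    for read in reads:
        hits = sum(1 for km in kmers(read, k) if km in contaminant_kmers)
        (flagged if hits >= min_hits else clean).append(read)
    return clean, flagged


if __name__ == "__main__":
    # Toy data only (hypothetical sequences, not a real contaminant).
    contaminant = ["ACGTACGTTTGACCA"]
    reads = ["TTACGTACGTTTGA", "GGGGCCCCAAAATTTT"]
    index = build_kmer_set(contaminant, k=8)
    clean, flagged = split_reads(reads, index, k=8)
    print("clean:", clean)
    print("flagged:", flagged)
```

No alignment is involved: a read is flagged as soon as it shares enough exact k-mers with the contaminant reference, which is why this kind of screen is fast.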

7.9 years ago by Amitm ★ 2.2k

score 4 · Answer 1 · 2016-05-24

You have several alternatives.

If you have any evidence about the source of the contamination, you can go straight to BBSplit. Genomax2 has already given the information you need.

If you have no clue about the origin of the contamination, you have two more ways to discover the source. One is quick (but needs luck), and the other takes as long as mapping your reads to the human genome.

One is to use Kraken. Kraken uses either pre-built databases of sequences from a mixture of known organisms, or a database you build yourself. You use Kraken to figure out the source of the contamination, and then you can get rid of those reads with BBSplit. You need some luck, however, to pin down the organism the contamination comes from, since it has to be represented in the database. Kraken runs very quickly with the provided database and is worth an attempt. But remember, once you discover the source of the contamination, the only way to get rid of the reads is by mapping.
A longer alternative is Blobology. It is not as fast as Kraken, since it relies on downloading the whole nt database from NCBI; following an included script, it assembles your reads and maps them to discover the source of the contamination.
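
As a rough illustration of the Kraken step, the sketch below tallies reads per taxonomy ID from Kraken's per-read output. It assumes the tab-delimited layout described in the Kraken manual (status `C`/`U`, read ID, taxonomy ID, length, LCA mapping); check this against your Kraken version. The function name and the example lines are hypothetical.

```python
from collections import Counter


def tally_taxids(lines):
    """Count reads per taxonomy ID from Kraken-style per-read output.

    Assumed columns (tab-separated): status, read ID, taxid, length, LCA map.
    Reads with status other than "C" are counted as "unclassified".
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, taxid = fields[0], fields[2]
        counts[taxid if status == "C" else "unclassified"] += 1
    return counts


if __name__ == "__main__":
    # Made-up example lines; 9606 is the NCBI taxid for human, 562 for E. coli.
    example = [
        "C\tread1\t9606\t100\t9606:70",
        "C\tread2\t562\t100\t562:70",
        "U\tread3\t0\t100\t0:70",
    ]
    for taxid, n in tally_taxids(example).most_common():
        print(taxid, n)
```

Any taxid other than 9606 with a large count is a candidate contaminant whose genome you could then hand to BBSplit.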

That being said, it will be a lot better if you map your sequences to the human genome and get rid of the unmapped sequences.
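
If you do map to the human genome, the unmapped fraction already gives an estimate of the non-human content. A minimal sketch, assuming an uncompressed SAM file in which the aligner sets FLAG bit 0x4 on unmapped reads (as the SAM specification defines); the function name and the example records are made up.

```python
def count_mapped(sam_lines):
    """Return (mapped, unmapped) counts of primary SAM records.

    Header lines (starting with "@") are skipped, as are secondary (0x100)
    and supplementary (0x800) records, so each read is counted once.
    Paired-end mates are counted separately.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x900:
            continue
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


if __name__ == "__main__":
    # Toy records, not real alignments.
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCCAA\tIIIIIIIIII",
    ]
    mapped, unmapped = count_mapped(example)
    print("unmapped fraction:", unmapped / (mapped + unmapped))
```

Keep in mind that unmapped is not the same as contaminant: low-quality reads and human sequence missing from the reference also end up unmapped.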

score 0 · Answer 2 · 2016-05-24

See these posts:

How to remove contamination from the transcriptome assembly

A: Assembly Of Unmapped Reads

A: Sequence Reads Unmapped To Human Genome

Contaminating Sequences And Genome Assembly

See these papers:

http://www.sciencedirect.com/science/article/pii/S0888754314001517

If you cannot reach it, see the link below:

http://www.sciencedirect.com.sci-hub.cc/science/article/pii/S0888754314001517

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4239086/

score 0 · Answer 3 · 2016-05-24

In addition to @Amitm's answer: BBMap contains BBSplit, which is designed to bin reads based on reference genomes. In your case, if you only care about human sequences, you can bin those away from the rest of the data.
You can sample a random set of reads from your data by using $ reformat.sh in=reads.fq out=sampled.fq sample=NNN from the BBMap suite (replace NNN with the number of reads you want to sample). Then, if you don't want to process the entire file, use the sampled reads to test with BBSplit.
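
If BBMap is not at hand, the same kind of subsample can be sketched in plain Python. This uses reservoir sampling, which is my choice for the sketch and not necessarily how reformat.sh does it; it assumes strict 4-line FASTQ records and single-end data, and the function names are hypothetical.

```python
import io
import random


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes 4-line records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def reservoir_sample(records, n, seed=None):
    """Return n records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


if __name__ == "__main__":
    # Toy in-memory FASTQ; for a real file use open("reads.fq") instead.
    data = "".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(10))
    picked = reservoir_sample(read_fastq(io.StringIO(data)), n=3, seed=42)
    print("".join(line for rec in picked for line in rec), end="")
```

The single pass means the whole file never has to sit in memory, only the n sampled records. For paired-end data both files would need to be sampled with the same record indices so that mates stay together.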