Non-human contaminants in sequencing
7.9 years ago
rob.costa1234 ▴ 310

I want to detect any non-human contaminants in my sequencing data (RNA/DNA). Is there a quick tool that can provide an estimate without actually aligning to the genome? I think DeconSeq is not working.

Thanks

sequencing • 2.6k views

Hi, you might want to look at BBDuk (from the BBMap suite). Here is a helpful SEQanswers thread on its use cases: http://seqanswers.com/forums/showthread.php?t=42776

It uses k-mer based filtering to pull out possible contaminants.
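To make the k-mer idea concrete, here is a toy sketch in Python. It is not BBDuk's implementation, only the general principle: index the k-mers of the suspected contaminant sequences, then flag any read that shares at least a few of them. The function names, the `min_hits` threshold and the choice of k are all assumptions made for illustration.

```python
# Toy k-mer screen: illustrates the principle only, not how BBDuk works internally.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (non-ACGT bases are left as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Yield every k-length substring of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Collect the k-mers from both strands of every reference sequence into a set."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
        index.update(kmers(reverse_complement(ref), k))
    return index


def is_contaminant(read, index, k, min_hits=1):
    """Flag a read if at least min_hits of its k-mers occur in the index."""
    hits = sum(1 for km in kmers(read, k) if km in index)
    return hits >= min_hits


def contaminant_fraction(reads, index, k, min_hits=1):
    """Fraction of reads flagged as contaminant; 0.0 for an empty input."""
    reads = list(reads)
    if not reads:
        return 0.0
    flagged = sum(is_contaminant(r, index, k, min_hits) for r in reads)
    return flagged / len(reads)
```

With real data you would load the reference sequences from a FASTA file and the reads from FASTQ; for anything beyond a toy, the actual tool will be far faster and more memory-efficient.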

7.9 years ago

You have several alternatives.

  1. If you have any evidence about the source of contamination, you can go straight to BBSplit. Genomax2 has already given the information you need.

If you don't have a clue about the origin of the contamination, you have two more choices for discovering its source. One is short (but needs luck), and the other takes as long as mapping your reads to the human genome.

  1. One is to use Kraken. Kraken uses either pre-built databases of sequences from a mixture of known organisms, or a database you build yourself. You use Kraken to figure out the source of contamination, and then you can get rid of those reads with BBSplit. You need some luck, however, to pin down the organism the contamination comes from. Kraken runs very quickly with the provided database and is worth an attempt. But remember: once you have discovered the source of contamination, the only way to get rid of the reads is by mapping.

  2. A longer alternative is to use blobology. It is not as fast as Kraken, since it relies on downloading the whole nt database from NCBI; following an included script, it then assembles your reads and maps them to discover the source of contamination.
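For the Kraken route, the step of "figuring out the source" usually comes down to reading the classification report. Below is a rough Python sketch that lists non-human species from a report, assuming the six tab-separated columns of Kraken's report format (percentage of reads in the clade, reads in the clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name). The `SAMPLE_REPORT` numbers are invented purely for illustration, and the function names and the `min_pct` cut-off are my own assumptions.

```python
# Sketch: list non-human species from a Kraken-style report.
# SAMPLE_REPORT is made-up data, not the output of a real run.

SAMPLE_REPORT = "\n".join([
    "  5.00\t500\t500\tU\t0\tunclassified",
    " 95.00\t9500\t0\t-\t1\troot",
    " 90.00\t9000\t9000\tS\t9606\t      Homo sapiens",
    "  4.00\t400\t400\tS\t562\t      Escherichia coli",
    "  0.05\t5\t5\tS\t4932\t      Saccharomyces cerevisiae",
])


def parse_report(text):
    """Parse report text into a list of dicts, one per taxon line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pct, clade_reads, direct_reads, rank, taxid, name = line.split("\t")
        rows.append({
            "pct": float(pct),
            "clade_reads": int(clade_reads),
            "direct_reads": int(direct_reads),
            "rank": rank.strip(),
            "taxid": int(taxid),
            "name": name.strip(),
        })
    return rows


def non_human_species(rows, min_pct=0.1, human_taxid=9606):
    """Return species-level rows other than human at or above min_pct, most abundant first."""
    hits = [
        r for r in rows
        if r["rank"] == "S" and r["taxid"] != human_taxid and r["pct"] >= min_pct
    ]
    return sorted(hits, key=lambda r: r["pct"], reverse=True)


if __name__ == "__main__":
    for row in non_human_species(parse_report(SAMPLE_REPORT)):
        print(row["name"], row["pct"])
```

Whatever shows up at the top of such a list is a candidate reference to hand to BBSplit; very low percentages are often just database noise, which is why the sketch applies a cut-off.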

That being said, it would be much better to map your sequences to the human genome and get rid of the unmapped sequences.

7.9 years ago
GenoMax 141k

In addition to @Amitm's answer, BBMap contains BBSplit, which is designed to bin reads based on reference genomes. In your case, if you only care about human sequences, you can bin those away from the rest of the data.
You can sample a random set of reads from your data by using $ reformat.sh in=reads.fq out=sampled.fq sample=NNN from the BBMap suite (replace NNN with the number of reads you want to sample). Then test BBSplit on the sampled reads if you don't want to process the entire file.
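If it helps to see what "sample NNN reads at random" means in practice, here is a small Python sketch using reservoir sampling, which picks a uniform random subset in a single pass without loading the whole file. This is a generic technique and not necessarily what reformat.sh does internally; the function names and the fixed 4-line FASTQ record layout are assumptions.

```python
import random


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples; assumes strict 4-line records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def sample_reads(records, n, seed=0):
    """Reservoir-sample up to n records uniformly at random in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = rec
    return reservoir


def write_fastq(records, handle):
    """Write records back out in 4-line FASTQ form."""
    for rec in records:
        handle.write("\n".join(rec) + "\n")
```

Typical use would be `sample_reads(read_fastq(open("reads.fq")), 100000)` followed by `write_fastq(...)`. For paired-end data the two mates have to be sampled together, which this sketch does not handle.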
