Question

How to remove contamination from NGS data

1

7.9 years ago

olp123 ▴ 20

Hello,

I had two DNA samples from bacteria of the same species sequenced on an Illumina platform. Unfortunately, one sample was contaminated with bacteria from a different genus, Bacillus. There is a clearly separate peak in the GC content distribution. How can I remove the sequences that come from the contamination?
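For readers who want to see the GC signal described here, the following is a minimal sketch that computes the GC fraction of each contig in a FASTA file. The file name `contigs.fasta` and both function names are hypothetical, chosen only for illustration. GC content is a rough triage signal at best: it cannot separate organisms with similar GC content and does not replace binning the reads against reference genomes.

```python
import sys


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the contig name.
                name = (line[1:].split() or [""])[0]
                chunks = []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Return G+C as a fraction of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python gc_per_contig.py contigs.fasta  (hypothetical file name)
    for contig, sequence in read_fasta(sys.argv[1]):
        print(f"{contig}\t{len(sequence)}\t{gc_fraction(sequence):.3f}")
```

Sorting the output by the last column should make contigs from the second GC peak easy to spot, but short contigs give noisy estimates and are better judged by alignment.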

I tried to identify the contaminating sequences by aligning my sample contigs against Bacillus contigs from a database using Mauve. I can clearly identify large parts of the contamination, but there are unaligned contigs at the end of each sequence, and I do not know where they belong. Here is the Mauve screenshot: dark green regions are the target sequences, light green regions are the contamination from Bacillus. I don't know what to do with the red parts. http://s20.postimg.org/5jre11ubf/bacillus_2_3.jpg

Does anyone know how to solve this problem without sequencing again, while losing as little information as possible?

Thanks a lot.

next-gen • 4.4k views

score 5 · Answer 1 · 2016-05-27

7.9 years ago

GenoMax 141k

Use BBSplit from BBMap. Provide the two genomes (or just the one correct genome) as references to bin the reads. You may lose some reads that do not map uniquely, but that can't be helped; alternatively, you could choose to include them in both bins.
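As an illustration of this suggestion, here is a minimal sketch that assembles a `bbsplit.sh` command line for paired-end reads. All file names and the helper name `build_bbsplit_command` are hypothetical. The parameters (`in1`, `in2`, `ref`, `basename`, `ambiguous2`, `outu1`, `outu2`) are written as I recall them from the BBMap documentation, so please check them against the help text of your installed `bbsplit.sh` before running anything.

```python
import shlex


def build_bbsplit_command(reads_r1, reads_r2, target_ref, contaminant_ref,
                          out_pattern="binned_%_#.fq", ambiguous2="toss"):
    """Build (but do not run) a bbsplit.sh command for paired-end reads.

    ambiguous2 controls reads that map equally well to both references:
    "toss" treats them as unmapped, "all" writes a copy to both bins.
    """
    return [
        "bbsplit.sh",
        f"in1={reads_r1}",
        f"in2={reads_r2}",
        # Comma-separated reference genomes; one output bin per reference.
        f"ref={target_ref},{contaminant_ref}",
        # % is replaced by the reference name, # by the read number (1 or 2).
        f"basename={out_pattern}",
        f"ambiguous2={ambiguous2}",
        # Reads that map to neither reference.
        "outu1=unmapped_1.fq",
        "outu2=unmapped_2.fq",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_bbsplit_command(
        "sample_R1.fq", "sample_R2.fq", "target_genome.fa", "bacillus.fa"
    )
    # Print the command for inspection; run it with subprocess.run(cmd, check=True)
    # once the parameters are confirmed.
    print(shlex.join(cmd))
```

With `ambiguous2="toss"`, reads that fit both genomes equally well are dropped from both bins; `ambiguous2="all"` corresponds to the option of including them in both bins.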

score 4 · Answer 2 · 2016-05-27

7.9 years ago

harold.smith.tarheel ★ 4.9k

I can recommend BBSplit: a description can be found here.

[Edit] Ninja'ed by GenoMax!

score 0 · Answer 3 · 2016-05-28

7.9 years ago

Mo ▴ 920

Have a look at this R package; it showed promising results: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3843372/
