Classify contaminants in bisulfite-treated whole-genome data
3.5 years ago
Richard ▴ 590

Hi folks,

I have some whole-genome bisulfite sequencing (WGBS) data where about 75% of reads align to the human genome plus the known controls that were included. I have removed adapter dimers and clipped adapter sequence from the ends of the reads, but the alignment rate remains just above 75%. I've tried extracting the unaligned reads, assembling them, and pulling out the contigs with the highest representation. However, when I BLAST these contigs I get either no hit, or only part of the query contig matches any reference genome.

So my question is: how do people identify contaminants in bisulfite-treated genome data?

I understand that I could build a Kraken reference database of bisulfite-converted sequences for a large number of reference genomes, but I'm hoping there is something a little more accessible.
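For anyone who does want to try the converted-reference route, here is a minimal sketch of in silico bisulfite conversion of FASTA records. It assumes full conversion (every C read as T on one strand, every G read as A on the other), which is similar in spirit to how bisulfite aligners such as Bismark prepare their converted genomes. The function names and the `_CT` / `_GA` suffixes are made up for illustration, and whether a classifier such as Kraken performs well on a database built this way is an assumption I have not tested. The reads would need the same conversion before classification.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bisulfite_convert_records(lines):
    """Yield (name, sequence) for a C->T and a G->A version of each record.

    Assumes complete conversion, i.e. no methylated cytosines are retained.
    The _CT / _GA suffixes are a hypothetical naming choice.
    """
    for header, seq in read_fasta(lines):
        # Keep only the first word of the header as the record name.
        name = (header.split() or ["unnamed"])[0]
        seq = seq.upper()
        yield f"{name}_CT", seq.replace("C", "T")
        yield f"{name}_GA", seq.replace("G", "A")


if __name__ == "__main__":
    example = [">toy example record", "ACGTCG", "GGCC"]
    for name, seq in bisulfite_convert_records(example):
        print(f">{name}\n{seq}")
```

In practice you would pass an open file handle for each reference FASTA instead of the toy list and write the converted records to a new file before building the database.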

wgbs • 685 views
3.5 years ago

That is a typical mapping efficiency for WGBS. After all, you're mapping to a reduced alphabet (effectively 3 bases instead of 4), and the bisulfite treatment is rather harsh on the DNA. The fact that nothing comes up in your BLAST query supports the notion that the 25% of reads that couldn't be mapped are most likely not contamination but artefacts of the experiment.
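To illustrate the reduced-alphabet point, here is a toy sketch that compares how many k-mers occur exactly once in a sequence before and after C->T conversion. The random sequence, its length, and `k = 8` are arbitrary assumptions chosen for the demonstration, not a model of a real genome, and only one strand's conversion is shown. The converted fraction can never be higher than the original, because two positions that share a k-mer before conversion still share one afterwards.

```python
import random
from collections import Counter


def unique_kmer_fraction(seq, k):
    """Return the fraction of k-mer positions whose k-mer occurs exactly once."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


# Toy data: a seeded random sequence, purely for illustration.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(20000))

# Full C->T conversion collapses the alphabet to A, G and T.
converted = genome.replace("C", "T")

k = 8
print("unique k-mer fraction, original: ", unique_kmer_fraction(genome, k))
print("unique k-mer fraction, converted:", unique_kmer_fraction(converted, k))
```

Fewer unique k-mers means more positions where a read could be placed equally well, which is one reason mapping efficiency drops relative to ordinary whole-genome sequencing.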
