Question: ChIP-Seq contaminated by human tissue?
4.1 years ago by
Gary450
Taiwan/Taichung/China Medical University Hospital
Gary450 wrote:

Hi,

I have mouse H3K27ac ChIP-Seq data with 23,316,540 reads in total after trimming. Of these, 85.75% (19,994,088 reads) align to the mouse mm9 genome using Bowtie2. I then aligned the unaligned reads (3,322,452) to the human hg19 reference genome, and 34.14% of them (1,134,255 reads) align to hg19. Do these reads come from human tissue contamination? Many thanks.

contamination chip-seq • 1.9k views
modified 4.1 years ago • written 4.1 years ago by Gary450

Try aligning the 85% that mapped to mouse against human; you'll see a lot of homology between the species. Whether it is as high as 34%, I don't know.

written 4.1 years ago by karl.stamm3.5k

It is not 34.14% of the total reads; it is 34.14% of the 14.25% unmapped reads, or 4.86% of the total.

edit: OK, I see what you mean: the question is whether a similar percentage of the mapped reads would also map to the human genome.
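
As a quick check of the arithmetic in this exchange, here is a minimal Python sketch. The read counts are copied from the question; the variable names and the `pct` helper are made up for illustration. It recomputes the unmapped fraction, the human-aligned share of the unmapped reads, and the human-aligned share of all reads.

```python
# Read counts taken from the question (mouse H3K27ac ChIP-Seq sample).
total_reads = 23_316_540
mouse_aligned = 19_994_088
human_aligned_of_unmapped = 1_134_255

# Reads that did not align to mm9 and were then aligned to hg19.
unmapped = total_reads - mouse_aligned


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


pct_unmapped = pct(unmapped, total_reads)
pct_human_of_unmapped = pct(human_aligned_of_unmapped, unmapped)
pct_human_of_total = pct(human_aligned_of_unmapped, total_reads)

print(f"unmapped reads:            {unmapped:,} ({pct_unmapped:.2f}% of total)")
print(f"human hits among unmapped: {pct_human_of_unmapped:.2f}%")
print(f"human hits among all:      {pct_human_of_total:.2f}%")
```

The last figure is the one that matters when judging how much of the whole library might be human-derived.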

modified 4.1 years ago • written 4.1 years ago by h.mon26k

Yes, then we would know if that's normal. I believe a lot of mammalian genes are shared. In exons you could see 34%, but that sounds too high for ChIP-Seq. I really have no idea. But you can test how much is shared by aligning your known-mouse reads to human. That gives a kind of background rate of sequence similarity.

written 4.1 years ago by karl.stamm3.5k

You may want to run a few of the leftover reads (those that don't align to mouse) through BLAST.

If you feel they are truly contaminants, then you could try BBSplit from the BBMap package to separate them.

written 4.1 years ago by genomax69k

H3K27ac marks active regions, so it is quite possible that some fraction of these reads is shared with human.

There are many ultra-conserved regions between human and mouse, and H3K27ac also marks gene bodies, so theoretically it should cover orthologous active genes.

You can run FastQC on the raw files to check quality and overrepresented sequences.

But I think it's fine.

written 4.1 years ago by Manvendra Singh2.1k
4.1 years ago by
Friederike4.5k
United States
Friederike4.5k wrote:

Since the samples were probably handled and processed by humans, I wouldn't be too surprised by human DNA contamination. It's a recurring issue: see the NY Times (http://www.nytimes.com/2011/02/17/science/17genome.html) or this more recent publication: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0110808

In that case, I'd be more worried about the reads that map to both the mouse and human genomes, as those have the potential to bias your results. I don't have much experience with it, but there are a couple of tools you can use to detect human contamination; just google it.
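
To make the concern about ambiguous reads concrete, here is a minimal Python sketch that splits reads into mouse-only, human-only, both, and neither. It assumes the read names have already been extracted from the two alignments (for example from the BAM files); the function name `classify_reads` and the toy read names are hypothetical and are not part of any tool mentioned in this thread.

```python
def classify_reads(all_reads, mouse_hits, human_hits):
    """Split read names into four groups by which genome(s) they aligned to."""
    all_reads = set(all_reads)
    # Ignore any names that are not part of the sample.
    mouse_hits = set(mouse_hits) & all_reads
    human_hits = set(human_hits) & all_reads
    return {
        "mouse_only": mouse_hits - human_hits,
        "human_only": human_hits - mouse_hits,
        "both": mouse_hits & human_hits,  # ambiguous, conserved sequence
        "neither": all_reads - mouse_hits - human_hits,
    }


# Toy example with invented read names.
reads = ["r1", "r2", "r3", "r4", "r5"]
mouse = ["r1", "r2", "r3"]
human = ["r3", "r4"]

groups = classify_reads(reads, mouse, human)
for name, members in groups.items():
    print(f"{name}: {len(members)} read(s) {sorted(members)}")
```

Only the "human_only" group points towards contamination; the "both" group is what you would expect from sequence conservation alone.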

written 4.1 years ago by Friederike4.5k
4.1 years ago by
Gary450
Taiwan/Taichung/China Medical University Hospital
Gary450 wrote:

Hi,

Thanks for all your valuable suggestions. Although the 1,134,255 reads account for only 4.86% of the total reads (1,134,255 / 23,316,540) in this mouse H3K27ac ChIP-Seq sample, I am still very worried about contamination, for two reasons: (1) we already know that a mouse H3K27me3 ChIP-Seq sample prepared by the same labmate was contaminated by yeast. Of its 22,889,979 total reads, only 29.46% (6,741,623) align to the mouse mm9 reference genome, while 31.66% (7,246,204) align to the yeast sacCer3 genome; (2) for another mouse H3K27me3 ChIP-Seq sample, also prepared by the same labmate, only 0.23% (40,699 / 17,733,214) of the unaligned reads align to the human hg19 genome. This suggests that the second sample is probably not contaminated by human tissue, whereas the first H3K27ac ChIP-Seq sample I reported could be, at least partially. Any additional suggestions are very welcome, and thanks again.

written 4.1 years ago by Gary450

First, you should not add your comment as an answer.

Regarding your problem: instead of mapping only the unmapped reads to potential contaminants, map all your reads (or a random subsample of them) as one of the initial quality-checking steps. You probably already run FastQC on all your sequencing runs; add MGA or FastQ_Screen to this workflow, and include common contaminants in their databases. If contamination is common, use some method to filter it, e.g. BBSplit, as genomax2 suggested.
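
One way to draw the random subsample is reservoir sampling, which needs only a single pass over the file. The Python sketch below is an illustration under stated assumptions, not the method used by any tool named here: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name `subsample_fastq` and the toy data are hypothetical.

```python
import io
import random


def subsample_fastq(handle, n, seed=0):
    """Reservoir-sample up to n four-line FASTQ records from a text handle."""
    rng = random.Random(seed)
    reservoir = []
    index = 0  # zero-based index of the current record
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        if len(reservoir) < n:
            reservoir.append(record)
        else:
            # Keep this record with probability n / (index + 1).
            j = rng.randint(0, index)
            if j < n:
                reservoir[j] = record
        index += 1
    return reservoir


# Toy input: ten invented reads in an in-memory "file".
toy = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
sample = subsample_fastq(io.StringIO(toy), 3, seed=1)
for rec in sample:
    print(rec[0].strip())
```

With a real file you would pass an open handle (for example from `open()` or `gzip.open(path, "rt")`) instead of the `io.StringIO` object, then write the sampled records out and map them against each candidate genome.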

modified 4.1 years ago • written 4.1 years ago by h.mon26k