ChIP-Seq for viral genomes

5.5 years ago

divya.nandakumar ▴ 30

I am trying to analyze ChIP-Seq data from a viral transcription factor. I am interested in identifying peaks specifically in the viral genome (~150 kb). I am having a hard time calling peaks with programs like MACS2 or HOMER using default settings and changing only the genome size.

When I look at the alignment files in a genome browser such as IGV, I am able to see distinct peaks in the IP and not in the input control or a control where the transcription factor is not tagged. I am not sure if I need to play around with some settings or if there are specific ChIP analysis programs that cater to small genomes such as viral genomes.

While I don't get any peaks with MACS2, I do get peaks when I run HOMER and either turn off all filtering or require only a low fold change between the control and the IP. Is it acceptable to manually curate ChIP-Seq data? That is doable given the small genome size. Any help would be much appreciated!

ChIP-Seq next-gen • 1.5k views

ADD COMMENT • link 5.5 years ago by divya.nandakumar ▴ 30

What is the nature of that sample? An organism where the virus is integrated into the genome?

ADD REPLY • link 5.5 years ago by ATpoint 81k

The viral genome exists as episomes in the cell. It is not integrated into the host genome.

ADD REPLY • link 5.4 years ago by divya.nandakumar ▴ 30

Not really answering your question, but I'd be concerned with cross contamination to the host (human?) genome. I would definitely do the short experiment of adding your virus genome to the human genome and then aligning all reads and redoing the MACS analysis.
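A minimal sketch of the "add the virus genome to the human genome" step, assuming both references are plain FASTA files. The function name, the file paths, and the `virus_` prefix are hypothetical choices for illustration; the prefix just makes the viral contigs easy to pick out after alignment.

```python
def combine_references(host_fasta, viral_fasta, out_fasta, viral_prefix="virus_"):
    """Write host + viral sequences into one FASTA; return the viral contig names.

    All names here are illustrative assumptions, not part of any aligner's API.
    """
    viral_names = []
    with open(out_fasta, "w") as out:
        # Host records are copied unchanged.
        with open(host_fasta) as host:
            for line in host:
                out.write(line.rstrip("\n") + "\n")
        # Viral headers get a prefix so they can be selected after alignment.
        with open(viral_fasta) as virus:
            for line in virus:
                line = line.rstrip("\n")
                if line.startswith(">"):
                    fields = line[1:].split()
                    name = viral_prefix + (fields[0] if fields else "unnamed")
                    viral_names.append(name)
                    line = ">" + name
                out.write(line + "\n")
    return viral_names
```

The combined file would then be indexed with whichever aligner is in use, and all reads aligned against it in a single pass.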

I bet there are a few false-positive peaks just because of the relative difference in genome sizes. In my experience, ChIP-seq always produces peaks, and I am not overly convinced by the method.

ADD REPLY • link 5.4 years ago by colindaven 6.3k

I align to both the viral and host genomes to avoid any misalignments. I then use only the viral-aligned reads for peak analysis. Using the host+viral alignments leads to a bunch of false positives from the host, which look nothing like peaks in the genome browser, and almost nothing from the virus, which is where I actually see good peaks. From my understanding of peak-calling algorithms, almost all of them, with their default parameters, are optimized for large genomes with widely spaced genes, unlike a viral genome, which is both small and dense.
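For reference, keeping only the viral-aligned reads can be sketched as a filter over SAM text, assuming the viral contig names are known. The function name and the contig name in the usage note are hypothetical; in practice the same selection is usually done on an indexed BAM with a tool such as samtools.

```python
def keep_viral_alignments(sam_lines, viral_contigs):
    """Yield SAM header lines and alignments whose reference is a viral contig."""
    viral = set(viral_contigs)
    for line in sam_lines:
        if line.startswith("@"):
            # Keep @SQ lines only for viral contigs; pass other header lines through.
            if line.startswith("@SQ"):
                fields = line.rstrip("\n").split("\t")
                names = [f[3:] for f in fields if f.startswith("SN:")]
                if not names or names[0] not in viral:
                    continue
            yield line
            continue
        fields = line.split("\t")
        # Column 3 of a SAM record (index 2) is RNAME, the reference the read aligned to.
        if len(fields) > 2 and fields[2] in viral:
            yield line
```

Something like `keep_viral_alignments(open("aligned.sam"), ["virus_NC_000000.1"])` would stream the filtered records. This sketch ignores mapping quality and does not check where a read's mate aligned, both of which may matter for paired-end data.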

I think the viral peaks are real because they are very distinct on the genome browser and could be validated by ChIP-qPCR as well. There is also consistency with transcript levels of targets etc.

ADD REPLY • link 5.4 years ago by divya.nandakumar ▴ 30

I think your analysis is correct, and the tools are optimised for large genomes. I have never had to do ChIP-seq analysis on a small genome, thankfully.

I think careful manual curation is appropriate in this case.
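As an aid to manual curation, a rough sketch of a windowed IP-over-control fold enrichment for a small genome is given here. This is not how MACS2 or HOMER score peaks; the window size, pseudocount, and fold threshold are arbitrary assumptions, and the inputs are assumed to be 0-based read start positions on a single viral contig.

```python
def enriched_windows(ip_starts, ctrl_starts, genome_len,
                     window=500, min_fold=2.0, pseudocount=1.0):
    """Return (start, end, fold) for windows where IP exceeds scaled control."""
    n_win = (genome_len + window - 1) // window
    ip = [0] * n_win
    ctrl = [0] * n_win
    for pos in ip_starts:
        if 0 <= pos < genome_len:
            ip[pos // window] += 1
    for pos in ctrl_starts:
        if 0 <= pos < genome_len:
            ctrl[pos // window] += 1

    # Scale the control to the IP read depth (simple total-count normalization).
    total_ip, total_ctrl = sum(ip), sum(ctrl)
    scale = total_ip / total_ctrl if total_ctrl else 1.0

    hits = []
    for i in range(n_win):
        # The pseudocount avoids division by zero in windows with no control reads.
        fold = (ip[i] + pseudocount) / (ctrl[i] * scale + pseudocount)
        if fold >= min_fold:
            end = min((i + 1) * window, genome_len)
            hits.append((i * window, end, round(fold, 2)))
    return hits
```

One caveat: on a ~150 kb genome a few strong peaks can hold a large share of the IP reads, so total-count scaling may understate the enrichment. Any window flagged this way should still be checked against the browser tracks.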

One thing that has really helped me in ChIP-seq analysis is using the multiBamSummary tool and then the plotFingerprint tool in deepTools, available on the Freiburg Galaxy server among others. plotFingerprint is really good at assessing the strength of the ChIP enrichment.
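The idea behind a fingerprint plot can be sketched in a few lines: rank genomic bins by read count and look at the cumulative fraction of reads. This is only an illustration of the concept, assuming per-bin read counts are already available; it is not the deepTools implementation, and the function name is hypothetical.

```python
def fingerprint(bin_counts):
    """Cumulative fraction of reads across bins ranked from lowest to highest count."""
    ordered = sorted(bin_counts)
    total = sum(ordered)
    if total == 0:
        return [0.0] * len(ordered)
    curve = []
    running = 0
    for count in ordered:
        running += count
        curve.append(running / total)
    return curve
```

A sample with evenly spread reads (typical of an input control) gives a curve close to a straight diagonal, while a strong IP keeps the curve low for most bins and rises sharply at the end, because a few bins hold most of the reads.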

Devon Ryan has written some good slide decks on using deepTools for this purpose.

ADD REPLY • link 5.4 years ago by colindaven 6.3k

Thanks! I will check them out. I was also able to get some suggestions from the makers of HOMER to adjust the parameters for small genomes.

ADD REPLY • link 5.4 years ago by divya.nandakumar ▴ 30