Preparing ChIP-seq data for analysis with DESeq2
5 months ago
gkunz ▴ 10

I am trying to find resources that discuss how best to prepare ChIP-seq data for processing in DESeq2. I am having trouble working out how to prepare a count matrix for generating a DESeqDataSet. I have .bam files and corresponding .narrowPeak files, but I am not sure how to generate a count matrix from these data, or anything else that can serve as input for DESeq2. I have done some googling and reading, but have been unable to find a clear explanation.

If you are aware of, or could share, any good tutorials on setting up ChIP-seq data for DESeq2 analysis, that would be great!

Any assistance is appreciated!

Thanks!

DESeq2 ChIP-seq • 543 views

The Harvard Chan Bioinformatics Core has ChIP-seq data analysis tutorials. Look under the lessons for detailed training materials.


A fantastic resource for sure, but I don't see anywhere in their lessons where they address this question. Could you point out where they do so?

They use DiffBind to identify differential peaks. I am not looking to use DiffBind.

5 months ago
ATpoint 55k

To make a count matrix you need a set of reference regions; this could be the merge of all called peaks. For an extensive discussion I recommend reading the manuals of the Bioconductor packages csaw (which suggests a window-based approach) and DiffBind. There are many threads on making a count matrix from a reference peak set, e.g. Best practice for analysing ATAC-seq data
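To make the idea of "reference regions plus counts" concrete, here is a minimal sketch in plain Python on toy data. It is only an illustration of the principle, not how any of the packages above implement it: the function names (`merge_peaks`, `count_reads`, `count_matrix`) and all coordinates are made up for this example, reads and peaks are hard-coded tuples rather than parsed from .bam or .narrowPeak files, and a read is counted if it overlaps a region by at least one base.

```python
from typing import Dict, List, Tuple

# (chrom, start, end), 0-based half-open, as in BED/narrowPeak files
Interval = Tuple[str, int, int]


def merge_peaks(peak_sets: List[List[Interval]]) -> List[Interval]:
    """Pool peaks from all samples and merge overlapping or touching ones."""
    pooled = sorted(p for peaks in peak_sets for p in peaks)
    merged: List[Interval] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def count_reads(regions: List[Interval], reads: List[Interval]) -> List[int]:
    """Count reads overlapping each region (naive scan, fine for toy data)."""
    return [
        sum(1 for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start < end and r_end > start)
        for chrom, start, end in regions
    ]


def count_matrix(regions: List[Interval],
                 reads_by_sample: Dict[str, List[Interval]]):
    """Return (sample names, rows); one row per region, one column per sample."""
    samples = list(reads_by_sample)
    columns = [count_reads(regions, reads_by_sample[s]) for s in samples]
    rows = [list(row) for row in zip(*columns)]
    return samples, rows


# Hypothetical toy data: called peaks per sample and aligned reads per sample.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_b = [("chr1", 150, 250), ("chr2", 10, 50)]
reads = {
    "A": [("chr1", 120, 156), ("chr1", 190, 226),
          ("chr1", 520, 556), ("chr2", 0, 36)],
    "B": [("chr1", 240, 276), ("chr2", 20, 56),
          ("chr2", 45, 81), ("chr1", 700, 736)],
}

consensus = merge_peaks([peaks_a, peaks_b])
samples, matrix = count_matrix(consensus, reads)

print("region", *samples, sep="\t")
for (chrom, start, end), row in zip(consensus, matrix):
    print(f"{chrom}:{start}-{end}", *row, sep="\t")
```

The result has the shape DESeq2 expects from a raw count matrix: one row per reference region, one column per sample, unnormalized integer counts. With real data the same table would come from a read-counting tool run over the merged peaks, and could then be given to `DESeqDataSetFromMatrix()` together with a sample table.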

By the way, it is good practice to indicate crossposts, e.g. the one over at Bioconductor, which is a forum for technical help with the Bioc packages rather than a platform for general advice.


Thanks for the response!

I visited the post you linked and will go about attempting this method!

I have read the csaw and DiffBind package vignettes in fair detail and run differential analyses using both. As far as I am aware (and I am more than happy to be wrong), neither explicitly requires the data to be formatted in this manner. Unless the data generated by the dba.count function could be passed directly into DESeq2? If that is the case, maybe I will try that as well. Peak-based and window-based analyses have yielded extremely different results when I have used them to analyze my data set. The hope is to use DESeq2 alone, independent of the DiffBind wrapper, to maybe add some clarity to the outputs.

Is there an appropriate way to indicate cross-posts, like simply including a link? I am more than happy to do so in the future. Additionally, is it inappropriate to post a broader question like this to the DiffBind forum?

Again, thanks for the input!

11 weeks ago
Rory Stark ★ 1.2k

In DiffBind, you can generate a consensus peak set and count matrix using the dba.count() function. If you want to retrieve the raw counts, you should set score=DBA_SCORE_READS, and then retrieve the matrix using dba.peakset() with bRetrieve=TRUE.

If you run a full DiffBind analysis, you can retrieve a well-formed DESeq2 object by calling dba.analyze() with bRetrieveAnalysis=TRUE (the default method is DESeq2).
