Interpret plots from DiffBind
1
0
Entering edit mode
10 months ago
Chris ▴ 260

Hi all, I have 4 samples which are 4 conditions but when I made some plots, I got only 2 conditions.enter image description here.

Does this plot mean ws has chromatin more chromatin open (gain) than wf sample? When I rerun the code, I got all 4 conditions but the plot is quite different. There are no gain and loss but only sites. Thank you so much! enter image description here

DiffBind • 2.6k views
ADD COMMENT
1
Entering edit mode

We need actual code for how these were generated to be of any help.

ADD REPLY
0
Entering edit mode

The code I used to make that plots:

dba.plotProfile(profiles)

I ran the code again today and got the plot with 4 samples so I think it is more important to interpret the plot than figure out why the first plots has only 2 samples.

ADD REPLY
1
Entering edit mode

Yeah, that doesn't tell us anything about what you've actually done to your data.

This demo indicates that the first is from a DiffBind object that has gone through analysis, while the second is from an unanalyzed object.

ADD REPLY
0
Entering edit mode

I see the analysis and no analysis plot use the same code but two different plot which I confused:

dba.plotProfile(profiles)

I guess the reason of missing 2 samples is because the minMembers.

   result <- dba.contrast(result, minMembers=2)

I ran the same code that DiffBind provided:

samples <- read.csv("path/to/sample_sheet_DiffBind.csv")
result <- dba(sampleSheet=samples)
result <- dba.blacklist(result) 
result <- dba.count(result, bParallel=FALSE)
result <- dba.normalize(result)
result <- dba.contrast(result, minMembers=2)
result <- dba.analyze(result)
ADD REPLY
2
Entering edit mode
10 months ago
Rory Stark ★ 2.0k

As jared.andrews07 pointed out, the top plot of post-analysis while the bottom plot is on unanalyzed data.

The bottom (unanalyzed) plot shows four samples and plots 1,000 sites selected at random. They show similar signals except for the wf sample which has very low signal.

The top plot (analyzed) shows the two sample groups, ws and wf. without seeing more details, I'm not sure what samples comprised each sample group as they should have at least two samples per group, however the second plot only shows one sample per group?

The top plot indicates that nearly all the changes in chromatin were in the direction of more open chromatin in the ws group compared to the wf group. Even the few Loss sites, where the chromatin is more open in the wf group, the signals appear very similar so the changes in this direction are more subtle (and, with only two samples, likely noise due to low power) while the Gain sites show a clear change from closed to open state.

ADD COMMENT
0
Entering edit mode

Thanks Rory! When I ran all the code and plot them (post-analysis), I still got only 2 samples so I don't know what wrong. Would you have a suggestion?

ADD REPLY
1
Entering edit mode

The two columns represent sample groups, not individual samples. They are made using the mean signal for each site across the replicates in the sample group. So if you have two ws replicates int he ws group, it is showing the mean signal across those two samples (and likewise combining the two samples in the wt samples group). This is controlled by the merge parameter, which is set to merge=DBA_REPLICATE by default.

ADD REPLY
0
Entering edit mode

I mean I didn't see mf and ms samples. Ws, wt, ms and mt are 4 different conditions but I only see 2 conditions.

ADD REPLY
1
Entering edit mode

There should be multiple contrasts in your analysis by default. This is showing the first one (ws vs wf). Each contrast will compute its own set of Gain/Loss sites. To see the other contrasts, specify sites=n where n is the contrast number you'd like to see.

To see what contrasts have been computed:

dba.show(result, bContrasts=TRUE)
ADD REPLY
0
Entering edit mode

Thanks Rory! So to have mf and ms I should run:

result <- dba.contrast(result, minMembers=2, sites=4)

Is that correct?

ATpoint thought that 100k differential peaks are likely wrong in the comment at the end of the post that why I am not sure. Would you please have a comment?
How to get genes associated with open chromatin regions?

ADD REPLY
1
Entering edit mode

No, sitesis a parameter to dba.plotProfile(). When you originally called dba.contrast(), it should set up all the pairwise contrasts between sample groups. You can see the contrasts as shown above; each has a contrast number. When you call dba.plotProfile(), you can specify a specific contrast to show (the default is the first contrast, which in your case is ws vs. wf). Using the defaults you'll only plot one contrast at a time (as the Gain/Loss sites differ for each contrast), so you would plot the mf vs ms contrast separately. You can add samples from other contrasts to the plot, showing the signal for those sample groups within the Gain/Loss sites for the original contrast, but this is more complicated; it is shown in the help page for dba.plotProfile() and more clearly in the demo script:

https://content.cruk.cam.ac.uk/bioinformatics/software/DiffBind/plotProfileDemo.html

ADD REPLY
0
Entering edit mode

Thank you so much for your detailed reply! Would you have a look at the comment from ATpoint above? His opinion is different from you.

ADD REPLY
1
Entering edit mode

I think the comment is on a different thread. I agree with ATpoint that it is unusual to have such a massive a change in chromatin state, and one should treat the result with scepticism. However it is not impossible to get this result so I can't say straight out it is incorrect.

ADD REPLY
0
Entering edit mode

Yes, the comment is in a different thread so I attached a link to it. Thanks Rory! I read many tutorials about ATAC seq but still didn't find one that talk about how to get a list of genes in an open chromatin regions. I check that by using IGV such as this one. Is this gene is more open in diseased than in control?enter image description here When I open xml fine on local IGV, I got thisenter image description here The value 2.3 or 1.51 are automatically have also 10 in the picture below. I got advice to adjust these value to max but still don't know how after looking functions on IGV.

ADD REPLY
1
Entering edit mode

Making inferences like that from tracks is rather dangerous, particularly if they aren't scaled properly.

I read many tutorials about ATAC seq but still didn't find one that talk about how to get a list of genes in an open chromatin regions.

Generally, you have a few ways to approach this. The first is to assign each ATAC peaks to gene(s), then look for those that change for a given gene. This is trickier than it seems at first glance, as distance-based assignment is not necessarily accurate, enhancers can regulate multiple genes, etc. You have to think carefully about how to do the assignment.

Another approach is to take a more gene-centric view, and look at peaks in promoter regions (2kb upstream, 1 kb downstream of TSS, or however you want to define them) for differential accessibility. This is simpler, but ignores those in enhancers. This can be tweaked to do the same throughout the gene body.

ADD REPLY
0
Entering edit mode

Hi Rory,

    dba.show(result_pro, bContrasts=TRUE)
     Factor Group Samples Group2 Samples2 DB.DESeq2
1 Condition    ws       2     wf        2     85319
2 Condition    ws       2     ms        2     18784
3 Condition    ws       2     mf        2     24422
4 Condition    wf       2     ms        2     83730
5 Condition    wf       2     mf        2     87644
6 Condition    mf       2     ms        2         0

Would you please tell me how should I modify dba.plotProfile(profiles) to get the comparison between ms mf? Sorry for my bad understanding after reading your explanation and the manual. As I understand, it should be dba.plotProfile(profiles, sites = ?) but I am not sure what to put in.

I tried to run something similar that used sites:

rep <- dba.report(tamoxifen)
rep <- rep[abs(rep$Fold) > 2,]
rep <- rep[order(rep$Fold, decreasing=TRUE),]
profiles <- dba.plotProfile(tamoxifen, sites=rep,
                            samples=list(Resistant=tamoxifen$mask$Resistant,
                                         Responsive=tamoxifen$mask$Responsive))

But I got

Error in h(simpleError(msg, call)) : error in evaluating the argument 'obj' in selecting a method for function 'unname': GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment. Would you please explain the value in DB.DESeq2 column mean?

Seem profiles <- dba.plotProfile(result, sites = 5) can plot wf and mf but I don't know why the value in row 6 is 0 which cause error when sities = 6.

ADD REPLY
0
Entering edit mode

Right, the issue that there are no sites below the FDR threshold for contrast 6. Addressed here: Help with DiffBind Generating report-based DBA object... Error: No valid contrasts/methods specified.

ADD REPLY

Login before adding your answer.

Traffic: 1363 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6