Convert Granges list of peaks into consensus counts
2
1
Entering edit mode
3.5 years ago
Bioinfo_2006 ▴ 160

After ATAC-Seq analysis, I have a list of peaks

Now, I would like to merge our open regions from all samples into a set of non-redundant (no overlapping regions) open regions present in any sample (see example below).

I tried the following using soGGi

  myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
names(myPeaks) <- c("HindBrain_1", "HindBrain_2", "Kidney_1", "Kidney_2", "Liver_1",
"Liver_2")
Group <- factor(c("HindBrain", "HindBrain", "Kidney", "Kidney", "Liver", "Liver"))
consensusToCount <- soGGi:::runConsensusRegions(GRangesList(myPeaks), "none")


But I always get a NULL when I do this. What would be the alternate to soGGI to unlist the peaks to get a consensus count of non-redundant peaks and add that as a metadata like in example above? Thanks.

GRanges R Bioconductor • 4.4k views
0
Entering edit mode

Please do not use screenshots of plain text, they're counter-productive. Instead, copy the text to a GitHub gist and link the gist here.

0
Entering edit mode

Hello, I run into the same issue as yours. Did you solve it?

0
Entering edit mode

Did you try my suggestion below?

0
Entering edit mode

I didn't understand your code. Do you mind explaining it ?

GenomicRanges::reduce(do.call(c, your.GRangesList))


What does c mean? and What am I doing here? Also for bedrolls, what do k1,1 -k2,2n mean? Thank you. I had 'soGGi:::runConsensusRegions' function worked before. But when I used the same code for the same samples again, it didn't work.

0
Entering edit mode

I updated my answer to explain the GRanges command. For the Unix sort it says that the file shall be sorted by first column by name and second column in a numerical fashion. This is basic Unix, I suggest you invest time to get a better background, as Unix built-in tools are very powerful for data manipulation, imho essential to bioinformatics.

0
Entering edit mode

Ok. Thank you very much.

6
Entering edit mode
3.4 years ago
Bioinfo_2006 ▴ 160

Hope someone finds this useful when they come across the ATAC-Seq tutorial

 peaks <- dir("ATAC_Data/ATAC_Peaks_forCounting/", pattern = "*.narrowPeak",
full.names = TRUE)
myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
myGRangesList<-GRangesList(myPeaks)
reduced <- reduce(unlist(myGRangesList))
consensusIDs <- paste0("consensus_", seq(1, length(reduced)))
mcols(reduced) <- do.call(cbind, lapply(myGRangesList, function(x) (reduced %over% x) + 0))
reducedConsensus <- reduced
mcols(reducedConsensus) <- cbind(as.data.frame(mcols(reducedConsensus)), consensusIDs)
consensusIDs <- paste0("consensus_", seq(1, length(reducedConsensus)))
return(reducedConsensus)

0
Entering edit mode

Hi,

I cam across the same problem when following the ATAC-seq tutorial from Rockefeller University (https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html) and trying to run the same function, namely soGGi:::runConsensusRegions. I also always get NULL. So I tried using your solution/function, but am getting the following error:

I am sorry if this is something very trivial/basic, I am very new to bioinformatics and really don't know how to solve this issue...I would appreciate help :)

0
Entering edit mode

I edited the above now. Try it again. Once you read in all the peaks in your folder, use lapply to get the Granges which is stored as myPeaks. Then convert them to a list (myGRangesList).

0
Entering edit mode

Hi there, I had the same problem, too. Thanks to your suggestion I managed to solve it. However, the RU ATAC workshop also considered blacklist regions in the merge analysis.

The original script was as follows:

Since I am new with R and data mining, could you please help me to integrate your script with the original one, considering blacklist regions?

1
Entering edit mode

You can just append the code from the RU tutorial to the solution above, i.e. once you have the consensus list from above (stored in the variable reducedConsensus) you can do the following:

consensusToCount <- reducedConsensus # for staying consistent with the naming of the variables in the RU tutorial consensusToCount <- consensusToCount[!consensusToCount %over% blklist & !seqnames(consensusToCount) %in% "chrM"]

0
Entering edit mode

Thank You!!! It was simpler than I expected. I was confusing "consensusIDs" with "reducedConsensus". Just another little question... so what does consensusIDs in Bioinfo_2006's script stand for?

0
Entering edit mode

Thanks a lot for your help and effort!

0
Entering edit mode

I just noticed something: When converting the imported peak regions to a GRanges object using

myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)

the first line, i.e. peak, in each file does not get imported. Why is that? Or am I missing something?

0
Entering edit mode
Error in getListElement(x, i, ...) :
GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment


I got this error again even though using your solution only I resolved the issue. How to fix this?

0
Entering edit mode

Hi,

Can you please explain a little bit the purpose of this line mcols(reduced) <- do.call(cbind, lapply(myGRangesList, function(x) (reduced %over% x) + 0))?

Thanks a lot

0
Entering edit mode

This answer help me to fix my problem. Thank you very much.

0
Entering edit mode

Can someone explain this to me? I don't understand what the original problem is and how this solves it. I tried plugging this in to my code for this tutorial and I got the following:

Error: no function to return from, jumping to top level

2
Entering edit mode
3.5 years ago
ATpoint 68k

If I get you correctly, try:

GenomicRanges::reduce(do.call(c, your.GRangesList))


Edit: c stands for concatenate and will combine the elements of the GRangesList (so single GRanges object) into a single GRanges object. Given there are overlaps reduce will then merge them into non-redundant intervals.

Outside of R, having peaks as BED files one would do:

cat *.bed | sort k1,1 -k2,2n | bedtools merge -i - > consensus.bed

0
Entering edit mode

Should I use the GRangesList here? I got an error.

Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function 'reduce' for signature '"list"'

0
Entering edit mode

This is rather unproductive, I have no idea what your input data are. Please show what kind of data you have, also which type. Use class() on your data to get an idea in which format they are.

0
Entering edit mode

The initial input is a large list and I named it as 'myPeaks'. I used GRangesList(myPeaks) to change the list into a "CompressedGRangesList", as shown below.

myGRangesList<-GRangesList(myPeaks)
GenomicRanges::reduce(do.call(c, myGRangesList))
class(myGRangesList)
[1] "CompressedGRangesList"
attr(,"package")
[1] "GenomicRanges"

0
Entering edit mode

I found out the problem with your code. After you run do.call(c, your.GRangesList), you convert GRangesList into a list again. That's why I run into an error. To make your code work, you will have to GenomicRanges::reduce(GRangesList(do.call(c, your.GRangesList))).

0
Entering edit mode

Right. I guess a simple unlist() on the GRL would've done the trick right away, thanks for following up :)

reduce(unlist(your.GRangesList))


should do it?

0
Entering edit mode

Yes. It does work after I corrected this. But the new problem is that I didn't get any metadata.

>myGRangesList
GRangesList object of length 3
......


# myGRangesList shows that there are three different datasets.

>consensusToCount <- GenomicRanges::reduce(unlist(myGRangesList))
>consensusToCount
GRanges object with 82483 ranges and 0 metadata columns:
seqnames            ranges strand
<Rle>         <IRanges>  <Rle>
[1]     chr1       10548-10887      *
[2]     chr1       12861-13145      *
[3]     chr1       17363-17554      *
[4]     chr1       28861-29950      *
[5]     chr1       34562-34728      *
...      ...               ...    ...