Question: Convert Granges list of peaks into consensus counts
1
gravatar for Bioinfo_2006
5 months ago by
Bioinfo_2006120
Bioinfo_2006120 wrote:

After ATAC-Seq analysis, I have a list of peaks

Granges list of peaks

Now, I would like to merge our open regions from all samples into a set of non-redundant (no overlapping regions) open regions present in any sample (see example below).

Identify non-reundant peaks

I tried the following using soGGi

  myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
 names(myPeaks) <- c("HindBrain_1", "HindBrain_2", "Kidney_1", "Kidney_2", "Liver_1", 
 "Liver_2")
 Group <- factor(c("HindBrain", "HindBrain", "Kidney", "Kidney", "Liver", "Liver"))
 consensusToCount <- soGGi:::runConsensusRegions(GRangesList(myPeaks), "none")

But I always get a NULL when I do this. What would be the alternate to soGGI to unlist the peaks to get a consensus count of non-redundant peaks and add that as a metadata like in example above? Thanks.

granges bioconductor R • 464 views
ADD COMMENTlink modified 4 months ago • written 5 months ago by Bioinfo_2006120

Please do not use screenshots of plain text, they're counter-productive. Instead, copy the text to a GitHub gist and link the gist here.

ADD REPLYlink written 5 months ago by RamRS25k

Hello, I run into the same issue as yours. Did you solve it?

ADD REPLYlink written 4 months ago by sophialovechan40

Did you try my suggestion below?

ADD REPLYlink written 4 months ago by ATpoint28k

I didn't understand your code. Do you mind explaining it ?

GenomicRanges::reduce(do.call(c, your.GRangesList))

What does c mean? and What am I doing here? Also for bedrolls, what do k1,1 -k2,2n mean? Thank you. I had 'soGGi:::runConsensusRegions' function worked before. But when I used the same code for the same samples again, it didn't work.

ADD REPLYlink written 4 months ago by sophialovechan40

I updated my answer to explain the GRanges command. For the Unix sort it says that the file shall be sorted by first column by name and second column in a numerical fashion. This is basic Unix, I suggest you invest time to get a better background, as Unix built-in tools are very powerful for data manipulation, imho essential to bioinformatics.

ADD REPLYlink written 4 months ago by ATpoint28k

Ok. Thank you very much.

ADD REPLYlink written 4 months ago by sophialovechan40
2
gravatar for ATpoint
5 months ago by
ATpoint28k
Germany
ATpoint28k wrote:

If I get you correctly, try:

GenomicRanges::reduce(do.call(c, your.GRangesList))

Edit: c stands for concatenate and will combine the elements of the GRangesList (so single GRanges object) into a single GRanges object. Given there are overlaps reduce will then merge them into non-redundant intervals.

Outside of R, having peaks as BED files one would do:

cat *.bed | sort k1,1 -k2,2n | bedtools merge -i - > consensus.bed
ADD COMMENTlink modified 4 months ago • written 5 months ago by ATpoint28k

Should I use the GRangesList here? I got an error.

Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function 'reduce' for signature '"list"'
ADD REPLYlink modified 4 months ago • written 4 months ago by sophialovechan40

This is rather unproductive, I have no idea what your input data are. Please show what kind of data you have, also which type. Use class() on your data to get an idea in which format they are.

ADD REPLYlink written 4 months ago by ATpoint28k

The initial input is a large list and I named it as 'myPeaks'. I used GRangesList(myPeaks) to change the list into a "CompressedGRangesList", as shown below.

myGRangesList<-GRangesList(myPeaks)
GenomicRanges::reduce(do.call(c, myGRangesList))
class(myGRangesList)
[1] "CompressedGRangesList"
attr(,"package")
[1] "GenomicRanges"
ADD REPLYlink modified 4 months ago • written 4 months ago by sophialovechan40

I found out the problem with your code. After you run do.call(c, your.GRangesList), you convert GRangesList into a list again. That's why I run into an error. To make your code work, you will have to GenomicRanges::reduce(GRangesList(do.call(c, your.GRangesList))).

ADD REPLYlink written 4 months ago by sophialovechan40

Right. I guess a simple unlist() on the GRL would've done the trick right away, thanks for following up :)

reduce(unlist(your.GRangesList))

should do it?

ADD REPLYlink written 4 months ago by ATpoint28k

Yes. It does work after I corrected this. But the new problem is that I didn't get any metadata.

>myGRangesList
GRangesList object of length 3
......

myGRangesList shows that there are three different datasets.

>consensusToCount <- GenomicRanges::reduce(unlist(myGRangesList))
>consensusToCount
GRanges object with 82483 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1       10548-10887      *
  [2]     chr1       12861-13145      *
  [3]     chr1       17363-17554      *
  [4]     chr1       28861-29950      *
  [5]     chr1       34562-34728      *
  ...      ...               ...    ...

but no metadata columns

I don't know where went wrong. Do you have an idea about this problem?

ADD REPLYlink written 4 months ago by sophialovechan40
2
gravatar for Bioinfo_2006
4 months ago by
Bioinfo_2006120
Bioinfo_2006120 wrote:

Hope someone finds this useful when they come across the ATAC-Seq tutorial

 peaks <- dir("ATAC_Data/ATAC_Peaks_forCounting/", pattern = "*.narrowPeak", 
full.names = TRUE)
myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
myGRangesList<-GRangesList(myPeaks)   
reduced <- reduce(unlist(myGRangesList))
consensusIDs <- paste0("consensus_", seq(1, length(reduced)))
mcols(reduced) <- do.call(cbind, lapply(myGRangesList, function(x) (reduced %over% x) + 0))
reducedConsensus <- reduced
mcols(reducedConsensus) <- cbind(as.data.frame(mcols(reducedConsensus)), consensusIDs)
consensusIDs <- paste0("consensus_", seq(1, length(reducedConsensus)))
return(reducedConsensus)
ADD COMMENTlink modified 3 months ago • written 4 months ago by Bioinfo_2006120

Hi,

I cam across the same problem when following the ATAC-seq tutorial from Rockefeller University (https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html) and trying to run the same function, namely soGGi:::runConsensusRegions. I also always get NULL. So I tried using your solution/function, but am getting the following error:

Error in unlist(myGRangesList) : object 'myGRangesList' not found

I am sorry if this is something very trivial/basic, I am very new to bioinformatics and really don't know how to solve this issue...I would appreciate help :)

ADD REPLYlink written 3 months ago by mickey_9510

I edited the above now. Try it again. Once you read in all the peaks in your folder, use lapply to get the Granges which is stored as myPeaks. Then convert them to a list (myGRangesList).

ADD REPLYlink written 3 months ago by Bioinfo_2006120

Hi there, I had the same problem, too. Thanks to your suggestion I managed to solve it. However, the RU ATAC workshop also considered blacklist regions in the merge analysis.

The original script was as follows:

Since I am new with R and data mining, could you please help me to integrate your script with the original one, considering blacklist regions?

Thank you in advance :)

ADD REPLYlink modified 3 months ago • written 3 months ago by Kimaguresilvia0

You can just append the code from the RU tutorial to the solution above, i.e. once you have the consensus list from above (stored in the variable reducedConsensus) you can do the following:

consensusToCount <- reducedConsensus # for staying consistent with the naming of the variables in the RU tutorial consensusToCount <- consensusToCount[!consensusToCount %over% blklist & !seqnames(consensusToCount) %in% "chrM"]

ADD REPLYlink written 3 months ago by mickey_9510

Thank You!!! It was simpler than I expected. I was confusing "consensusIDs" with "reducedConsensus". Just another little question... so what does consensusIDs in Bioinfo_2006's script stand for?

ADD REPLYlink written 3 months ago by Kimaguresilvia0

Thanks a lot for your help and effort!

ADD REPLYlink written 3 months ago by mickey_9510

I just noticed something: When converting the imported peak regions to a GRanges object using

myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)

the first line, i.e. peak, in each file does not get imported. Why is that? Or am I missing something?

ADD REPLYlink written 3 months ago by mickey_9510
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1106 users visited in the last hour