Question: Convert Granges list of peaks into consensus counts
1
gravatar for Bioinfo_2006
7 weeks ago by
Bioinfo_2006100
Bioinfo_2006100 wrote:

After ATAC-Seq analysis, I have a list of peaks

Granges list of peaks

Now, I would like to merge our open regions from all samples into a set of non-redundant (no overlapping regions) open regions present in any sample (see example below).

Identify non-reundant peaks

I tried the following using soGGi

  myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
 names(myPeaks) <- c("HindBrain_1", "HindBrain_2", "Kidney_1", "Kidney_2", "Liver_1", 
 "Liver_2")
 Group <- factor(c("HindBrain", "HindBrain", "Kidney", "Kidney", "Liver", "Liver"))
 consensusToCount <- soGGi:::runConsensusRegions(GRangesList(myPeaks), "none")

But I always get a NULL when I do this. What would be the alternate to soGGI to unlist the peaks to get a consensus count of non-redundant peaks and add that as a metadata like in example above? Thanks.

granges bioconductor R • 200 views
ADD COMMENTlink modified 11 days ago • written 7 weeks ago by Bioinfo_2006100

Please do not use screenshots of plain text, they're counter-productive. Instead, copy the text to a GitHub gist and link the gist here.

ADD REPLYlink written 7 weeks ago by RamRS24k

Hello, I run into the same issue as yours. Did you solve it?

ADD REPLYlink written 27 days ago by sophialovechan40

Did you try my suggestion below?

ADD REPLYlink written 27 days ago by ATpoint23k

I didn't understand your code. Do you mind explaining it ?

GenomicRanges::reduce(do.call(c, your.GRangesList))

What does c mean? and What am I doing here? Also for bedrolls, what do k1,1 -k2,2n mean? Thank you. I had 'soGGi:::runConsensusRegions' function worked before. But when I used the same code for the same samples again, it didn't work.

ADD REPLYlink written 27 days ago by sophialovechan40

I updated my answer to explain the GRanges command. For the Unix sort it says that the file shall be sorted by first column by name and second column in a numerical fashion. This is basic Unix, I suggest you invest time to get a better background, as Unix built-in tools are very powerful for data manipulation, imho essential to bioinformatics.

ADD REPLYlink written 27 days ago by ATpoint23k

Ok. Thank you very much.

ADD REPLYlink written 26 days ago by sophialovechan40
2
gravatar for ATpoint
7 weeks ago by
ATpoint23k
Germany
ATpoint23k wrote:

If I get you correctly, try:

GenomicRanges::reduce(do.call(c, your.GRangesList))

Edit: c stands for concatenate and will combine the elements of the GRangesList (so single GRanges object) into a single GRanges object. Given there are overlaps reduce will then merge them into non-redundant intervals.

Outside of R, having peaks as BED files one would do:

cat *.bed | sort k1,1 -k2,2n | bedtools merge -i - > consensus.bed
ADD COMMENTlink modified 22 days ago • written 7 weeks ago by ATpoint23k

Should I use the GRangesList here? I got an error.

Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function 'reduce' for signature '"list"'
ADD REPLYlink modified 26 days ago • written 26 days ago by sophialovechan40

This is rather unproductive, I have no idea what your input data are. Please show what kind of data you have, also which type. Use class() on your data to get an idea in which format they are.

ADD REPLYlink written 26 days ago by ATpoint23k

The initial input is a large list and I named it as 'myPeaks'. I used GRangesList(myPeaks) to change the list into a "CompressedGRangesList", as shown below.

myGRangesList<-GRangesList(myPeaks)
GenomicRanges::reduce(do.call(c, myGRangesList))
class(myGRangesList)
[1] "CompressedGRangesList"
attr(,"package")
[1] "GenomicRanges"
ADD REPLYlink modified 26 days ago • written 26 days ago by sophialovechan40

I found out the problem with your code. After you run do.call(c, your.GRangesList), you convert GRangesList into a list again. That's why I run into an error. To make your code work, you will have to GenomicRanges::reduce(GRangesList(do.call(c, your.GRangesList))).

ADD REPLYlink written 22 days ago by sophialovechan40

Right. I guess a simple unlist() on the GRL would've done the trick right away, thanks for following up :)

reduce(unlist(your.GRangesList))

should do it?

ADD REPLYlink written 22 days ago by ATpoint23k

Yes. It does work after I corrected this. But the new problem is that I didn't get any metadata.

>myGRangesList
GRangesList object of length 3
......

myGRangesList shows that there are three different datasets.

>consensusToCount <- GenomicRanges::reduce(unlist(myGRangesList))
>consensusToCount
GRanges object with 82483 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1       10548-10887      *
  [2]     chr1       12861-13145      *
  [3]     chr1       17363-17554      *
  [4]     chr1       28861-29950      *
  [5]     chr1       34562-34728      *
  ...      ...               ...    ...

but no metadata columns

I don't know where went wrong. Do you have an idea about this problem?

ADD REPLYlink written 22 days ago by sophialovechan40
0
gravatar for Bioinfo_2006
11 days ago by
Bioinfo_2006100
Bioinfo_2006100 wrote:

Hope someone finds this useful when they come across the ATAC-Seq tutorial

non_overlapping_region_counts<-function(x){
reduced <- reduce(unlist(myGRangesList))
consensusIDs <- paste0("consensus_", seq(1, length(reduced)))
mcols(reduced) <- do.call(cbind, lapply(myGRangesList, function(x) (reduced %over% x) + 0))
reducedConsensus <- reduced
mcols(reducedConsensus) <- cbind(as.data.frame(mcols(reducedConsensus)), consensusIDs)
consensusIDs <- paste0("consensus_", seq(1, length(reducedConsensus)))
return(reducedConsensus)
}

Where the input is of the class "CompressedGRangesList".

ADD COMMENTlink modified 11 days ago • written 11 days ago by Bioinfo_2006100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 927 users visited in the last hour