Question: Convert Granges list of peaks into consensus counts
1
gravatar for Bioinfo_2006
16 months ago by
Bioinfo_2006140
Bioinfo_2006140 wrote:

After ATAC-Seq analysis, I have a list of peaks

Granges list of peaks

Now, I would like to merge our open regions from all samples into a set of non-redundant (no overlapping regions) open regions present in any sample (see example below).

Identify non-reundant peaks

I tried the following using soGGi

  myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
 names(myPeaks) <- c("HindBrain_1", "HindBrain_2", "Kidney_1", "Kidney_2", "Liver_1", 
 "Liver_2")
 Group <- factor(c("HindBrain", "HindBrain", "Kidney", "Kidney", "Liver", "Liver"))
 consensusToCount <- soGGi:::runConsensusRegions(GRangesList(myPeaks), "none")

But I always get a NULL when I do this. What would be the alternate to soGGI to unlist the peaks to get a consensus count of non-redundant peaks and add that as a metadata like in example above? Thanks.

granges bioconductor R • 1.3k views
ADD COMMENTlink modified 14 months ago • written 16 months ago by Bioinfo_2006140

Please do not use screenshots of plain text, they're counter-productive. Instead, copy the text to a GitHub gist and link the gist here.

ADD REPLYlink written 16 months ago by _r_am31k

Hello, I run into the same issue as yours. Did you solve it?

ADD REPLYlink written 15 months ago by sophialovechan60

Did you try my suggestion below?

ADD REPLYlink written 15 months ago by ATpoint42k

I didn't understand your code. Do you mind explaining it ?

GenomicRanges::reduce(do.call(c, your.GRangesList))

What does c mean? and What am I doing here? Also for bedrolls, what do k1,1 -k2,2n mean? Thank you. I had 'soGGi:::runConsensusRegions' function worked before. But when I used the same code for the same samples again, it didn't work.

ADD REPLYlink written 15 months ago by sophialovechan60

I updated my answer to explain the GRanges command. For the Unix sort it says that the file shall be sorted by first column by name and second column in a numerical fashion. This is basic Unix, I suggest you invest time to get a better background, as Unix built-in tools are very powerful for data manipulation, imho essential to bioinformatics.

ADD REPLYlink written 15 months ago by ATpoint42k

Ok. Thank you very much.

ADD REPLYlink written 15 months ago by sophialovechan60
4
gravatar for Bioinfo_2006
14 months ago by
Bioinfo_2006140
Bioinfo_2006140 wrote:

Hope someone finds this useful when they come across the ATAC-Seq tutorial

 peaks <- dir("ATAC_Data/ATAC_Peaks_forCounting/", pattern = "*.narrowPeak", 
full.names = TRUE)
myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)
myGRangesList<-GRangesList(myPeaks)   
reduced <- reduce(unlist(myGRangesList))
consensusIDs <- paste0("consensus_", seq(1, length(reduced)))
mcols(reduced) <- do.call(cbind, lapply(myGRangesList, function(x) (reduced %over% x) + 0))
reducedConsensus <- reduced
mcols(reducedConsensus) <- cbind(as.data.frame(mcols(reducedConsensus)), consensusIDs)
consensusIDs <- paste0("consensus_", seq(1, length(reducedConsensus)))
return(reducedConsensus)
ADD COMMENTlink modified 14 months ago • written 14 months ago by Bioinfo_2006140

Hi,

I cam across the same problem when following the ATAC-seq tutorial from Rockefeller University (https://rockefelleruniversity.github.io/RU_ATAC_Workshop.html) and trying to run the same function, namely soGGi:::runConsensusRegions. I also always get NULL. So I tried using your solution/function, but am getting the following error:

Error in unlist(myGRangesList) : object 'myGRangesList' not found

I am sorry if this is something very trivial/basic, I am very new to bioinformatics and really don't know how to solve this issue...I would appreciate help :)

ADD REPLYlink written 14 months ago by mickey_9530

I edited the above now. Try it again. Once you read in all the peaks in your folder, use lapply to get the Granges which is stored as myPeaks. Then convert them to a list (myGRangesList).

ADD REPLYlink written 14 months ago by Bioinfo_2006140

Hi there, I had the same problem, too. Thanks to your suggestion I managed to solve it. However, the RU ATAC workshop also considered blacklist regions in the merge analysis.

The original script was as follows:

Since I am new with R and data mining, could you please help me to integrate your script with the original one, considering blacklist regions?

Thank you in advance :)

ADD REPLYlink modified 14 months ago • written 14 months ago by Kimaguresilvia0

You can just append the code from the RU tutorial to the solution above, i.e. once you have the consensus list from above (stored in the variable reducedConsensus) you can do the following:

consensusToCount <- reducedConsensus # for staying consistent with the naming of the variables in the RU tutorial consensusToCount <- consensusToCount[!consensusToCount %over% blklist & !seqnames(consensusToCount) %in% "chrM"]

ADD REPLYlink written 14 months ago by mickey_9530

Thank You!!! It was simpler than I expected. I was confusing "consensusIDs" with "reducedConsensus". Just another little question... so what does consensusIDs in Bioinfo_2006's script stand for?

ADD REPLYlink written 14 months ago by Kimaguresilvia0

Thanks a lot for your help and effort!

ADD REPLYlink written 14 months ago by mickey_9530

I just noticed something: When converting the imported peak regions to a GRanges object using

myPeaks <- lapply(peaks, ChIPQC:::GetGRanges, simple = TRUE)

the first line, i.e. peak, in each file does not get imported. Why is that? Or am I missing something?

ADD REPLYlink written 14 months ago by mickey_9530
2
gravatar for ATpoint
16 months ago by
ATpoint42k
Germany
ATpoint42k wrote:

If I get you correctly, try:

GenomicRanges::reduce(do.call(c, your.GRangesList))

Edit: c stands for concatenate and will combine the elements of the GRangesList (so single GRanges object) into a single GRanges object. Given there are overlaps reduce will then merge them into non-redundant intervals.

Outside of R, having peaks as BED files one would do:

cat *.bed | sort k1,1 -k2,2n | bedtools merge -i - > consensus.bed
ADD COMMENTlink modified 15 months ago • written 16 months ago by ATpoint42k

Should I use the GRangesList here? I got an error.

Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function 'reduce' for signature '"list"'
ADD REPLYlink modified 15 months ago • written 15 months ago by sophialovechan60

This is rather unproductive, I have no idea what your input data are. Please show what kind of data you have, also which type. Use class() on your data to get an idea in which format they are.

ADD REPLYlink written 15 months ago by ATpoint42k

The initial input is a large list and I named it as 'myPeaks'. I used GRangesList(myPeaks) to change the list into a "CompressedGRangesList", as shown below.

myGRangesList<-GRangesList(myPeaks)
GenomicRanges::reduce(do.call(c, myGRangesList))
class(myGRangesList)
[1] "CompressedGRangesList"
attr(,"package")
[1] "GenomicRanges"
ADD REPLYlink modified 15 months ago • written 15 months ago by sophialovechan60

I found out the problem with your code. After you run do.call(c, your.GRangesList), you convert GRangesList into a list again. That's why I run into an error. To make your code work, you will have to GenomicRanges::reduce(GRangesList(do.call(c, your.GRangesList))).

ADD REPLYlink written 15 months ago by sophialovechan60

Right. I guess a simple unlist() on the GRL would've done the trick right away, thanks for following up :)

reduce(unlist(your.GRangesList))

should do it?

ADD REPLYlink written 15 months ago by ATpoint42k

Yes. It does work after I corrected this. But the new problem is that I didn't get any metadata.

>myGRangesList
GRangesList object of length 3
......

myGRangesList shows that there are three different datasets.

>consensusToCount <- GenomicRanges::reduce(unlist(myGRangesList))
>consensusToCount
GRanges object with 82483 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1       10548-10887      *
  [2]     chr1       12861-13145      *
  [3]     chr1       17363-17554      *
  [4]     chr1       28861-29950      *
  [5]     chr1       34562-34728      *
  ...      ...               ...    ...

but no metadata columns

I don't know where went wrong. Do you have an idea about this problem?

ADD REPLYlink written 15 months ago by sophialovechan60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1488 users visited in the last hour