Question: Counting single-cell ATAC fragments at peaks on BED file
gravatar for Nico
5 weeks ago by
Osaka University
Nico0 wrote:

I'm currently trying to calculate the number of single-cell ATAC fragments that lie at peaks defined by a bed file.

I have a tsv file containing fragments with the following columns: chromosome, start, end, cell barcode. This file is around 10gb in size (around 250 million rows).

I also have another tsv file containing peaks with the following columns: chromosome, start, end.

What I want to do is: 1. Assign fragment to each peak on bed file 2. Output a data frame containing cell barcode, region where the fragments are assigned (chr, start, end), and also the number of fragments assigned to this region.

Here is what I tried, however R ran out of memory (I am running a desktop with 64gb memory and an i7 9700k). It also seems very slow. Maybe R is not suited for this kind of task, but currently I am most familiar with R.

In summary what I did was: loop over fragments and filter the bed file if the fragment lies in between the region of the bed file, and then append another column to the fragments so that I have the region identifier, and finally count the unique identifier for each cell.

fragments$region <- "none"

for(i in 1:nrow(fragments)){

chr_fragment <- fragments[i,] %>% pull(chr)
start_fragment <- fragments[i,] %>% pull(start)
end_fragment <- fragments[i,] %>% pull(end)

# region containing start of fragment 
extracted_region <- bed_file %>% filter(chr == chr_fragment) %>% filter(start <= start_fragment & start_fragment <= 
end) %>% paste(collapse = "_") 

extracted_region <- paste0("chr", extracted_region)

# append to initial fragments tsv file
fragments[i,5] <- extracted_region

paste0("finished assigning ", i, "out of ", nrow(fragments), " fragments") %>% print()

fragments <- rename(fragments, region = 5)

# remove fragments not in peak
fragments <- fragments %>% filter(region != "none")

# count number of fragments in region per cell
fragments <- fragments %>% mutate(dplyr::count(fragments, barcode, region))

Is it possible to manipulate this in a way that it won't take too much memory? Or is it impossible to achieve using R? Or if there is any tool that can do this (I tried to look but did not find any that suits this). It also seems to take awfully long time to perform this loop, which I don't even think will finish in weeks. Any help is greatly appreciated!

atac-seq R • 114 views
ADD COMMENTlink modified 5 weeks ago by geek_y11k • written 5 weeks ago by Nico0

Why can't you just use your bed file with your alignment files (BAMs) by using featureCounts software to count the number of fragments in each peak region?

ADD REPLYlink written 5 weeks ago by tiago2112871.2k

In scATAC, there is no bam file per cell. Its barcode based. FeatureCounts needs individual bam files per cell.

ADD REPLYlink written 5 weeks ago by geek_y11k
gravatar for geek_y
5 weeks ago by
geek_y11k wrote:

I am not sure if you have looked up for any tools for scATAC analysis. Signac, part of Seurat framework is developed for chromatin data set analysis, especially scATAC.

The FeatureMatrix function does exactly what you are looking for.

  cells = NULL,
  chunk = 50,
  sep = c("-", "-"),
  verbose = TRUE

fragments are your fragment file and features are your peaks as a GRanges object.

ADD COMMENTlink written 5 weeks ago by geek_y11k

Hi, thanks a lot, that is exactly what I was looking for. I actually used Signac before but was unaware of this function. I just did not know what and where to search for this.

ADD REPLYlink written 5 weeks ago by Nico0

If an answer was helpful to you, please indicate that by upvoting it

ADD REPLYlink written 5 weeks ago by Friederike6.0k

Accept the answer so that it wont bump in the future as unanswered.

ADD REPLYlink written 5 weeks ago by geek_y11k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 674 users visited in the last hour