Question

ATAC-seq annotation to gene names

0

Entering edit mode

3.3 years ago

RHI • 0

Hello all,

I’m doing ATAC seq differential peak analysis. I only have a file with the peaks counts from 2 groups with 3 replicates each, and the peak loci in a bed file. I run the differential analysis using the peak counts file in DESeq2 and have a list of DE peaks. I would like to know how to annotate those regions to gene names using the peak loci in the bed file.

From the peak count file I have a column with peak IDs in the format 1:181674-181967; while in the bed file I have 3 columns: chr number, start and end. How can I merge those files and identify the gene name annotations?

Thank you so much!

atac-sec gene annotation DE analysis • 3.1k views

ADD COMMENT • link 3.3 years ago by RHI • 0

0

Entering edit mode

Thank you so much. Is there a way I can do the same using R? Is there any package/function would you recommend me? Thanks!

ADD REPLY • link 3.3 years ago by RHI • 0

0

Entering edit mode

The inTad package, linked in my answer, is an R package.

I'm sure you can do the other two methods using GenomicRanges, but it not really an approach i'm familier with.

ADD REPLY • link 3.3 years ago by i.sudbery 19k

score 3 · Answer 1 · 2021-01-14

There are a couple of ways you can do this. None of them are perfect.

The three most obvious choices are: * Pair peak with nearest gene * Pair peak with all genes within a certain distance * Pair peak with genes where accessibility correlates with expression.

For the first two, you might consider all genes, or you might consider only differentially expressed genes (if you have paired gene expression information).

In our last ATACseq paper we used a combination of all DE peaks within 1mb of a DE gene and correlation between peaks and genes that are in the same TAD.

I will assume you have a bed file or GTF file with [all or DE] gene transcription start sites.

For nearest, use bedtools closest

bedtools closest -a de_peaks.bed -b gene_tss.bed > gene_assignments.bed

For all genes with (say) 100kb, we take the TSSs and extend them 100kb in either direction. We then overlap this with the differential peaks:

bedtools slop -b 100000 -i gene_tss.bed
    | sort -k1,1 -k2,2n
    | bedtools intersect -a de_peaks.bed -b - >  gene_assignments.bed

To pair up peaks with genes whose expression they correlate with, use the inTad bioconductor package.