Question: ATAC-seq annotation to gene names
0
gravatar for RHI
7 weeks ago by
RHI0
RHI0 wrote:

Hello all,

I’m doing ATAC seq differential peak analysis. I only have a file with the peaks counts from 2 groups with 3 replicates each, and the peak loci in a bed file. I run the differential analysis using the peak counts file in DESeq2 and have a list of DE peaks. I would like to know how to annotate those regions to gene names using the peak loci in the bed file.

From the peak count file I have a column with peak IDs in the format 1:181674-181967; while in the bed file I have 3 columns: chr number, start and end. How can I merge those files and identify the gene name annotations?

Thank you so much!

ADD COMMENTlink modified 7 weeks ago • written 7 weeks ago by RHI0

Thank you so much. Is there a way I can do the same using R? Is there any package/function would you recommend me? Thanks!

ADD REPLYlink written 7 weeks ago by RHI0

The inTad package, linked in my answer, is an R package.

I'm sure you can do the other two methods using GenomicRanges, but it not really an approach i'm familier with.

ADD REPLYlink written 7 weeks ago by i.sudbery11k
1
gravatar for i.sudbery
7 weeks ago by
i.sudbery11k
Sheffield, UK
i.sudbery11k wrote:

There are a couple of ways you can do this. None of them are perfect.

The three most obvious choices are: * Pair peak with nearest gene * Pair peak with all genes within a certain distance * Pair peak with genes where accessibility correlates with expression.

For the first two, you might consider all genes, or you might consider only differentially expressed genes (if you have paired gene expression information).

In our last ATACseq paper we used a combination of all DE peaks within 1mb of a DE gene and correlation between peaks and genes that are in the same TAD.

I will assume you have a bed file or GTF file with [all or DE] gene transcription start sites.

For nearest, use bedtools closest

bedtools closest -a de_peaks.bed -b gene_tss.bed > gene_assignments.bed

For all genes with (say) 100kb, we take the TSSs and extend them 100kb in either direction. We then overlap this with the differential peaks:

bedtools slop -b 100000 -i gene_tss.bed
    | sort -k1,1 -k2,2n
    | bedtools intersect -a de_peaks.bed -b - >  gene_assignments.bed

To pair up peaks with genes whose expression they correlate with, use the inTad bioconductor package.

ADD COMMENTlink written 7 weeks ago by i.sudbery11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1656 users visited in the last hour
_