Question

ATAC-seq : strategies to have the list of the expressed genes

0

Entering edit mode

7.2 years ago

anais1396 ▴ 30

Hello everyone !!

I'm new in the analyse of NGS data and I have ATAC-seq datas. I would like to know what is the strategies used to obtain a list with the expressed genes from the .bed files that we have after the peak calling ?

I performed peak calling with macs2 and I obtained bed files (narrowPeak) which contain the list of positions in genome where the chromatin is open. I would like to know how to obtain (from these bed files) the list of the genes which are in these regions because these should be the genes that will be expressed in my cells as the chromatin is open so we can have potentially TF binding in these regions. Then, I have to compare 2 conditions (healthy people and patient) so I have to compare 2 lists of genes that I will obtain. Is there a tool to do that ?

Thank you in advance for your help !!

Anaïs

atac-seq sequencing • 4.6k views

ADD COMMENT • link updated 5.9 years ago by bm116 • 0 • written 7.2 years ago by anais1396 ▴ 30

0

Entering edit mode

Can you tell us which genome assembly you aligned your NGS data to?

ADD REPLY • link 7.2 years ago by James Ashmore ★ 3.5k

0

Entering edit mode

I used the human genome, hg19 version

ADD REPLY • link 7.2 years ago by anais1396 ▴ 30

0

Entering edit mode

I tried to load required packages but it doesn't work...

I have the following error message :"installation path not writeable, unable to update packages: MASS, survival" for the loading of bioclite() and then I have this "installation of package ‘TxDb.Hsapiens.UCSC.hg19.knownGene’ had non-zero exit status"

I would try to do this with R but I'm not very familiar with it...

Little help please ???

Thanks in advance ;-)

ADD REPLY • link 7.1 years ago by anais1396 ▴ 30

1

Entering edit mode

Follow these directions to set a local storage path (adjust file paths as needed for your local environment) for your R packages.

Note: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLY • link 7.1 years ago by GenoMax 152k

0

Entering edit mode

Hi

I would like to do the exact same analysis with my ATAC-Seq data from zebrafish. I used DanRer10 for alignment. Can you help me with this?

Thanks

B

ADD REPLY • link 5.9 years ago by bm116 • 0

0

Entering edit mode

Please do not use the answer field for questions. What is unclear, there is an accepted answer?

ADD REPLY • link 5.9 years ago by ATpoint 88k

0

Entering edit mode

I have to use the zebrafish package instead of the one below: pkgNames <- c("rtracklayer", "org.Hs.eg.db", "TxDb.Hsapiens.UCSC.hg19.knownGene")

My ATAC Seq libraries are aligned to danRer 10

What bioconductor package is available to find promoters for zebrafish like the one for human below: There is a hg38 package for TxDb (http://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg38.knownGene.html )

Thanks

ADD REPLY • link 5.8 years ago by bm116 • 0

0

Entering edit mode

How about searching the BIoC packages: https://bioconductor.org/packages/3.9/data/annotation/ Sure you will find the one for for species.

ADD REPLY • link 5.8 years ago by ATpoint 88k

score 3 · Accepted Answer · 2018-05-07

3

Entering edit mode

7.2 years ago

James Ashmore ★ 3.5k

If you have R and Bioconductor installed, you can try the following steps:

# Load required packages
pkgNames <- c("rtracklayer", "org.Hs.eg.db", "TxDb.Hsapiens.UCSC.hg19.knownGene")
libSetup <- lapply(pkgNames, library, character.only = TRUE)

# Import narrowPeak file
fileName <- "Replace this string with the name of your peak file"
narrowPeak <- c(signalValue = "numeric", pValue = "numeric", qValue = "numeric", peak = "integer")
peakRanges <- import(fileName, format = "BED", extraCols = narrowPeak)

# Extract promoter ranges
promRanges <- promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene))

# Map gene symbol names
promRanges$symbol <- mapIds(org.Hs.eg.db, promRanges$gene_id, "SYMBOL", "ENTREZID")

# Find overlapping ranges
rangeOverlaps <- findOverlaps(peakRanges, promRanges)
peakRanges$symbol <- promRanges$symbol[subjectHits(rangeOverlaps)]

The peakRanges object will then have a column with the gene names of the promoters your peaks overlap. If you are not familiar with R or don't have them installed, you can go to the UCSC Table Browser and download a BED file of the promoter regions. Then use a tool such as Bedtools or Bedops to intersect your narrowPeak file with the BED file:

bedtools intersect -a Peaks.narrowPeak -b Promoters.bed

ADD COMMENT • link 7.2 years ago by James Ashmore ★ 3.5k

0

Entering edit mode

Thank you very much for your answer, it is very helpful !!

Where do you have the files with the promoters in R? I will also need them with the hg38 version of the human genome Or maybe just in switching the hg19 into hg38 in the name of the file in R ??

ADD REPLY • link 7.2 years ago by anais1396 ▴ 30

1

Entering edit mode

There is a hg38 package for TxDb (http://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg38.knownGene.html ). Data for promoters is coming from this package.