Question: ATAC-seq : strategies to have the list of the expressed genes
0
gravatar for anais1396
2.7 years ago by
anais139630
Brussels
anais139630 wrote:

Hello everyone !!

I'm new in the analyse of NGS data and I have ATAC-seq datas. I would like to know what is the strategies used to obtain a list with the expressed genes from the .bed files that we have after the peak calling ?

I performed peak calling with macs2 and I obtained bed files (narrowPeak) which contain the list of positions in genome where the chromatin is open. I would like to know how to obtain (from these bed files) the list of the genes which are in these regions because these should be the genes that will be expressed in my cells as the chromatin is open so we can have potentially TF binding in these regions. Then, I have to compare 2 conditions (healthy people and patient) so I have to compare 2 lists of genes that I will obtain. Is there a tool to do that ?

Thank you in advance for your help !!

Anaïs

sequencing atac-seq • 2.3k views
ADD COMMENTlink modified 17 months ago by bm1160 • written 2.7 years ago by anais139630

Can you tell us which genome assembly you aligned your NGS data to?

ADD REPLYlink written 2.7 years ago by James Ashmore3.0k

I used the human genome, hg19 version

ADD REPLYlink written 2.7 years ago by anais139630

I tried to load required packages but it doesn't work...

I have the following error message :"installation path not writeable, unable to update packages: MASS, survival" for the loading of bioclite() and then I have this "installation of package ‘TxDb.Hsapiens.UCSC.hg19.knownGene’ had non-zero exit status"

I would try to do this with R but I'm not very familiar with it...

Little help please ???

Thanks in advance ;-)

ADD REPLYlink written 2.6 years ago by anais139630
1

Follow these directions to set a local storage path (adjust file paths as needed for your local environment) for your R packages.

Note: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by GenoMax95k

Hi

I would like to do the exact same analysis with my ATAC-Seq data from zebrafish. I used DanRer10 for alignment. Can you help me with this?

Thanks

B

ADD REPLYlink written 17 months ago by bm1160

Please do not use the answer field for questions. What is unclear, there is an accepted answer?

ADD REPLYlink written 17 months ago by ATpoint44k
  1. I have to use the zebrafish package instead of the one below: pkgNames <- c("rtracklayer", "org.Hs.eg.db", "TxDb.Hsapiens.UCSC.hg19.knownGene")

My ATAC Seq libraries are aligned to danRer 10

  1. What bioconductor package is available to find promoters for zebrafish like the one for human below: There is a hg38 package for TxDb (http://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg38.knownGene.html )

Thanks

ADD REPLYlink written 16 months ago by bm1160

How about searching the BIoC packages: https://bioconductor.org/packages/3.9/data/annotation/ Sure you will find the one for for species.

ADD REPLYlink written 16 months ago by ATpoint44k
3
gravatar for James Ashmore
2.7 years ago by
James Ashmore3.0k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore3.0k wrote:

If you have R and Bioconductor installed, you can try the following steps:

# Load required packages
pkgNames <- c("rtracklayer", "org.Hs.eg.db", "TxDb.Hsapiens.UCSC.hg19.knownGene")
libSetup <- lapply(pkgNames, library, character.only = TRUE)

# Import narrowPeak file
fileName <- "Replace this string with the name of your peak file"
narrowPeak <- c(signalValue = "numeric", pValue = "numeric", qValue = "numeric", peak = "integer")
peakRanges <- import(fileName, format = "BED", extraCols = narrowPeak)

# Extract promoter ranges
promRanges <- promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene))

# Map gene symbol names
promRanges$symbol <- mapIds(org.Hs.eg.db, promRanges$gene_id, "SYMBOL", "ENTREZID")

# Find overlapping ranges
rangeOverlaps <- findOverlaps(peakRanges, promRanges)
peakRanges$symbol <- promRanges$symbol[subjectHits(rangeOverlaps)]

The peakRanges object will then have a column with the gene names of the promoters your peaks overlap. If you are not familiar with R or don't have them installed, you can go to the UCSC Table Browser and download a BED file of the promoter regions. Then use a tool such as Bedtools or Bedops to intersect your narrowPeak file with the BED file:

bedtools intersect -a Peaks.narrowPeak -b Promoters.bed
ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by James Ashmore3.0k

Thank you very much for your answer, it is very helpful !!

Where do you have the files with the promoters in R? I will also need them with the hg38 version of the human genome Or maybe just in switching the hg19 into hg38 in the name of the file in R ??

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by anais139630
1

There is a hg38 package for TxDb (http://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg38.knownGene.html ). Data for promoters is coming from this package.

ADD REPLYlink written 2.7 years ago by GenoMax95k

Thank you !! It's very helpful !!

ADD REPLYlink written 2.6 years ago by anais139630
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2471 users visited in the last hour
_