Question: ChIPseeker: how to read multiple peak files (comparing multiple peak files)
tintinfinfin123 wrote 3.0 years ago:

Hi,

My question is rather basic. I am using ChIPseeker to compare multiple peak files. How do I read them all and store them in a "files" object, as the vignette does with:

files <- getSampleFiles()

Let's say I have two peak files:

/myFolder/peak1.bed

/myFolder/peak2.bed

How do I read my multiple ChIP-seq peak files and calculate a tagMatrix for each? I also have the aligned reads in SAM, BAM and BED format. Starting from the files I have, how do I get to a tagMatrixList, i.e. to this point:

tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)

Could you please show me the first few steps of the code?

Many thanks.

chipseeker chip-seq • 3.7k views
modified 3.0 years ago • written 3.0 years ago by tintinfinfin123

Do these peak files come from different samples (cell lines or subjects), or are they from different TFs or histone marks?

Have you tried looking at the ChIPseeker vignette? It is quite informative.

If you can clarify how the two samples differ, I am happy to help with starter code.

modified 3.0 years ago • written 3.0 years ago by apnri

@apnri: Many thanks. They are two different ChIPs: one is a histone mark, the other is a TF. My question is really a single, very basic one: how do I read my own peak files from the hard drive (not the peak files shipped with the package) so that I can follow the example analysis in the vignette? The vignette is detailed enough. I just got confused at one place where it says that "tagMatrix" is precomputed (to save time). I thought it might be reading an aligned BED file somewhere, in addition to or instead of the peak BED file, to build the tag matrix. Perhaps that is not the case and everything comes from the peak BED file alone. So some starter code is what I need. Could you show how to run getTagMatrix on all the peak files I have? That is, tagMatrixList <- getTagMatrix(peak, windows=promoter), but for all peaks. I am fairly new to R. peak <- readPeakFile(files) does not work for all peak files at once. Thanks.

modified 3.0 years ago • written 3.0 years ago by tintinfinfin123

@apnri: Same cell line but different TFs. Some starter code would be helpful, if you could please provide it. Could you show how to run getTagMatrix on all the peak files I have? That is, tagMatrixList <- getTagMatrix(peak, windows=promoter), but for all peaks. I am fairly new to R. peak <- readPeakFile(files) does not work for all peak files at once. Thanks.

written 3.0 years ago by tintinfinfin123

I am not sure I follow what the final goal is when you say tagMatrix for all peaks. You can extend the promoter window to get tag binding profiles over larger regions. What do you want to view with these files?

Here is a starter for ChIPseeker, most of it taken from the vignette mentioned above.

##load packages and get annotations
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

histone.fn <- "histone.bed" # histone peak file name
tf.fn <- "tf.bed" # TF peak file name
histone <- readPeakFile(histone.fn)
tf <- readPeakFile(tf.fn)

# create tagmatrix
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
histone.tagMatrix <- getTagMatrix(histone, windows=promoter)
tf.tagMatrix <- getTagMatrix(tf, windows=promoter)

modified 3.0 years ago • written 3.0 years ago by apnri

You should also check this out, in case it is helpful: https://github.com/shenlab-sinai/ngsplot

written 3.0 years ago by apnri

Many Thanks. That is also interesting.

written 3.0 years ago by tintinfinfin123

@apnri:

Thanks. I think I was not clear enough. The actual goal is to compare multiple peak files (a comparison of 7 ChIP peak data sets).

For that, all files have to be read at once, not one by one. One by one I can do easily, but how do I do it for all of them together? Like the list already shown by Guangchuang Yu:

files <- list(peak1 = "/myFolder/peak1.bed", peak2 = "/myFolder/peak2.bed")

Now, how do you make a single "tagMatrix" object from multiple TF peak files? For example, how do I get to the following steps when I have two TF peak files (peak1.bed, peak2.bed):

tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)

plotAvgProf(tagMatrixList, xlim=c(-3000, 3000))

The vignette says that, to do the above, you can load the precomputed data shipped with the package:

data("tagMatrixList")

I don't want to load the package's example data. I need my own files to be read in:

tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)

But how do you do that with all the TF peak files you have (peak1.bed, peak2.bed) at once?

Thanks again.

modified 3.0 years ago • written 3.0 years ago by tintinfinfin123

files <- list(peak1 = "/myFolder/peak1.bed", peak2 = "/myFolder/peak2.bed")

## this is your tagMatrixList, calculated from your input files (peak1.bed and peak2.bed)
tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)

I can't see any problem here.

written 3.0 years ago by Guangchuang Yu

Hi, that works. Thanks

written 3.0 years ago by tintinfinfin123
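
Putting the pieces from this thread together, the whole workflow can be sketched roughly as follows. This is a minimal sketch, not the vignette's code: it assumes ChIPseeker and the hg19 TxDb package are installed and that the two paths from the question exist. apply_to_files() is a hypothetical helper name introduced here for illustration; it only checks that the files exist before calling lapply(), which is the step that applies a one-file function such as readPeakFile or getTagMatrix to every entry of the named list.

```r
# Named list of peak files (paths taken from the question; adjust to your own).
files <- list(peak1 = "/myFolder/peak1.bed", peak2 = "/myFolder/peak2.bed")

# Hypothetical helper (not part of ChIPseeker): fail early with a clear
# message if a path is wrong, then apply `fun` to every file in the list.
apply_to_files <- function(files, fun, ...) {
  paths <- unlist(files)
  not_found <- !file.exists(paths)
  if (any(not_found)) {
    stop("Peak file(s) not found: ", paste(paths[not_found], collapse = ", "))
  }
  lapply(files, fun, ...)
}

# Run the ChIPseeker part only if the packages and the files are available.
if (requireNamespace("ChIPseeker", quietly = TRUE) &&
    requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE) &&
    all(file.exists(unlist(files)))) {
  library(ChIPseeker)
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
  promoter <- getPromoters(TxDb = txdb, upstream = 3000, downstream = 3000)

  # readPeakFile() takes one file at a time, so it is applied per list element.
  peakList <- apply_to_files(files, readPeakFile)

  # One tag matrix per peak file, computed from the peak files alone.
  tagMatrixList <- apply_to_files(files, getTagMatrix, windows = promoter)

  print(plotAvgProf(tagMatrixList, xlim = c(-3000, 3000)))
}
```

The names given in the list (peak1, peak2) carry through lapply(), so they end up as the element names of tagMatrixList.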
Guangchuang Yu (China/Guangzhou/Southern Medical University) wrote 3.0 years ago:

If you print out the files, you will get:

> files <- getSampleFiles()
> files
$ARmo_0M
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1174480_ARmo_0M_peaks.bed.gz"

$ARmo_1nM
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1174481_ARmo_1nM_peaks.bed.gz"

$ARmo_100nM
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1174482_ARmo_100nM_peaks.bed.gz"

$CBX6_BF
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1295076_CBX6_BF_ChipSeq_mergedReps_peaks.bed.gz"

$CBX7_BF
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1295077_CBX7_BF_ChipSeq_mergedReps_peaks.bed.gz"

It's a named list of the input files.

So in your case, just create a named list of your files:

files <- list(peak1 = "/myFolder/peak1.bed", peak2 = "/myFolder/peak2.bed")

For the second question, there is a getTagMatrix function with example code presented in the vignette.

Please go through the vignette carefully before posting your question.

written 3.0 years ago by Guangchuang Yu
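
With more than a couple of files (the thread mentions 7 data sets), typing the named list by hand gets tedious. One possible way to build it from a folder using base R only is sketched below. list_peak_files() is a hypothetical helper, not a ChIPseeker function, and it assumes the peak files end in .bed or .bed.gz and that the file name without the extension is a suitable sample name.

```r
# Hypothetical helper: build a named list of peak files from a directory.
# Names are the file names with the .bed / .bed.gz extension removed.
list_peak_files <- function(dir, pattern = "\\.bed(\\.gz)?$") {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  files <- as.list(paths)
  names(files) <- sub(pattern, "", basename(paths))
  files
}

# "/myFolder" is the folder from the question; replace it with your own.
files <- list_peak_files("/myFolder")
```

The result has the same shape as the output of getSampleFiles() shown above, so it should be usable in the same lapply() call.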

Many, many thanks for being so helpful. My question is really a single, very basic one (sorry that I don't know this): how do I read/list all of my own peak files from the hard drive (not the package's peak files) and apply the example code to all of those peaks at once, so that I can follow the example analysis in the vignette? The vignette is detailed enough. I just got confused at one place where it says that "tagMatrix" is precomputed (to save time). I thought it might be reading an aligned BED file somewhere, in addition to or instead of the peak BED file, to build the tag matrix. Many thanks once again. So now, with your code files <- list(peak1 = "/myFolder/peak1.bed", peak2 = "/myFolder/peak2.bed"), how do I do the following: tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)

modified 3.0 years ago • written 3.0 years ago by tintinfinfin123