Question

Using ATACseq CPM for clustering samples

0

Entering edit mode

3.5 years ago

barati • 0

Hi everyone, I'm a newbie in the field, so I have a few super naive questions. I have a set of bulk ATACseq samples and I want to do clustering on them.

Is using log CPM ok for clustering samples? (do I need to do quantile normalization? is there a better way to cluster samples based on ATAC signal? If so, what is it?)
For CPM, I'm wondering if the set of R commands I used are ok:

(The way that I approached it was to call peaks with MACS2, generate a consensus peakset, get raw counts per peak, get log cpm (prior count 5))


    dba_object = dba(sampleSheet= <sampleSheet.csv>) # construct dba object
###### note: the sampleSheet.csv has the associated bamReads which I assume are going to be used for getting counts?
    consensus_peakset = <my consensus peak set as GRanges object>
    dba_object = dba.count(dba_object, peaks=consensus_peakset, score=DBA_SCORE_READS) # raw counts
    counts = dba.peakset(dba_object, bRetrieve=TRUE, writeFile = <rawcounts_filename>)

Are the above couple of lines correct for getting raw reads??? I'm getting some values that are below 1 and I thought you'd only have integer values for how many reads fall in a given peak...

later on for CPM:


    raw_counts_df = read.table(<rawcounts_filename>)[,c(1:3)] # removes seqname, start, & end to get a read counts matrix
    cpm_log_counts = cpm(raw_counts_df, log=TRUE, prior.count=5) # gets log cpm with prior count

I'll be grateful for any advice people might have on how I can go about clustering ATACseq data and whether my approach is in any way an ok way to do it

ATAC-seq normalization clustering • 1.0k views

ADD COMMENT • link 3.5 years ago by barati • 0