Using ATACseq CPM for clustering samples
3.5 years ago
barati • 0

Hi everyone, I'm new to the field, so I have a few basic questions. I have a set of bulk ATAC-seq samples and I want to cluster them.

  1. Is using log CPM OK for clustering samples? (Do I need to do quantile normalization? Is there a better way to cluster samples based on ATAC signal? If so, what is it?)
  2. For CPM, I'm wondering if the set of R commands I used is OK:

(My approach was to call peaks with MACS2, generate a consensus peakset, get raw counts per peak, and compute log CPM with a prior count of 5.)


    library(DiffBind)

    dba_object = dba(sampleSheet = <sampleSheet.csv>) # construct dba object
    # note: the sampleSheet.csv has the associated bamReads, which I assume are used for getting counts?
    consensus_peakset = <my consensus peak set as GRanges object>
    dba_object = dba.count(dba_object, peaks = consensus_peakset, score = DBA_SCORE_READS) # raw counts
    counts = dba.peakset(dba_object, bRetrieve = TRUE, writeFile = <rawcounts_filename>)

Are the lines above correct for getting raw read counts? I'm getting some values below 1, and I expected only integer values for the number of reads falling in a given peak.
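A quick way to see which peaks carry non-integer values is sketched below. The `scores` matrix is a made-up stand-in for the per-sample score columns retrieved from DiffBind, not real output, so the names and numbers are purely illustrative.

```r
# Hypothetical stand-in for the per-sample score columns (NOT real DiffBind output)
scores <- matrix(c(12, 0.5, 3, 40, 7, 0.25), nrow = 3,
                 dimnames = list(NULL, c("sampleA", "sampleB")))

# TRUE where a value is a whole number
is_whole <- scores == round(scores)

# Row indices of peaks with at least one non-integer value
non_integer_rows <- which(rowSums(!is_whole) > 0)

# Inspect the offending rows
print(scores[non_integer_rows, , drop = FALSE])
```

If many rows show up here, the values are probably not plain read counts, which would point at the scoring or retrieval step rather than at the CPM step.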

Later on, for CPM:


    library(edgeR)

    raw_counts_df = read.table(<rawcounts_filename>)[, -c(1:3)] # drops seqname, start, & end to get a read counts matrix
    cpm_log_counts = cpm(raw_counts_df, log = TRUE, prior.count = 5) # log2 CPM with a prior count of 5
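To make the question concrete, here is a minimal sketch of clustering samples from a log-CPM matrix, using base R only and simulated counts. Everything in it is an assumption for illustration: the `log_cpm` helper is hypothetical and follows my understanding of how edgeR scales the prior count by library size (it is not verified to match `cpm(log=TRUE)` exactly), and the choices of the 100 most variable peaks, Spearman correlation, and average linkage are arbitrary.

```r
set.seed(1)

# Simulated raw counts: 500 peaks x 6 samples (hypothetical data)
n_peaks <- 500
counts <- matrix(rnbinom(n_peaks * 6, mu = 50, size = 2), nrow = n_peaks,
                 dimnames = list(paste0("peak", seq_len(n_peaks)),
                                 paste0("sample", 1:6)))

# Hypothetical helper: log2 CPM with a prior count scaled by library size
log_cpm <- function(counts, prior_count = 5) {
  lib_size <- colSums(counts)
  prior_scaled <- prior_count * lib_size / mean(lib_size)
  adj_lib_size <- lib_size + 2 * prior_scaled
  # Transpose so per-sample vectors recycle along samples, then transpose back
  log2(t((t(counts) + prior_scaled) / adj_lib_size) * 1e6)
}

lcpm <- log_cpm(counts)

# Keep the most variable peaks so invariant peaks do not dominate the distances
peak_var <- apply(lcpm, 1, var)
top <- order(peak_var, decreasing = TRUE)[seq_len(100)]

# Sample-to-sample distance as 1 - Spearman correlation, then hierarchical clustering
d <- as.dist(1 - cor(lcpm[top, ], method = "spearman"))
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

With real data, `counts` would be the raw count matrix from the steps above, and `plot(hc)` would draw the dendrogram.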

I'd be grateful for any advice on how to go about clustering ATAC-seq data, and on whether my approach is a reasonable way to do it.

ATAC-seq normalization clustering • 1.0k views