PCA interpretation
10 days ago
Irving • 0

I performed a PCA with sequencing data. There are two treatments and two replicates for each treatment. I'm confused about how the samples are grouped, and especially about the percentages on the axes of the plot: what do the 98% and 1% mean?

(Image: PCA plot)

chip seq PCA • 539 views

How did you run this PCA and make these plots? The variance percentages on the axis labels make no sense, so I suspect they were swapped.


I used the following script:

    # BAM files
    R1="DMSO_H1_noduplicates_sorted.bam"
    R2="DMSO_H2_noduplicates_sorted.bam"
    R4="TPL_H1_noduplicates_sorted.bam"
    R5="TPL_H2_noduplicates_sorted.bam"

    # Output file
    OUTPUT_DMSOTPL="multiBAMsummary_DMSO+TPL_k562.npz"

    # Compute read coverage bins across the genome
    multiBamSummary bins -b "$R1" "$R2" "$R4" "$R5" -o "$OUTPUT_DMSOTPL"

    # Perform PCA and generate plot
    plotPCA -in "$OUTPUT_DMSOTPL" -o plotPCA-BAM_DMSO+TPL.png


What exactly is this PCA based on? Binned coverage? Is that really informative?


Assuming this is deepTools, I recommend reviewing the documentation and choosing one of the two indicated options:

Named Arguments
--transpose
Perform the PCA on the transposed matrix (i.e., on the matrix where rows are samples and columns are bins/features). This then matches what is typically done in R.

--rowCenter
When specified, each row (bin, gene, etc.) in the matrix is centered at 0 before the PCA is computed. This is useful only if you have a strong bin/gene/etc. correlation and the resulting principal component has samples stacked vertically. This option is not applicable if --transpose is specified.
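
Applied to the summary file from the script earlier in the thread, the two options could be tried roughly as follows. This is only a sketch: the input file name is taken from the original post, the output PNG names are made up for illustration, and the guard simply skips the run when deepTools or the summary file is not available.

```shell
# Summary file produced by multiBamSummary in the original script.
NPZ="multiBAMsummary_DMSO+TPL_k562.npz"

if command -v plotPCA >/dev/null 2>&1 && [ -f "$NPZ" ]; then
    # Option 1: PCA on the transposed matrix (rows = samples, columns = bins).
    plotPCA -in "$NPZ" --transpose -o plotPCA_transposed.png    # hypothetical output name

    # Option 2: keep the default orientation, but center each bin at 0 first.
    plotPCA -in "$NPZ" --rowCenter -o plotPCA_rowCentered.png   # hypothetical output name

    pca_status="done"
else
    pca_status="skipped"
    echo "plotPCA or $NPZ not found; skipping"
fi
```

Comparing the two resulting plots with the original one should show whether the 98% on the first axis is driven by the coverage profile shared across all samples rather than by the treatment.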

I do not recommend exploring noise-prone assays like ChIP-seq in a "blind" manner like this, using just the BAM files as input. Do proper peak calling, then make a consensus peak list (e.g. by merging all peaks), then build a count matrix with featureCounts, and then run a PCA in R. Everything else, for me, is just superficial and not robust. There are lots of threads here on merging peaks and related steps.
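
The "merge all peaks, then count" part of that workflow can be sketched on toy data as below. Everything here is an assumption for illustration: the peak coordinates and file names are made up, the merge is done with plain sort and awk so the example is self-contained (on real data bedtools merge would be the usual tool), and the featureCounts call is left as a comment with placeholder BAM names.

```shell
# Toy peak calls for two samples (made-up coordinates; BED columns: chrom, start, end).
workdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t500\t600\n' > "$workdir/sampleA_peaks.bed"
printf 'chr1\t150\t250\nchr2\t10\t50\n'   > "$workdir/sampleB_peaks.bed"

# Pool all peaks, sort by chromosome and start, then merge overlapping
# (or directly adjacent) intervals into one consensus peak each.
cat "$workdir"/*_peaks.bed \
  | sort -k1,1 -k2,2n \
  | awk 'BEGIN { OFS = "\t" }
         NR == 1                  { chr = $1; s = $2 + 0; e = $3 + 0; next }
         $1 == chr && $2 + 0 <= e { if ($3 + 0 > e) e = $3 + 0; next }
                                  { print chr, s, e; chr = $1; s = $2 + 0; e = $3 + 0 }
         END                      { if (NR > 0) print chr, s, e }' \
  > "$workdir/consensus.bed"

# Convert to SAF for featureCounts. SAF starts are 1-based, BED starts are 0-based.
# The strand is set to "+" as a placeholder; it is ignored in unstranded counting.
awk 'BEGIN { OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
     { print "peak_" NR, $1, $2 + 1, $3, "+" }' \
  "$workdir/consensus.bed" > "$workdir/consensus.saf"

cat "$workdir/consensus.saf"

# On real data, the count matrix would then come from something like
# (BAM names are placeholders):
# featureCounts -F SAF -a consensus.saf -o peak_counts.txt sample1.bam sample2.bam
```

The resulting count table (peaks as rows, samples as columns) is what would then go into a PCA in R, typically after normalization and a log or variance-stabilizing transform.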
