10 days ago
Irving
I performed a PCA with sequencing data. There are two treatments and two replicates for each treatment. I'm confused about how the samples are grouped, and especially about the percentages on the axes of the plot — what do the 98% and 1% mean?
How did you do this PCA and make these plots? The axis labels for the explained variation make no sense, so I suspect they were swapped.
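On what the percentages mean: each axis label gives the share of the total variance captured by that principal component, that is, its eigenvalue divided by the sum of all eigenvalues. A minimal sketch of that arithmetic, assuming one eigenvalue per input line; the numbers and the pct_variance helper are made up for illustration and are not taken from your plot:

```shell
#!/bin/sh
# Hypothetical helper: reads one eigenvalue per line and prints the
# percentage of total variance explained by each principal component.
pct_variance() {
  awk '{ ev[NR] = $1; total += $1 }
       END { for (i = 1; i <= NR; i++) printf "PC%d\t%.1f%%\n", i, 100 * ev[i] / total }'
}

# Illustrative eigenvalues only (not real data): PC1 dominates.
printf '%s\n' 49 0.5 0.3 0.2 | pct_variance
```

With these made-up values PC1 comes out at 98.0% and PC2 at 1.0%, so a plot labelled that way says nearly all of the variance between the samples lies along a single direction.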
I used the following script:
# BAM files
R1="DMSO_H1_noduplicates_sorted.bam"
R2="DMSO_H2_noduplicates_sorted.bam"
R4="TPL_H1_noduplicates_sorted.bam"
R5="TPL_H2_noduplicates_sorted.bam"

# Output file
OUTPUT_DMSOTPL="multiBAMsummary_DMSO+TPL_k562.npz"

# Compute read coverage bins across the genome
multiBamSummary bins -b "$R1" "$R2" "$R4" "$R5" -o "$OUTPUT_DMSOTPL"

# Perform PCA and generate plot
plotPCA -in "$OUTPUT_DMSOTPL" -o plotPCA-BAM_DMSO+TPL.png
What exactly are you running the PCA on? Binned coverage? Is that really informative?
Assuming this is deepTools, I recommend reviewing the documentation and choosing one of the two indicated options:
I do not recommend exploring noise-prone assays like ChIP-seq in a "blind" manner like this, using just the BAM files as input. Do proper peak calling, then make a consensus peak list, e.g. by merging all peaks, then build a count matrix using featureCounts, then run a PCA in R. Everything else, for me, is just superficial and not robust. There are lots of threads here on merging peaks and all that.
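The peak-merging and counting steps above could be sketched roughly as follows. This is only an outline under assumptions: the peak file names are hypothetical (whatever your peak caller produced), the make_consensus and bed_to_saf helpers are illustrative, and the featureCounts call uses default single-end counting, so paired-end data would need the extra flags described in its documentation.

```shell
#!/bin/sh
# Sketch only: peak file names are hypothetical; BAM names are the ones
# from the script earlier in this thread.

# Merge overlapping peaks from all samples into one consensus list (3-column BED).
make_consensus() {
  cat "$@" | cut -f1-3 | sort -k1,1 -k2,2n | bedtools merge -i stdin
}

# Convert BED (0-based start) to SAF (1-based start), the simple
# annotation format that featureCounts accepts with -F SAF.
bed_to_saf() {
  awk 'BEGIN { OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
       { print "peak_" NR, $1, $2 + 1, $3, "+" }'
}

# Run the pipeline only when both tools and the (hypothetical) peak files exist.
if command -v bedtools >/dev/null 2>&1 \
   && command -v featureCounts >/dev/null 2>&1 \
   && [ -e DMSO_H1_peaks.narrowPeak ]; then
  make_consensus DMSO_H1_peaks.narrowPeak DMSO_H2_peaks.narrowPeak \
                 TPL_H1_peaks.narrowPeak TPL_H2_peaks.narrowPeak \
    | bed_to_saf > consensus_peaks.saf

  # Count reads per consensus peak in every sample.
  featureCounts -F SAF -a consensus_peaks.saf -o peak_counts.txt \
    DMSO_H1_noduplicates_sorted.bam DMSO_H2_noduplicates_sorted.bam \
    TPL_H1_noduplicates_sorted.bam TPL_H2_noduplicates_sorted.bam
fi
```

The resulting peak_counts.txt would then be read into R, normalized, and passed to a PCA there (for example with prcomp), which is the last step mentioned above.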