My PCA plot is weird: PC1 with 98.684% variance and PC2 with 0.582% variance
9 weeks ago

Hi all, I'm an undergrad learning how to process some data given to me by my mentor. This is bulk RNA-seq data, aligned with STAR and counted with featureCounts. I pre-filtered using rowSums(counts(dds)) >= 10, ran DESeq(), and set alpha = 0.05. However, I got a weird-looking MA plot with diagonal lines, and I'm not sure why it looks like that. The MA plot was made with plotMA().

Even weirder, my PCA plot has PC1 with 98.684% variance and PC2 with 0.582% variance. I already ran vst(dds, blind=FALSE) before passing the data to plotPCA(). I'm quite confused and not sure where to go from here. My biggest question is: how likely is it that PC1 really captures 98% of the variation without this being due to some technical artefact? What can I do to make sure this is an actual biological difference? And if my PCA plot is incorrect, what steps can I take to check what went wrong?

I'm still learning and always happy to learn more but I am quite new to bulk RNA-seq so I would appreciate any guidance on what I can do to troubleshoot this. I am more than happy to provide any details about the data. Thank you.
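For readers following along, the workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the real count matrix and design are not shown in the thread, so the sketch uses DESeq2's simulated data from makeExampleDESeqDataSet(), whose grouping column happens to be called `condition`.

```r
library(DESeq2)

# Assumption: simulated stand-in for the real data (5000 genes, 6 vs 6 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12, betaSD = 1)

# Pre-filter: keep genes with at least 10 reads summed over all samples.
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]

# Differential expression and MA plot.
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05)
plotMA(res)

# Variance-stabilised data for PCA; returnData = TRUE returns the coordinates
# instead of drawing the plot, with the variance shares stored as an attribute.
vsd <- vst(dds, blind = FALSE)
pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
percent_var <- round(100 * attr(pca, "percentVar"), 3)
percent_var  # percent of variance for PC1 and PC2
```

Note that plotPCA() computes the components from the most variable genes only (500 by default, via its ntop argument), so the reported percentages describe that subset and not the whole transcriptome.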

[Figure: PCA plot made with plotPCA]
[Figure: MA plot made with plotMA]

r pca rna-seq visualization • 1.2k views

A PC1 that explains 98% of the variance is very unlikely unless something went wrong (badly wrong) during sample preparation: extraction, library preparation, or sequencing.


Hi, sorry, what do you mean by that? Do you mean that my results are likely caused by something going very wrong during sample preparation?


Check the RNA quality (RIN); check if RNA is completely degraded in one group of samples compared to the other.

Ensure that the same library preparation and sequencing protocols have been used for all samples.

Explain the type of samples you are comparing. There could be a biological explanation for why PC1 in the PCA plot accounts for 98.684% of the variance, assuming all samples are independent biological replicates and not multiple sequencing runs of RNA extracted from just two samples (one per treatment).
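One way to look into the replicate question from the count data alone is to compare how similar samples are within and between groups. The sketch below is a minimal example on simulated data (makeExampleDESeqDataSet() and its `condition` column are stand-ins, not the poster's data). Within-group correlations that are essentially 1 would be consistent with technical rather than biological replicates, although that is a heuristic and not proof.

```r
library(DESeq2)

# Assumption: simulated stand-in for the real data (5000 genes, 4 vs 4 samples).
set.seed(2)
dds <- makeExampleDESeqDataSet(n = 5000, m = 8, betaSD = 1)
vsd <- vst(dds, blind = FALSE)

# Sample-to-sample similarity on the variance-stabilised values.
mat <- assay(vsd)
sample_cor <- cor(mat, method = "spearman")
sample_dist <- as.matrix(dist(t(mat)))

# Split the unique sample pairs into within-group and between-group pairs.
group <- as.character(vsd$condition)
same_group <- outer(group, group, "==")
pairs <- upper.tri(sample_cor)

within_cor <- mean(sample_cor[pairs & same_group])
between_cor <- mean(sample_cor[pairs & !same_group])
c(within = within_cor, between = between_cor)
```

The `sample_dist` matrix can be inspected the same way, for example as a heatmap, to see whether any sample sits far from the rest of its own group.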


One group is an immortalized cell line. The other is the same cell type, but derived from stem cells. I only have access to the fastq files. Is there any other way to check RNA quality?


"Is there any other way to check RNA quality?"

You should ask the person who prepared the RNA samples.

"One group is an immortalized cell line. The other is the same cell type, but derived from stem cells."

That could explain the large variance you see in PC1. However, you should discuss this with your mentor, as I am not an expert in this specific field. I primarily work with RNA-seq on bacteria.

Perhaps this task was given to you simply as an exercise to learn how to process RNA-seq data.
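To illustrate the point that a strong between-group difference can by itself push most of the variance onto PC1, here is a small simulation sketch. Everything in it is an assumption for illustration: the data come from makeExampleDESeqDataSet(), `betaSD` sets the spread of simulated log2 fold changes between the two groups, and `pc1_share` is a hypothetical helper name. It says nothing about whether the difference in the real data is biological or technical.

```r
library(DESeq2)

# Hypothetical helper: simulate two groups with a given effect-size spread
# and return the percent of variance that plotPCA() assigns to PC1.
pc1_share <- function(beta_sd) {
  dds <- makeExampleDESeqDataSet(n = 5000, m = 12, betaSD = beta_sd)
  vsd <- vst(dds, blind = FALSE)
  pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
  100 * attr(pca, "percentVar")[1]
}

set.seed(4)
shares <- sapply(c(none = 0, moderate = 1, large = 3), pc1_share)
round(shares, 1)  # PC1 share for no, moderate, and large group differences
```

In the simulation, the only thing that changes between runs is the size of the group effect, so any increase in the PC1 share comes from the groups being more different, not from noise or filtering.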


Judging by the MA plot, maybe try setting a higher threshold for filtering, i.e. in your rowSums(counts(dds)) >= 10 step?
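A stricter filter does not have to mean a larger total. Another option, similar to the one suggested in the DESeq2 vignette, is to require a minimum count in a minimum number of samples. The sketch below compares the two filters on simulated data (makeExampleDESeqDataSet() is a stand-in for the real object, and the threshold of 10 is just an example):

```r
library(DESeq2)

# Assumption: simulated stand-in for the real data (5000 genes, 6 vs 6 samples).
set.seed(3)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# Filter used in the question: at least 10 reads summed over all samples.
keep_sum <- rowSums(counts(dds)) >= 10

# Stricter alternative: at least 10 reads in at least as many samples
# as there are in the smallest group.
smallest_group <- min(table(dds$condition))
keep_per_sample <- rowSums(counts(dds) >= 10) >= smallest_group

# Number of genes kept by each filter.
c(total_sum_filter = sum(keep_sum), per_sample_filter = sum(keep_per_sample))

dds_strict <- dds[keep_per_sample, ]
```

A gene with all of its reads in one or two samples can pass a total-sum filter at almost any threshold, whereas the per-sample version removes it.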


Hmm, I tried 20 and 30 and still got the same result.
