Best way to make a PCA plot integrating in-house data with public data?
0
0
Entering edit mode
20 months ago
Trivas ★ 1.7k

Hi everyone, I'm hoping to integrate public RNA-seq datasets with some of my own RNA-seq datasets. However, the relevant public datasets have pretty different sequencing depths and read lengths than others. For instance, my datasets are ~50m reads/sample, 2x150bp while others are ~30m reads/sample, 2x100 and others are 60m reads/sample, 2x76bp.

I downloaded the raw fastq files and processed them all similarly (trimgalore -> STAR -> salmon -> tximport into DESeq2) and using the raw counts in downstream analyses. However, I'm starting to realize that that's not going to work and I need to incorporate some sort of normalization. My end goal is to be able to compare all of these samples on a single PCA plot.

What's the best way of doing this? I have 15 samples with either 2 or 3 replicates split across 5 DESeq objects.

  1. Use DESeq2 to normalize read counts based on sizeFactors and make a large dataframe of normalized counts prior to vst
  2. Use avgLength in my DESeq object and calculate a rough TPM. Then make a large dataframe of TPM counts. My only caveat is I include the following before exporting my DESeq object.

    keep <- rowSums(counts(ddsTxi)) >= 1
    ddsTxi <- ddsTxi[keep,]
    

    (Might not matter since it's only removing genes with samples that contain 0 genes.)

  3. Regenerate count matrices using tximport but include countsFromAbundance = "scaledTPM" in the tximport() call. I'm not really sure if scaledTPM is a better option than lengthScaledTPM for what I want to do, so I will gladly take some guidance there as well.

  4. Any other options?

Kind of kicking myself, but any of these options should be fairly quick to incorporate.

tximport SRA RNA-seq plot PCA • 567 views
ADD COMMENT
0
Entering edit mode

In the very first step to making the dataset comparable, you need to remove batch effects. You may use the ComBat approach from sva R package to do so.

ADD REPLY

Login before adding your answer.

Traffic: 2339 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6