Question

Rename samples for plotting using plotPCA

4.2 years ago

c_u ▴ 520

Hi,

I am using the plotPCA function of DESeq2 to get an overall view of my samples and to spot any bad ones. I first pass the BAM files to featureCounts and then import the counts into DESeq2 for further analysis. However, because the sample names are taken from the original BAM file names, they are very long, and the labels on the PCA plot end up long and messy. Here is where I load the BAMs -

counts <- featureCounts(nthreads=3, isGTFAnnotationFile=TRUE, annot.ext="/Volumes/bam/DRG/annotations/Homo_sapiens.GRCh38.95.gtf", files=c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam', '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam', '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam', '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam', '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam', '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam', '47T7R.fastqAligned.sortedByCoord.out.bam'))$counts

And later I run plotPCA with label names (because I want to be able to see individual samples on the plot) thus -

plotPCA(vsd, ntop=1000) + geom_text(aes(label=name),vjust=2,check_overlap = TRUE,size = 4)

The resulting plot looks like this - https://imgur.com/bcdu4J6. As you can see, the long file names ruin the plot. Is there a way to rename the samples at some point (without renaming the original BAM files) so that the final plot shows shorter sample names?

My whole code is here -

library(Rsubread)  # featureCounts
library(DESeq2)    # DESeqDataSetFromMatrix, DESeq, vst, plotPCA
library(ggplot2)   # geom_text, aes

# Count reads per gene for each BAM file
counts <- featureCounts(nthreads=3, isGTFAnnotationFile=TRUE, annot.ext="/Volumes/bam/DRG/annotations/Homo_sapiens.GRCh38.95.gtf", files=c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam', '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam', '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam', '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam', '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam', '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam', '47T7R.fastqAligned.sortedByCoord.out.bam'))$counts
# One condition label per BAM file, in the same order as the files above
sampleTable <- data.frame(condition = factor(c('P', 'P', 'P', 'P', 'NP', 'NP', 'NP', 'NP', 'P', 'P', 'P', 'P', 'P')))
coldata <- sampleTable
deseqdata <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
dds <- DESeq(deseqdata)
vsd <- vst(dds)
# PCA on the 1000 most variable genes, with each point labelled by sample name
plotPCA(vsd, ntop=1000) + geom_text(aes(label=name),vjust=2,check_overlap = TRUE,size = 4)

RNA-Seq DESeq2 • 3.5k views

score 0 · Accepted Answer · 2020-02-11

4.2 years ago

c_u ▴ 520

I found the answer!

I simply wrapped the label in the substr function when specifying the labels, so that only the first six characters of each sample name are shown, like so -

plotPCA(vsd, ntop=1000) + geom_text(aes(label=substr(name, start = 1, stop = 6)),vjust=2,check_overlap = TRUE,size = 4)
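
One caveat with a fixed-width substr: the sample IDs in the question are not all the same length (40T4L is five characters, 44T10R is six), so the shorter ones keep a trailing dot. A minimal sketch of an alternative, using only base R, is to cut each name at its first dot. The helper name short_name is hypothetical and not part of DESeq2 or Rsubread, and the sketch assumes the labels look like the file names passed to featureCounts, which may not hold exactly for every Rsubread version.

```r
# Two of the file names from the question, one with a 5-character ID
# and one with a 6-character ID.
bam_names <- c("40T4L.fastqAligned.sortedByCoord.out.bam",
               "44T10R.fastqAligned.sortedByCoord.out.bam")

# Fixed-width approach: the shorter ID keeps a trailing dot.
fixed_width <- substr(bam_names, start = 1, stop = 6)
print(fixed_width)   # "40T4L." "44T10R"

# Hypothetical helper: drop any directory part, then remove everything
# from the first dot onwards.
short_name <- function(x) sub("\\..*$", "", basename(x))

trimmed <- short_name(bam_names)
print(trimmed)       # "40T4L" "44T10R"

# Possible uses (not run here, they need the objects from the question):
#   colnames(counts) <- short_name(colnames(counts))  # before DESeqDataSetFromMatrix
#   plotPCA(vsd, ntop=1000) + geom_text(aes(label=short_name(name)), vjust=2)
```

Renaming the columns of the count matrix before building the DESeq2 object has the side benefit that the short names carry through to every later table and plot, not just the PCA labels.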
