Rename samples for plotting using plotPCA
4.2 years ago
c_u ▴ 520

Hi,

I am using the plotPCA function of DESeq2 to get an overview of how my samples cluster and to spot any bad samples. I first pass the BAM files to featureCounts (Rsubread) and then import the resulting counts into DESeq2 for further analysis. Because I load the original BAM files, the sample names are very long, so the plot produced by plotPCA is cluttered with long, messy labels. Here is where I load the BAMs:

counts <- featureCounts(nthreads=3, isGTFAnnotationFile=TRUE,
                        annot.ext="/Volumes/bam/DRG/annotations/Homo_sapiens.GRCh38.95.gtf",
                        files=c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam',
                                '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam',
                                '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam',
                                '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam',
                                '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam',
                                '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam',
                                '47T7R.fastqAligned.sortedByCoord.out.bam'))$counts

Later, I run plotPCA and add text labels, because I want to be able to identify individual samples on the plot:

plotPCA(vsd, ntop=1000) + geom_text(aes(label=name),vjust=2,check_overlap = TRUE,size = 4)

The resulting plot is here: https://imgur.com/bcdu4J6. As you can see, the long file names clutter the plot. Is there a way to rename the samples at some point in the workflow (rather than renaming the original BAM files) so that the final plot shows shorter sample names?

Here is my full code:

counts <- featureCounts(nthreads=3, isGTFAnnotationFile=TRUE,
                        annot.ext="/Volumes/bam/DRG/annotations/Homo_sapiens.GRCh38.95.gtf",
                        files=c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam',
                                '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam',
                                '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam',
                                '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam',
                                '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam',
                                '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam',
                                '47T7R.fastqAligned.sortedByCoord.out.bam'))$counts
sampleTable <- data.frame(condition = factor(c('P', 'P', 'P', 'P', 'NP', 'NP', 'NP', 'NP', 'P', 'P', 'P', 'P', 'P')))
coldata <- sampleTable
deseqdata <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
dds <- DESeq(deseqdata)
vsd <- vst(dds)
plotPCA(vsd, ntop=1000) + geom_text(aes(label=name),vjust=2,check_overlap = TRUE,size = 4)
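One way to do the renaming "at some point" is to shorten the column names of the count matrix right after featureCounts, before building the DESeqDataSet; plotPCA takes its `name` column from the sample names of `vsd`, so the short IDs would then propagate to the plot without touching the BAM files. The sketch below assumes the columns of `counts` are named after the BAM files (worth checking with `colnames(counts)` first). The tiny all-zero matrix is a hypothetical stand-in so the snippet runs in base R; in the real script you would apply the same `colnames()` and `data.frame()` lines to the featureCounts output.

```r
# Hypothetical stand-in for the featureCounts output: a toy count matrix
# whose column names are the BAM file names from the question.
bam_files <- c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam',
               '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam',
               '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam',
               '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam',
               '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam',
               '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam',
               '47T7R.fastqAligned.sortedByCoord.out.bam')
counts <- matrix(0L, nrow = 2, ncol = length(bam_files),
                 dimnames = list(c("gene1", "gene2"), bam_files))

# Strip the shared suffix so only the sample IDs (e.g. "40T4L") remain.
colnames(counts) <- sub("\\.fastqAligned\\.sortedByCoord\\.out\\.bam$", "",
                        colnames(counts))

# Build the sample table with matching row names so colData lines up with the counts.
sampleTable <- data.frame(condition = factor(c('P', 'P', 'P', 'P', 'NP', 'NP', 'NP', 'NP',
                                               'P', 'P', 'P', 'P', 'P')),
                          row.names = colnames(counts))

print(colnames(counts))

# In the real script, continue as before, e.g.:
# deseqdata <- DESeqDataSetFromMatrix(countData = counts, colData = sampleTable, design = ~condition)
```

With the names fixed at this stage, the original `geom_text(aes(label = name), ...)` call can stay unchanged.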
Tags: RNA-Seq, DESeq2
4.2 years ago
c_u ▴ 520

I found a solution!

I used the substr function when specifying the labels, keeping only the first six characters of each sample name:

plotPCA(vsd, ntop=1000) + geom_text(aes(label=substr(name, start = 1, stop = 6)), vjust=2, check_overlap = TRUE, size = 4)
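One caveat with a fixed width: `substr(name, 1, 6)` assumes every sample ID is exactly six characters long. IDs such as `40T4L` have only five, so their labels keep the following dot (`40T4L.`). A sketch of an alternative, assuming every ID ends at the first `.` in the file name, is to drop everything from the first dot onward with `sub()`; the two-element `name` vector below is just an illustrative sample of the file names.

```r
# Two of the file names from the question: one 5-character ID, one 6-character ID.
name <- c('40T4L.fastqAligned.sortedByCoord.out.bam',
          '44T10R.fastqAligned.sortedByCoord.out.bam')

# Fixed-width truncation keeps the dot for the shorter ID.
cut_labels <- substr(name, start = 1, stop = 6)   # "40T4L." "44T10R"

# Removing everything from the first dot works regardless of ID length.
clean_labels <- sub("\\..*$", "", name)           # "40T4L"  "44T10R"

print(cut_labels)
print(clean_labels)
```

In the plot call, the same expression would go inside `aes()`, e.g. `geom_text(aes(label = sub("\\..*$", "", name)), vjust = 2, check_overlap = TRUE, size = 4)`.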