CollectRnaSeqMetrics (Picard) output to convert featureCounts counts into TPM
9 months ago
camillab.


I have bulk RNA-seq data (12 samples, paired-end sequenced), and my pipeline was:

  • Performed quality control with FastQC
  • Trimmed reads with Trimmomatic
  • Aligned reads to the reference genome with STAR
  • Used Samtools to sort and index the BAM files
  • Calculated counts with featureCounts
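featureCounts writes a tab-delimited table: a comment line starting with #, then the columns Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. The Length column is what featureLength needs. A minimal sketch for pulling both into R, assuming that standard layout from a single featureCounts run over all BAMs (a hand-merged file may differ); read_featurecounts() is a hypothetical helper name:

```r
# Read a featureCounts output table into a count matrix plus gene lengths.
# Assumes the standard layout: Geneid, Chr, Start, End, Strand, Length,
# then one count column per BAM file. Hypothetical helper, not a package API.
read_featurecounts <- function(path) {
  # Skip the "# Program:featureCounts ..." comment line; keep BAM names as-is
  fc <- read.delim(path, comment.char = "#", check.names = FALSE,
                   stringsAsFactors = FALSE)
  # Columns 7 onwards are the per-sample counts
  counts <- as.matrix(fc[, -(1:6), drop = FALSE])
  rownames(counts) <- fc$Geneid
  list(counts = counts, featureLength = fc$Length)
}
```

Usage would be something like `fc <- read_featurecounts("counts.txt")` (placeholder file name), after which `fc$counts` and `fc$featureLength` are in the shape the conversion function expects.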

Now I want to convert the counts into TPM as follows:

counts_to_tpm <- function(counts, featureLength, meanFragmentLength)


  • counts: my merged file with the hit counts from all samples
  • featureLength: a numeric vector with the feature lengths, which are present in my BAM file
  • meanFragmentLength: the mean fragment length
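One way the body of counts_to_tpm() could look, as a sketch only: it assumes the usual effective-length definition (featureLength - meanFragmentLength + 1) and is not necessarily the code the signature above came from. Note the shapes it checks: featureLength has one entry per gene, while meanFragmentLength has one entry per sample, because the fragment length is a property of each sequencing library, not of a gene.

```r
# Sketch of a body for counts_to_tpm() (assumption, not the original code).
# counts:             genes x samples matrix of raw counts
# featureLength:      one length per gene (e.g. featureCounts' Length column)
# meanFragmentLength: one mean fragment length per sample
counts_to_tpm <- function(counts, featureLength, meanFragmentLength) {
  counts <- as.matrix(counts)
  stopifnot(length(featureLength) == nrow(counts))
  stopifnot(length(meanFragmentLength) == ncol(counts))

  # Effective length of every gene in every sample: length - fragment + 1
  effLen <- outer(featureLength, meanFragmentLength,
                  function(len, frag) len - frag + 1)

  # Drop genes shorter than the mean fragment length of any sample
  keep <- apply(effLen, 1, min) >= 1
  counts <- counts[keep, , drop = FALSE]
  effLen <- effLen[keep, , drop = FALSE]

  # Counts per base of effective length, rescaled so each sample sums to 1e6
  rate <- counts / effLen
  tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6
  dimnames(tpm) <- dimnames(counts)
  tpm
}
```

Genes shorter than the fragment length are dropped rather than given a zero or negative effective length, which would otherwise produce infinite or negative TPM values.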

Is it correct that this parameter can be calculated with CollectRnaSeqMetrics (Picard) or with picardmetrics? And do I have to run it for every sample in my dataset, or, given that they were sequenced together, is one sample enough? I guess the mean length for gene x should be the same regardless of the sample, or am I wrong and didn't understand the role of Picard?

I tried running it on one sample, but I am confused about which value in the output I should use as meanFragmentLength in the code above to get the TPM. I got a .txt file that looks like this:

[Screenshot of the CollectRnaSeqMetrics .txt output; image not available]
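For what it's worth, as far as I know CollectRnaSeqMetrics reports how aligned bases are distributed (coding, UTR, intronic, intergenic, ribosomal, plus coverage-bias statistics) and has no fragment-length column. For paired-end data, Picard's CollectInsertSizeMetrics reports MEAN_INSERT_SIZE, which is commonly used as the mean fragment length, and it is computed separately for each sample's BAM. Both tools write the same text layout: a `## METRICS CLASS` line, then a tab-separated header row and value rows, ending at a blank line. A minimal base-R sketch for reading that table, assuming this layout; read_picard_metrics() is a hypothetical helper, not part of Picard:

```r
# Read the metrics table from a Picard metrics text file
# (e.g. CollectInsertSizeMetrics or CollectRnaSeqMetrics). Hypothetical helper.
read_picard_metrics <- function(path) {
  lines <- readLines(path)
  start <- grep("^## METRICS CLASS", lines)[1]
  if (is.na(start)) stop("no '## METRICS CLASS' section found in ", path)
  # Table = header row + value rows after the marker, up to the next blank line
  rest <- lines[-seq_len(start)]
  end <- which(rest == "")[1]
  if (!is.na(end)) rest <- rest[seq_len(end - 1)]
  read.delim(text = paste(rest, collapse = "\n"), check.names = FALSE,
             stringsAsFactors = FALSE)
}
```

Then something like `sapply(files, function(f) read_picard_metrics(f)$MEAN_INSERT_SIZE[1])` would give one value per sample, where `files` is a hypothetical vector of CollectInsertSizeMetrics outputs in the same order as the count columns (if several pair orientations are reported, check which row you are taking).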

Apologies if this is again a stupid question!

Thank you for the help!


R Picard CollectRnaSeqMetrics

Why not use something like RSEM to get counts and TPM?


But I would have to align the samples again, right? I struggled (a lot) to find a computer with enough RAM to run STAR, and I don't want to go back to the original FASTQ files if possible.


Gotcha. Next time, though, use RSEM, since it uses STAR internally anyway (as one option; it can use other aligners as well). Also, you're taking the alignment route with STAR, but if RAM is your primary concern, you should take the pseudoalignment route.

I'm assuming you've already looked at this answer, "Calculating TPM from featureCounts output", which says you can use the picardmetrics wrapper. I'm trying to figure out where you can get the mean fragment length.


I calculated it following this tutorial; basically, it retrieves the normalized counts matrix using DESeq2. Do you think that is correct?
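If the tutorial takes DESeq2's normalized counts (what `counts(dds, normalized = TRUE)` returns), those are not TPM: each sample's raw counts are divided by a single size factor, and gene length never enters. A base-R sketch of the median-of-ratios size factors that DESeq2 uses by default, for illustration only (this is not DESeq2's own code, and both function names are hypothetical):

```r
# Base-R illustration of median-of-ratios size factors (the default method in
# DESeq2's estimateSizeFactors). Not DESeq2's code; names are hypothetical.
median_of_ratios_size_factors <- function(counts) {
  log_counts <- log(as.matrix(counts))
  # Log geometric mean of each gene across samples
  log_geo_means <- rowMeans(log_counts)
  # Genes with a zero in any sample have geometric mean 0 (log = -Inf); skip them
  use <- is.finite(log_geo_means)
  # For each sample: median ratio of its counts to the per-gene geometric means
  apply(log_counts[use, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[use]))
  })
}

# "Normalized counts": each sample's counts divided by its size factor.
# No gene length is involved, which is why this is not TPM.
normalized_counts <- function(counts) {
  sf <- median_of_ratios_size_factors(counts)
  sweep(as.matrix(counts), 2, sf, "/")
}
```

These normalized counts are meant for comparing the same gene across samples (as in differential expression), whereas TPM also divides by gene length so that genes within a sample are on a per-transcript scale.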


What is your end goal? TPM is only valid for comparing expression within the same sample, or in a very narrow set of use cases where inter-sample comparison can be done without caveats. Unless you're clear about your end goal, calculating TPM might not be a detour worth taking.
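To make the within-sample point concrete: by construction, every sample's TPM values sum to one million, so a large change in one gene shifts the TPM of every other gene even when their counts do not change. A toy example with made-up counts; tpm_simple() is a hypothetical helper that uses raw gene lengths (no fragment-length correction) to keep it short:

```r
# Minimal TPM helper for this illustration only: raw lengths are used as
# effective lengths, i.e. no mean-fragment-length correction.
tpm_simple <- function(counts, len) {
  rate <- counts / len
  rate / sum(rate) * 1e6
}

len     <- c(geneA = 1000, geneB = 1000, geneC = 1000)
sampleX <- c(geneA = 100,  geneB = 100,  geneC = 100)
sampleY <- c(geneA = 100,  geneB = 100,  geneC = 800)  # only geneC changes

tpmX <- tpm_simple(sampleX, len)  # geneA, geneB, geneC each about 333333
tpmY <- tpm_simple(sampleY, len)  # geneA = 100000, geneB = 100000, geneC = 800000
```

geneA has exactly the same count in both samples, yet its TPM drops to less than a third, purely because geneC took a larger share of the library.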

