Question

How to calculate TPM from featureCounts output

0

10 months ago

survive • 0

I would like to compute TPM values for the GSE102073 study. The raw data I downloaded from GEO are featureCounts output files.

First part of the file:

# Program:featureCounts v1.4.3-p1; Command:"/data/NYGC/Software/Subread/subread-1.4.3-p1-Linux-x86_64/bin/featureCounts" "-s" "2" "-a" "/data/NYGC/Resources/ENCODE/Gencode/gencode.v18.annotation.gtf" "-o" "/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/featureCounts/Sample_JB4853_counts.txt" "/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/STAR_alignment/Sample_JB4853_Aligned.out.WithReadGroup.sorted.bam"
Geneid  Chr     Start   End     Strand  Length  /data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/STAR_alignment/Sample_JB4853_Aligned.out.WithReadGroup.sorted.bam
ENSG00000223972.4       chr1;chr1;chr1;chr1     11869;12595;12975;13221 12227;12721;13052;14412 +;+;+;+ 1756    0
ENSG00000227232.4       chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1        14363;14970;15796;16607;16854;17233;17498;17602;17915;18268;24734;29321;29534   14829;15038;15947;16765;1705

How can I convert this into TPM values?

I tried the method from this post, but it requires a counts file, which I don't have access to. I also looked at this post, but I am confused about how to use tximport to get TPM values, and about what to supply for the input variables featureLength and meanFragmentLength.

Thank you.

rna-seq TPM featurecounts • 2.4k views

updated 10 months ago by rfran010 ▴ 900 • written 10 months ago by survive • 0

0

This file is your counts file, isn't it?

Reply • 10 months ago by rfran010 ▴ 900

0

featureCounts file

Reply • 10 months ago by survive • 0

0

Yes, that is my point: the featureCounts file is your counts file.

Reply • 10 months ago by rfran010 ▴ 900

2

10 months ago

bioinfo_ga ▴ 70

Hi, you can use the Python package rnanorm [https://pypi.org/project/rnanorm/]. The required inputs are your read counts from featureCounts along with the lengths of your genes/transcripts, which can be obtained from the reference GTF/GFF file.
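If you would rather see the arithmetic than rely on a package, here is a minimal sketch of the usual TPM calculation (counts divided by length in kb, then rescaled so the sample sums to one million). It does not use rnanorm. It assumes the standard tab-delimited featureCounts layout shown in the question, with one BAM per file so that the last column holds the counts, and it takes the gene length from the Length column that featureCounts already writes. The function name featurecounts_to_tpm and the toy table are made up for illustration.

```python
import csv
import io


def featurecounts_to_tpm(handle):
    """Read a single-sample featureCounts table and return {gene_id: TPM}."""
    # Skip the leading "# Program:featureCounts ..." comment line.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    # Assumption: one BAM file was counted, so the last column holds the counts.
    count_col = reader.fieldnames[-1]
    rpk = {}
    for row in reader:
        # Reads per kilobase: count divided by gene length in kb.
        length_kb = float(row["Length"]) / 1000.0
        rpk[row["Geneid"]] = float(row[count_col]) / length_kb
    total = sum(rpk.values())
    # Rescale so that the values of the sample sum to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Toy table with made-up numbers, only to show the expected layout.
toy = (
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "geneA\tchr1\t1\t1000\t+\t1000\t10\n"
    "geneB\tchr1\t2001\t4000\t+\t2000\t20\n"
    "geneC\tchr1\t5001\t5500\t-\t500\t0\n"
)
tpm = featurecounts_to_tpm(io.StringIO(toy))
print(tpm)
```

In the toy table geneA and geneB end up with the same TPM, because 10 reads over 1 kb is the same density as 20 reads over 2 kb. For a real file you would pass an open file handle (for example the Sample_JB4853_counts.txt file from the header) instead of the StringIO object. Note that this uses the total exon length per gene, not an effective length, so the values are only an approximation.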

score 2 · Accepted Answer · 2023-06-04

2

10 months ago

rpolicastro 13k

For more accurate TPMs, consider processing your raw sequencing data with Salmon or kallisto. These programs estimate abundances at the transcript level, which gives better TPM estimates than length-normalizing gene-level counts. See the Bioconductor RNA-seq guide for more info.
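As a rough illustration of how transcript-level estimates become gene-level TPMs, the sketch below sums the TPM column of a Salmon quant.sf file per gene, which is similar in spirit to how tximport summarizes abundances. The function name gene_tpm_from_quant, the tx2gene dictionary and all numbers in the toy table are hypothetical; in practice the transcript-to-gene mapping would come from the same annotation used to build the index.

```python
import csv
import io
from collections import defaultdict


def gene_tpm_from_quant(handle, tx2gene):
    """Sum transcript-level TPM from a Salmon quant.sf into gene-level TPM."""
    gene_tpm = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        # tx2gene maps a transcript ID (the Name column) to its gene ID.
        gene_tpm[tx2gene[row["Name"]]] += float(row["TPM"])
    return dict(gene_tpm)


# Toy quant.sf with made-up values, only to show the column layout.
toy_quant = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1300.0\t600000.0\t90.0\n"
    "tx2\t900\t700.0\t150000.0\t12.0\n"
    "tx3\t2000\t1800.0\t250000.0\t52.0\n"
)
# Hypothetical mapping: tx1 and tx2 belong to geneA, tx3 to geneB.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_tpm = gene_tpm_from_quant(io.StringIO(toy_quant), tx2gene)
print(gene_tpm)
```

Because TPM is already normalized within the sample, adding the transcripts of a gene together keeps the sample total at one million.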

0

Hi, so I will need to download the raw .fastq files from SRA and run Salmon/kallisto instead of using the featureCounts output?