How to convert featureCounts to FPKM?
4.5 years ago
Biologist

I have a data frame of featureCounts output like the following:

Geneid             sample1  sample2  sample3  sample4
ENSG00000241860.6       16       65       19       56
ENSG00000237491.8       57       53       36       89
ENSG00000177757.2        9        6        4        5
ENSG00000228794.8     1145      528      418      355
ENSG00000225880.5       38       21       37       80
ENSG00000230368.2        9       10        6        8
ENSG00000272438.1       36        8        1       26
ENSG00000230699.2       70       24       22       35
ENSG00000241180.1        0        0        1        2
ENSG00000223764.2      272       35       32       61
ENSG00000272512.1        6       16      239       46
ENSG00000224969.1        0        4        7       14
ENSG00000242590.1      669      401     2673     1425
ENSG00000273443.1      122        8       16        6
ENSG00000223823.1       11        0        5        7
ENSG00000272141.1       42       48       47      279
ENSG00000272141.1   42  48  47  279


I see that the GDC portal converts TCGA HTSeq counts to FPKM with this formula: HT-seq FPKM. I want to convert the featureCounts above to FPKM in the same way. How can I do that?

PS: The FPKM values are not meant for differential expression analysis.

Thank you.

RNA-Seq featurecounts fpkm r
3.9 years ago
Ahmed Alhendi

Try the countToFPKM package. It provides an easy-to-use function that converts a read count matrix into FPKM values normalised by library size and effective feature length. It implements the following equation:

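$$\mathrm{FPKM}_i = \frac{X_i}{\left(\frac{\tilde{l}_i}{10^3}\right)\left(\frac{N}{10^6}\right)} = \frac{X_i}{\tilde{l}_i \, N} \times 10^9$$

where $X_i$ is the read (fragment) count of feature $i$, $\tilde{l}_i = l_i - \mu + 1$ is its effective length ($l_i$: feature length, $\mu$: mean fragment length of the library), and $N$ is the library size (total number of fragments).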

The fpkm() function takes three inputs and returns FPKM values as a numeric matrix normalised by library size and feature length:

• counts: a numeric matrix of raw feature counts.
• featureLength: a numeric vector of feature lengths, which can be obtained using biomaRt.
• meanFragmentLength: a numeric vector of mean fragment lengths, which can be calculated with Picard's CollectInsertSizeMetrics.
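For readers working outside R, the equation can be sketched in Python with NumPy. This is a hypothetical re-implementation of the formula, not countToFPKM's own code; the function name `fpkm_effective_length` is made up, and marking features shorter than the mean fragment length as NaN is an assumption of this sketch.

```python
import numpy as np


def fpkm_effective_length(counts, feature_length, mean_fragment_length):
    """FPKM with effective feature length (hypothetical helper, not countToFPKM code).

    counts: genes x samples matrix of raw counts
    feature_length: feature length in bp, one value per gene (row)
    mean_fragment_length: mean fragment length, one value per sample (column)
    """
    counts = np.asarray(counts, dtype=float)
    feature_length = np.asarray(feature_length, dtype=float)
    mean_fragment_length = np.asarray(mean_fragment_length, dtype=float)

    # Effective length l_i - mu + 1, computed separately for each sample
    eff_len = feature_length[:, None] - mean_fragment_length[None, :] + 1
    # Library size N: total counts of each sample (column sums)
    lib_size = counts.sum(axis=0)

    with np.errstate(divide="ignore", invalid="ignore"):
        fpkm = counts * 1e9 / (eff_len * lib_size[None, :])
    # Features shorter than the mean fragment length have no usable effective
    # length, so mark them as missing instead of reporting a value
    fpkm[eff_len < 1] = np.nan
    return fpkm


# Toy example with made-up numbers: 2 genes x 2 samples, mean fragment length 200
example = fpkm_effective_length([[10, 20], [30, 60]], [1000, 2000], [200, 200])
```

Feature lengths could come from the featureCounts Length column or biomaRt, and the per-sample mean fragment length from Picard's CollectInsertSizeMetrics, as listed above.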
4.5 years ago

There are many ways to solve this problem, for example DESeq2 or other R packages. You can also write your own script; here is mine for converting read counts to FPKM:

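import os
import numpy

def load_exon(ref):
    # Hypothetical helper (not shown in the original post): assumes a BED file
    # with one exon per line (chrom, start, end, gene_id) and sums exon
    # lengths per gene; overlapping exons are not merged.
    exon_dict = {}
    with open(ref) as bed:
        for line in bed:
            chrom, start, end, gene = line.rstrip("\n").split("\t")[:4]
            exon_dict[gene] = exon_dict.get(gene, 0) + int(end) - int(start)
    return exon_dict

# stringtie_count_file: tab-separated count table (gene ID + one count column per sample), e.g. from featureCounts
def cal_FPKM(stringtie_count_file, ref, outpath):
    count_matrix = numpy.loadtxt(stringtie_count_file, dtype=str, delimiter="\t", ndmin=2)
    count_matrix = numpy.delete(count_matrix, 0, axis=0)  # drop the header row

    genes = count_matrix[:, 0].tolist()  # first column: gene IDs
    count_matrix = numpy.delete(count_matrix, 0, axis=1)  # keep only the count columns

    count_matrix = count_matrix.astype(int)
    total_reads = count_matrix.sum(axis=0)  # mapped reads per sample

    exon_dict = load_exon(ref)  # load exon BED file; you can prepare this file with the R package GenomicFeatures

    fpkm_file = os.path.join(outpath, "FPKM.txt")
    with open(fpkm_file, "w") as fpkm:
        for gene, gene_counts in zip(genes, count_matrix):
            exon_len = exon_dict[gene]
            # FPKM = count * 10^9 / (exon length * total reads of the sample)
            temp_fpkm = gene_counts * 1e9 / (exon_len * total_reads)
            fpkm.write(gene + "\t" + "\t".join(map(str, temp_fpkm)) + "\n")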

4.5 years ago
h.mon

Did you run featureCounts yourself, or did you download this data? The original featureCounts output includes a column with gene lengths; with these gene lengths and the counts, you have everything needed to calculate FPKM according to the formula you linked:

$$\mathrm{FPKM} = \frac{\mathrm{RMg} \times 10^9}{\mathrm{RMt} \times L}$$

RMg: The number of reads mapped to the gene

RMt: The total number of reads mapped to protein-coding sequences in the alignment

L: The length of the gene in base pairs
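A quick worked example with made-up numbers (not taken from the table in the question): a gene with RMg = 500 reads and L = 2,000 bp, in a sample with RMt = 20,000,000 mapped reads, gives

$$\mathrm{FPKM} = \frac{500 \times 10^9}{(2 \times 10^7) \times 2000} = \frac{5 \times 10^{11}}{4 \times 10^{10}} = 12.5$$

The factor $10^9$ is $10^3$ (per kilobase of gene length) times $10^6$ (per million mapped reads).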

FPKM is a (bad) within-sample normalization, so RMt for each sample is just the sum of the read counts over all genes in that sample.
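A minimal Python sketch of this recipe, using only the standard library, reads a featureCounts table directly and uses its Length column as L. It assumes the usual featureCounts layout ('#' comment line, then Geneid, Chr, Start, End, Strand, Length, followed by one count column per sample); the function name `featurecounts_to_fpkm` and the example table are made up for illustration.

```python
import csv
import io


def featurecounts_to_fpkm(handle):
    """Compute FPKM from a featureCounts table (hypothetical helper).

    Assumes '#' comment line(s), then the header
    Geneid, Chr, Start, End, Strand, Length, <one count column per sample>.
    Returns (sample_names, {gene_id: [FPKM per sample]}).
    """
    rows = csv.reader(
        (line for line in handle if not line.startswith("#")), delimiter="\t"
    )
    header = next(rows)
    samples = header[6:]
    genes, lengths, counts = [], [], []
    for row in rows:
        genes.append(row[0])
        lengths.append(int(row[5]))  # Length column: L in bp
        counts.append([int(c) for c in row[6:]])

    # RMt: sum of counts over all genes, one total per sample
    totals = [sum(column) for column in zip(*counts)]

    fpkm = {}
    for gene, length, gene_counts in zip(genes, lengths, counts):
        # FPKM = RMg * 10^9 / (RMt * L)
        fpkm[gene] = [rmg * 1e9 / (rmt * length) for rmg, rmt in zip(gene_counts, totals)]
    return samples, fpkm


# Made-up two-gene table in featureCounts format
table = io.StringIO(
    "# Program:featureCounts (made-up example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1\tsample2\n"
    "geneA\tchr1\t1\t1000\t+\t1000\t30\t10\n"
    "geneB\tchr1\t2001\t4000\t-\t2000\t70\t90\n"
)
samples, fpkm = featurecounts_to_fpkm(table)
print(samples, fpkm)
# → ['sample1', 'sample2'] {'geneA': [300000.0, 100000.0], 'geneB': [350000.0, 450000.0]}
```

With a real file, pass an open handle instead, e.g. `with open("counts.txt") as fh: samples, fpkm = featurecounts_to_fpkm(fh)` (the file name is a placeholder). To follow the GDC definition of RMt more closely, the totals would be restricted to protein-coding genes.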

Hi,

I actually ran featureCounts myself. If you know how, could you provide some simple code to convert featureCounts output to FPKM?

Thank you

Try the countToFPKM package.