Convert HTSeq count table to RPKM value using GFF/GTF
7.1 years ago
Whoknows ▴ 880

Hi friends

I used HTSeq on my TopHat2 output to create count tables.

Now I would like to convert the count values to RPKM using a GTF file. Is there a script or something similar for that?

I should say that I know about the rpkm() function in edgeR, but I want to do this separately from that package.

Thanks.

tophat HTSeq RPKM • 14k views
7.1 years ago

My reply here, A: Normalization Of Rna Sequencing Counts (By Ercc / Gene Length), from a few months ago includes code that takes a GTF file and outputs the GC content and (union gene model) length of each gene in it. You can just remove the GC content part. This gives you an easy-to-use file for converting raw counts to RPKM: take the counts, divide them by 0.001 * the length in the output file, and then divide by the library size in millions (total number of mapped reads / 1 million). It's usually a good idea to use normalized counts if you want the RPKMs to be useful (just import the values into DESeq2 and then use counts(something, normalized=T)).
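
RPKM is reads per kilobase of gene length per million reads in the library, and that arithmetic can be sketched in a few lines of base R. The function name `counts_to_rpkm` and the example numbers are made up for illustration. Taking the library size as the sum of the supplied counts is an assumption; you may prefer the total number of mapped reads.

```r
# Convert raw counts to RPKM.
# counts:   numeric vector of read counts per gene
# lengths:  numeric vector of gene lengths in bp (same order as counts)
# lib_size: total reads used for scaling (assumed here: sum of counts)
counts_to_rpkm <- function(counts, lengths, lib_size = sum(counts)) {
  stopifnot(length(counts) == length(lengths), all(lengths > 0))
  per_kb <- counts / (lengths / 1000)   # reads per kilobase
  per_kb / (lib_size / 1e6)             # ... per million reads
}

counts  <- c(GeneA = 500, GeneB = 1500, GeneC = 8000)   # hypothetical counts
lengths <- c(GeneA = 1000, GeneB = 3000, GeneC = 2000)  # hypothetical lengths (bp)
rpkm <- counts_to_rpkm(counts, lengths)
print(rpkm)
```

GeneA and GeneB get the same RPKM here because GeneB has three times the reads but is also three times as long.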


Thank you very much, Devon. I did it; I actually used my own script.

I summed only the exon lengths for each gene. Is that right? I mean, we should work only with the exons, not the CDS.


Yup, exactly.
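
One detail worth checking when summing exons: if a gene has several transcripts, their exons can overlap, and a plain sum counts the shared bases more than once. A minimal base R sketch of a union length, assuming 1-based inclusive coordinates as in GTF (the function name `union_length` and the coordinates are hypothetical):

```r
# Total length covered by a set of 1-based, inclusive intervals,
# counting overlapping bases only once.
union_length <- function(start, end) {
  stopifnot(length(start) == length(end), all(end >= start))
  if (length(start) == 0) return(0)
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # overlapping or directly adjacent: extend the current block
      cur_end <- max(cur_end, end[i])
    } else {
      # gap: close the current block and start a new one
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Hypothetical gene whose two transcripts share part of an exon
exon_start <- c(100, 150, 400)
exon_end   <- c(200, 250, 500)
naive  <- sum(exon_end - exon_start + 1)      # shared bases counted twice
merged <- union_length(exon_start, exon_end)  # shared bases counted once
print(c(naive = naive, merged = merged))
```

For these coordinates the naive sum is 303 bp, while the merged (union) length is 252 bp.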


Would you be willing to share your script? It would be super helpful. Thanks!


Sorry to say this, but unfortunately I cannot find it. I will post it here if I find it.


But people usually import raw counts into DESeq2, right? Can RPKMs be imported into DESeq2?


I was talking about producing RPKMs with DESeq2, not putting them into DESeq2. You're correct that giving DESeq2 RPKMs would not work well.
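
A minimal base R sketch of what "normalized counts" means in this context: each sample's counts are divided by a median-of-ratios size factor, which is the idea behind DESeq2's default normalization. The function name `size_factors`, the example matrix, and the library size used in the last step are assumptions for illustration; this is not DESeq2's own code.

```r
# Median-of-ratios size factors, re-implemented in base R for illustration only.
size_factors <- function(count_matrix) {
  log_counts <- log(count_matrix)
  log_geo_means <- rowMeans(log_counts)   # -Inf if any sample has a zero count
  keep <- is.finite(log_geo_means)        # use only genes expressed in all samples
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

# Hypothetical counts: 4 genes x 2 samples; sample2 is sequenced twice as deep
count_matrix <- matrix(c(10, 20,
                         100, 200,
                         50, 100,
                         0, 4),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(paste0("Gene", 1:4), c("sample1", "sample2")))
gene_len <- c(Gene1 = 1000, Gene2 = 2000, Gene3 = 500, Gene4 = 1500)  # hypothetical, bp

sf <- size_factors(count_matrix)
norm_counts <- sweep(count_matrix, 2, sf, "/")  # divide each sample by its size factor

# RPKM from normalized counts. Using each sample's normalized column sum as the
# library size is an assumption of this sketch.
rpkm <- sweep(norm_counts / (gene_len / 1000), 2, colSums(norm_counts) / 1e6, "/")
print(round(rpkm, 1))
```

After normalization, Gene1 to Gene3 have equal values in both samples, so the depth difference no longer shows up in the RPKMs.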


Hi, I am new to RNA-seq analysis. Can counts from single-end RNA-seq be TPM normalized?
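
TPM is calculated from per-gene counts and gene lengths, so the same arithmetic applies whether the counts come from single-end reads or paired-end fragments. A minimal sketch, with a hypothetical function name `counts_to_tpm` and made-up numbers:

```r
# TPM from raw counts and gene lengths (bp).
counts_to_tpm <- function(counts, lengths) {
  stopifnot(length(counts) == length(lengths), all(lengths > 0))
  rate <- counts / lengths   # reads per base of gene length
  rate / sum(rate) * 1e6     # rescale so the values sum to one million
}

counts  <- c(GeneA = 500, GeneB = 1500, GeneC = 8000)   # hypothetical counts
lengths <- c(GeneA = 1000, GeneB = 3000, GeneC = 2000)  # hypothetical lengths (bp)
tpm <- counts_to_tpm(counts, lengths)
print(tpm)
```

Unlike RPKM, the TPM values of a sample always sum to one million, which makes them easier to compare between samples.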

5.1 years ago
Whoknows ▴ 880

My R code for creating RPKM values from an HTSeq count file and a GTF file:

First, create a list of genes and their lengths from the GTF file by computing (column 5) - (column 4) + 1 for each exon and summing per gene. The tab-delimited output will look like this:

Gene1   440
Gene2   1200
Gene3   569
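
A minimal base R sketch of how such a length table could be built. The GTF lines are hypothetical and embedded in the script so that it runs as-is; with a real annotation you would pass the file path to `read.table` instead of `text =`. It assumes the attribute column contains `gene_id "..."`, as in standard GTF.

```r
# Hypothetical GTF lines (tab-separated, 9 columns)
gtf_text <- c(
  "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"Gene1\"; transcript_id \"T1\";",
  "chr1\tsrc\texon\t300\t450\t.\t+\t.\tgene_id \"Gene1\"; transcript_id \"T1\";",
  "chr1\tsrc\tCDS\t120\t200\t.\t+\t.\tgene_id \"Gene1\"; transcript_id \"T1\";",
  "chr2\tsrc\texon\t1000\t1099\t.\t-\t.\tgene_id \"Gene2\"; transcript_id \"T2\";"
)
# quote = "" keeps the double quotes inside the attribute column intact
gtf <- read.table(text = gtf_text, sep = "\t", quote = "", comment.char = "#",
                  stringsAsFactors = FALSE)

exons <- gtf[gtf$V3 == "exon", ]                                  # exons only, not CDS
exons$GeneName <- sub('.*gene_id "([^"]+)".*', "\\1", exons$V9)   # extract gene_id
exons$Len <- exons$V5 - exons$V4 + 1                              # end - start + 1
gene_len <- aggregate(Len ~ GeneName, data = exons, FUN = sum)    # sum per gene

# Print as a tab-delimited, two-column table
write.table(gene_len, sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
```

Note that this sums every exon line, so if a gene has several transcripts with overlapping exons, the shared bases are counted more than once; merging overlapping exons per gene first gives the union length instead.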

The other input is the HTSeq-count output file, which was generated from the SAM/BAM and GTF input files.

I used this code:

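# Read both tables (file names are placeholders; use your own paths)
file_len <- read.table("gene_lengths.txt", sep = "\t", stringsAsFactors = FALSE)
file_count <- read.table("htseq_counts.txt", sep = "\t", stringsAsFactors = FALSE)
colnames(file_len) <- c("GeneName", "Len")
colnames(file_count) <- c("GeneName", "Count")
# Drop the HTSeq summary rows (__no_feature, __ambiguous, ...)
file_count <- file_count[!grepl("__", file_count$GeneName), ]
total_count <- sum(file_count$Count)
oneB <- 10^9
finallist <- merge(file_len, file_count, by = "GeneName")
finallist$RPKM <- 0
finallist[, 2:4] <- sapply(finallist[, 2:4], as.double)
finallist$RPKM <- (oneB * finallist$Count) / (total_count * finallist$Len)
# finallist <- finallist[finallist$RPKM > 1, ]
write.table(finallist, file = "rpkm.txt", sep = "\t", col.names = TRUE, row.names = FALSE)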