Convert HTSeq count table to RPKM value using GFF/GTF
9.6 years ago
Whoknows ▴ 960

Hi friends

I used HTSeq on my TopHat2 output to create count tables.

Now I would like to convert the count values to RPKM using a GTF file. Is there a script or something similar for that?

I should say that I know about the rpkm() function in edgeR, but I want to do this separately from that package.

Thanks

tophat RPKM HTSeq • 16k views
9.6 years ago

My reply in "A: Normalization Of Rna Sequencing Counts (By Ercc / Gene Length)" from a few months ago includes code that takes a GTF file and outputs the GC content and (union gene model) length of each gene in it. You can just remove the GC content part. This gives you an easy-to-use file for converting raw counts to RPKM: take the counts, divide them by 0.001 * the length in the output file (that is, the length in kb), and then divide by the total number of mapped reads in millions. It's usually a good idea to use normalized counts if you want the RPKMs to be useful (just import the values into DESeq2 and use counts(something, normalized=TRUE)).
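To make the arithmetic concrete, here is a minimal worked example in R. The numbers are made up for illustration (a gene with 500 reads, 2000 bp of exonic length, in a library of 20 million mapped reads) and do not come from any real dataset.

```r
# Illustrative values only (assumed, not from real data)
counts         <- 500     # reads assigned to the gene
gene_length_bp <- 2000    # union exon length of the gene, in bp
total_reads    <- 20e6    # total mapped reads in the sample

# Reads per kilobase of gene, per million mapped reads
rpkm <- counts / (gene_length_bp / 1000) / (total_reads / 1e6)
print(rpkm)
```

This is the same as 10^9 * counts / (total_reads * gene_length_bp), just split into the "per kilobase" and "per million" steps.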


Thank you very much, Devon. I did it, and I actually used my own script.

I summed only the exon lengths for each gene. Is that right? I mean, we should work only with exons, not CDS.


Yup, exactly.


Would you be willing to share your script? It would be super helpful. Thanks!


@pcsam.2008 Can you please share your script?


Sorry to say this, but unfortunately I cannot find it. I will post it here if I find it.


But people usually import raw counts into DESeq2, right? Can RPKMs be imported into DESeq2?


I was talking about producing RPKMs with DESeq2, not putting them into DESeq2. You're correct that giving DESeq2 RPKMs would not work well.


Hi, I am new to RNA-seq analysis. Can counts from single-end RNA-seq data be TPM normalized?

7.6 years ago
Whoknows ▴ 960

My R code for creating RPKM values from HTSeq output and a GTF file:

First, create a list of genes and their lengths from the GTF file: for each exon, compute (column 5) - (column 4) + 1, then sum these values per gene. The tab-delimited output will look like this:

Gene1   440
Gene2   1200
Gene3   569
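One possible way to build that length file in base R is sketched below. This is not the script used above, just an illustration under some assumptions: the GTF has "exon" features in column 3, a gene_id "..." entry in the attribute column, and all exons of a gene on one chromosome. The function name gene_lengths_from_gtf is made up for this example. Overlapping exons from different transcripts are merged first, so shared bases are counted once (a union gene model).

```r
# Hypothetical helper (name is an assumption): union exon length per gene, base R only.
gene_lengths_from_gtf <- function(gtf_path) {
  # quote = "" keeps the quotes inside the attribute column intact
  gtf <- read.delim(gtf_path, header = FALSE, sep = "\t", comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  exons <- gtf[gtf[[3]] == "exon", ]
  # Extract the gene_id value from the attribute column (column 9)
  exons$gene <- sub('.*gene_id "([^"]+)".*', "\\1", exons[[9]])

  # Merge overlapping or adjacent intervals, then sum their lengths
  union_length <- function(start, end) {
    ord <- order(start)
    start <- start[ord]
    end <- end[ord]
    total <- 0
    cur_start <- start[1]
    cur_end <- end[1]
    for (i in seq_along(start)[-1]) {
      if (start[i] <= cur_end + 1) {
        cur_end <- max(cur_end, end[i])      # extend the current block
      } else {
        total <- total + (cur_end - cur_start + 1)
        cur_start <- start[i]
        cur_end <- end[i]
      }
    }
    total + (cur_end - cur_start + 1)
  }

  by_gene <- split(exons, exons$gene)
  lens <- vapply(by_gene, function(e) union_length(e[[4]], e[[5]]), numeric(1))
  data.frame(GeneName = names(lens), Len = unname(lens), stringsAsFactors = FALSE)
}

# Example usage (file names are placeholders):
# lens <- gene_lengths_from_gtf("genes.gtf")
# write.table(lens, "gene_length.txt", sep = "\t", quote = FALSE,
#             col.names = FALSE, row.names = FALSE)
```

Simply summing end - start + 1 over all exon lines would count a base several times when a gene has multiple transcripts sharing exons, which is why the intervals are merged here.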
  

The other file is the HTSeq-count output, which is generated from the SAM/BAM and GTF input files.

I used this code:

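# Read gene lengths and HTSeq counts (both tab-delimited, no header)
file_len <- read.delim("gene_length.txt", header = FALSE, sep = "\t")
file_count <- read.delim("sample1.counts", header = FALSE, sep = "\t")
colnames(file_len) <- c("GeneName", "Len")
colnames(file_count) <- c("GeneName", "Count")
# Drop the HTSeq summary rows such as __no_feature and __ambiguous
file_count <- file_count[!grepl("^__", file_count$GeneName), ]
# Total reads assigned to genes, used as the library size
total_count <- sum(file_count$Count)
oneB <- 10^9
finallist <- merge(file_len, file_count, by = "GeneName")
# Convert to double so total_count * Len cannot overflow integer arithmetic
finallist[, 2:3] <- sapply(finallist[, 2:3], as.double)
# RPKM = 10^9 * count / (total assigned reads * gene length in bp)
finallist$RPKM <- (oneB * finallist$Count) / (total_count * finallist$Len)
# Optional: keep only genes with RPKM > 1
# finallist <- finallist[finallist$RPKM > 1, ]
write.table(finallist, file = "rpkm.txt", sep = "\t", col.names = TRUE, row.names = FALSE)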