Question: Htseq Count To Get Rpkm Values
8.2 years ago by
Ubana - IL USA
Nebo 80 wrote:

I want to use htseq-count to get gene counts (RPKM) to analyze DE genes in an RNA-seq experiment. I'm NOT interested in alternative splicing, only in getting RPKM values for downstream analyses with DEGseq, an R package. htseq-count requires a GFF file, but I only have my reference.fasta. Is there any way I can use the FASTA file or convert it to GFF?

htseq gene rpkm • 9.7k views
written 8.2 years ago by Nebo 80

1) Is there any gene information in the FASTA headers? (Please post an example if so.) 2) What genome are you working with?

written 8.2 years ago by Ryan Dale 4.8k

The only gene information is the gene ID. This is all I need; I'm not looking for other features such as exons. I'm working with sugarcane, so there is no reference genome. As a reference I use SAS (sugarcane assembled sequences) or the sorghum gene models.

written 8.2 years ago by Nebo 80

See my recipe here: "Deg Analysis On 2 Mirna Library". Since it sounds like you are using a transcript FASTA file, the concept is the same.

modified 8 days ago by RamRS 25k • written 8.2 years ago by Jeremy Leipzig 18k
8.1 years ago by
Urchgene 30 wrote:

I do not think it's wise to use RPKM values for DGE in the edgeR or DEGseq R packages, because they are not raw counts but have already been normalized.

If you want to do this, get this script and count the raw reads that map to features in a SAM file.
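When the reads were aligned directly to gene or transcript sequences, counting per feature can reduce to counting reads per reference name in the SAM file. A minimal sketch of that idea, assuming a plain-text SAM input; the names `count_reads_per_reference` and `min_mapq` are hypothetical, and this is not the script referred to above:

```python
from collections import Counter

def count_reads_per_reference(sam_lines, min_mapq=0):
    """Count primary aligned reads per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        if mapq < min_mapq:
            continue
        counts[rname] += 1
    return counts

def _sam(qname, flag, rname, mapq):
    # Build a toy SAM record with placeholder sequence and qualities.
    return "\t".join([qname, str(flag), rname, "1", str(mapq),
                      "4M", "*", "0", "0", "ACGT", "IIII"])

# Made-up records, for illustration only.
example_sam = [
    "@SQ\tSN:geneA\tLN:1000",
    _sam("r1", 0, "geneA", 60),
    _sam("r2", 16, "geneA", 60),
    _sam("r3", 4, "*", 0),        # unmapped, ignored
    _sam("r4", 256, "geneB", 60), # secondary alignment, ignored
    _sam("r5", 0, "geneB", 3),
]
raw_counts = count_reads_per_reference(example_sam)
```

For a real file, the same function could be fed an open file handle, for example `count_reads_per_reference(open("aligned.sam"))`, where `aligned.sam` is a placeholder name. Raising `min_mapq` is one way to drop ambiguously mapped reads, though how MAPQ is assigned depends on the aligner.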

Then choose edgeR over DEGseq, because you can normalize these raw counts to account for library size and so on.
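For reference, RPKM divides a raw count by the feature length in kilobases and by the number of mapped reads in millions, which is why it is already normalized and should not be passed to tools that expect raw counts. A minimal sketch of the arithmetic, with a hypothetical function name and made-up numbers:

```python
def rpkm(counts, lengths, total_mapped=None):
    """RPKM = count * 1e9 / (feature length in bp * total mapped reads)."""
    if total_mapped is None:
        # Assumption: use the sum of the counted reads as the library size.
        total_mapped = sum(counts.values())
    if total_mapped == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: count * 1e9 / (lengths[gene] * total_mapped)
        for gene, count in counts.items()
    }

# Toy example: geneB is three times longer and has three times the reads,
# so both genes end up with the same RPKM.
toy_counts = {"geneA": 100, "geneB": 300}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_rpkm = rpkm(toy_counts, toy_lengths)
```

Whether the library size should be the sum of counted reads or all mapped reads is a choice; the sketch takes it as a parameter for that reason.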

Good luck.

written 8.1 years ago by Urchgene 30

I could not see the option to get the counts for each feature. Can we use a GTF file to get the counts?

written 7.9 years ago by Rm 7.9k

You are supposed to also normalize the raw counts with DESeq, that is one of the steps they tell you to do in the tutorial...

written 6.9 years ago by John St. John 1.1k
8.2 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Each file type was invented to represent a certain type of information. The FASTA format was meant to store sequences; the GFF format was meant to represent genomic features (intervals). In general there is no way to directly convert between the two.

As daler points out above, if your FASTA file happens to store each gene separately and also lists extra information about the coordinates, then we could give you a parser that generates a GFF from it (post the header). Another option: if you know the gene sequences, you could align them to the genome and thus create your own annotations.
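If the FASTA file holds one gene or transcript per record and the reads are aligned to those records themselves, one possible workaround is an annotation in which every record is a single feature spanning its full length. A minimal sketch under that assumption; the function name, the `source` label, and the choice of "+" for the strand are all hypothetical, and the output uses GTF-style `gene_id` attributes with feature type `exon`, which are htseq-count's default `-i` and `-t` values:

```python
def fasta_to_gtf_lines(fasta_lines, source="fasta2gtf", feature="exon"):
    """Emit one GTF-style line per FASTA record, covering 1..length."""
    records = []
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            header = line[1:].split()
            # Use the first whitespace-delimited token as the ID,
            # which is how aligners usually name the reference.
            name = header[0] if header else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return [
        "\t".join([name, source, feature, "1", str(length),
                   ".", "+", ".", 'gene_id "%s";' % name])
        for name, length in records
        if name and length > 0
    ]

# Toy input, for illustration only.
toy_fasta = [">geneA some description", "ACGT", "AC", ">geneB", "GGG"]
gtf_lines = fasta_to_gtf_lines(toy_fasta)
```

Because the strand here is a guess, it would probably be safest to run the counting step in an unstranded mode unless the sequences are known to be in sense orientation.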

written 8.2 years ago by Istvan Albert ♦♦ 81k

I do know the gene sequences and I've already aligned with novoalign. What I want to do now is get the expression values in RPKM. I can use the uniquely mapped genes as input in DEGseq, but I'd rather use RPKM. Should I use Cufflinks instead of htseq-count to get the RPKM values, so that there is no need for a GFF file?

written 8.2 years ago by Nebo 80