Question: Length of transcripts of mm9
0
gravatar for qwzhang0601
3.4 years ago by
qwzhang060160
United States
qwzhang060160 wrote:

I have got raw count of mouse RNA reads. In order to calculate the FPKM, I obtained the length of transcripts (CDS + UTR) of the genes through BioMart. And then for each gene I selected the longest one. However, later I found the alignment was mapped to an old version of genome (mm9). But the information I got from BioMart is on a new version of genome (i.e., mm10). And I found from the the web site of Biomart, I can only achieve CDS length (but not transcript length) for a old ensemble corresponding to mm9. So I want to know (1) is there a relatively easy way to get the length of transcripts for mm9, given the gene ensemble ID?
(2) If there is no easy ways, is it good enough to use the transcript length from mm10 to calculated the FPKM? I think the gene length from different version of genome should be the same or quite similar, right?

Thanks

rna-seq fpkm biomart • 2.5k views
ADD COMMENTlink modified 3.4 years ago by EagleEye6.5k • written 3.4 years ago by qwzhang060160

You can find mm9 genome in Ensembl release 67 here. BioMart from there should get you the information you need.

Was your data mapped to mm9 genome or transcriptome. Either would have different lengths and can't be used interchangeably.

ADD REPLYlink written 3.4 years ago by genomax74k

Thank you. Yes, I can find the Ensemble release 67, but for this version of genome I can only get the Length of CDS not the transcripts.

ADD REPLYlink written 3.4 years ago by qwzhang060160

Are you sure about that? I am seeing the usual options under "Attributes"/"Sequence".

ADD REPLYlink written 3.4 years ago by genomax74k

Do you mean "Attributes"->"sequence"-> "cDNA sequences". And then calculate the length of the transcripts? I think it works. In the latest version of genome, I can directly get such information by choosing "Attributes"-> "Features"->"Transcript length (including UTRs and CDS)". But Thanks for your suggestions.

ADD REPLYlink written 3.4 years ago by qwzhang060160

the raw count of reads were obtained by HTSeq. I think it must use the transcript annotation for the alignment. Thanks

ADD REPLYlink written 3.4 years ago by qwzhang060160

Not many use FPKM any more. Are you looking to do DE analysis? If so you can use the raw counts you have with a package like DESeq2.

ADD REPLYlink written 3.4 years ago by genomax74k

Yes. I will use DESeq2 to do find DE genes. However, I want to generate a heat map showing expression levels of genes under different conditions. I think the FPKM show relative expression level of genes. Besides, before I do the DE analysis I want to filter out unexpressed genes (i.e., FPKM<1) under both conditions. Thanks for your help.

ADD REPLYlink written 3.4 years ago by qwzhang060160

Please do not send identical questions to BioStars and Ensembl helpdesk (twice).

ADD REPLYlink written 3.4 years ago by Emily_Ensembl19k

I see. Sorry for that!

ADD REPLYlink written 3.4 years ago by qwzhang060160
0
gravatar for Ar
3.4 years ago by
Ar910
United States
Ar910 wrote:

You may use this code; however you need GTF mm10 as an input file.

## Transcript Length
library(GenomicRanges)
library(rtracklayer)

## reading your mm10 gtf file and read only exons
GTFfile <- "mm10,gtf" ## your mm10 gtf file
GTF <- import.gff(GTFfile, format="gtf", genome="mm10", asRangedData=F, feature.type="exon") 
grl <- reduce(split(GTF, elementMetadata(GTF)$gene_id))
reducedGTF <- unlist(grl, use.names=T)
elementMetadata(reducedGTF)$gene_id <- rep(names(grl), elementLengths(grl))
elementMetadata(reducedGTF)$widths <- width(reducedGTF)

## calculation of length
calc_length <- function(x) {
  sum(elementMetadata(x)$widths)
}
output <- t(sapply(split(reducedGTF, elementMetadata(reducedGTF)$gene_id), calc_length))
output <- data.frame(t(output))
output$id = rownames(output)
output$id = gsub(pattern = "(.*)[.](.*)", replacement = "\\1", x = output$id)
output = output[,c(2,1)]
colnames(output)[2] = "Transcript.Length"
head(output)
ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Ar910

Thank you. But would you please briefly explain the code? Does this code return length of each transcript of a gene? Or it returns a length for each gene (the longest or the union of transcripts)?

Thanks

ADD REPLYlink written 3.4 years ago by qwzhang060160

This will give you the length of the transcript using GTF/GFF file. Here is what the code does:

  1. import.gff reads the file using the exon regions and creates IRange object
  2. reduce is used for combining all the exons based on a particular gene id
  3. calculate the length of the entire exon region using the width that is obtained using reduce function

Try running the code and check the results and you will see what I am doing.

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by Ar910

I downloaded the mmg.gtf from "ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/". But when I run the following code, I got an error. And there are warnings? Do you have any suggestions? Thanks.

GTFfile <- "mm9.gtf" 

GTF <- import.gff(GTFfile, format="gtf", genome="mm9", asRangedData=F, feature.type="exon")

Error in validObject(.Object) : 
  invalid class “GRanges” object: 'seqnames' contains missing values
In addition: There were 16 warnings (use warnings() to see them)
> warnings() 
Warning messages:
1: 'BiocGenerics:::updateS4' is deprecated.
Use 'replaceSlots' instead.
See help("Deprecated")
2: 'BiocGenerics:::updateS4' is deprecated.
3: ......
ADD REPLYlink modified 3.4 years ago by genomax74k • written 3.4 years ago by qwzhang060160
0
gravatar for EagleEye
3.4 years ago by
EagleEye6.5k
Sweden
EagleEye6.5k wrote:

This might be helpful

A: Converting gtf format to bed format (gene & transcript level)

A: How to normalise read count per gene

https://github.com/santhilalsubhash/rpkm_rnaseq_count

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by EagleEye6.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1236 users visited in the last hour