Question: what should be the expected result of cuffmerge?
0
gravatar for lin.pei26
3.5 years ago by
lin.pei2660
China
lin.pei2660 wrote:

Hi:

I am working on Drosophila RNA-seq data as the Nat Protocal paper[1]

after running the program Cufflinks, I got expression value (FPKM) which were associated with IDs like "CUFF.2"

then I run the program cuffmerge "to create a single merged transcriptome annotation"[1]

but what I got is a file "merged.gtf" with lines like

"2L    Cufflinks    exon    74903    75018    .    +    .    gene_id "XLOC_000003"; transcript_id "TCONS_00000004"; exon_number "3"; gene_name "galectin"; oId "CUFF.2.2"; nearest_ref "FBtr0078101"; class_code "j"; tss_id "TSS3";

this file "merged.gtf" did not provide FPKM values at all.

Have I got the correct/expected result ?

Should Cuffmerge returns expression values associated with each gene (such as a matrix with gene symbol as row names and FPKMs in each row)

 

Thanks in advance!

Best,

 

 

[1] Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. ; 7(3): 562–578. doi:10.1038/nprot.2012.016

cuffmerge • 3.8k views
ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by lin.pei2660

Thank you Devon!

Could you please also tell me how to generate a matrix with rows for gene expression and cols for samples which using gene symbol (or other identifiers) as row names ?

That is, how to transform "CUFF.2.2" to a symbol making biological sense.

Should I write scripts for this purpose ?

 

Many Thanks!

ADD REPLYlink written 3.5 years ago by lin.pei2660

cuffmerge (and cufflinks, for that matter) can be given a GTF or GFF annotation file that will contain things like gene names. You can get that from where ever you downloaded the reference fasta file. Regarding a matrix of expression values, I don't use cufflinks/cuffdiff enough to know of a simple method to do that, though I presume that cuffdiff would output such a file.

ADD REPLYlink written 3.5 years ago by Devon Ryan82k

thank you Devon!

I have download all annotation file from http://ccb.jhu.edu/software/tophat/igenomes.shtml

and used these two file for Tophat:

1) Drosophila_melanogaster/Ensembl/BDGP5.25/Annotation/Archives/archive-2014-05-23-16-02-55/Genes/genes.gtf

2) Drosophila_melanogaster/Ensembl/BDGP5.25/Sequence/BowtieIndex/genome

the cuffmerge output file merged_asm/merged.gtf contains 3900 CUFF IDs while the cufflinks output transcripts.gtf contains > 10000 CUFF IDs.

Do you think there is something inconsistent, given that many cufflinks identified CUFFs were missed by cuffmerge?

 

 

ADD REPLYlink written 3.5 years ago by lin.pei2660

Usually the files from igenomes work well together. Perhaps there's a problem in this case. Try to run cuffmerge again, ensuring that you specify the genes.gtf file. You might also look into the genes.gtf file to ensure that it contains gene names and IDs.

ADD REPLYlink written 3.5 years ago by Devon Ryan82k

Thank you!

below are the first two lines of genes.gtf file:

2L    protein_coding    exon    7529    8116    .    +    .    exon_number "1"; gene_id "FBgn0031208"; gene_name "CG11023"; p_id "P9062"; transcript_id "FBtr0300689"; transcript_name "CG11023-RB"; tss_id "TSS8382";

2L    protein_coding    exon    7529    8116    .    +    .    exon_number "1"; gene_id "FBgn0031208"; gene_name "CG11023"; p_id "P8862"; transcript_id "FBtr0300690"; transcript_name "CG11023-RC"; tss_id "TSS8382";

It seemed OK. But I found that under the dir BowtieIndex, files actually do not provide the gene symbol and transcripts ID. Could this be the reason for showing CUFF.2.2 in transcripts.gtf files ?

 

 

ADD REPLYlink written 3.5 years ago by lin.pei2660

I assume that the files in BowtieIndex are just fasta files, so they wouldn't contain those names. That's expected. Try rerunning cuffmerge with the gtf file and see if that fixes things. If not then I don't know what the problem is.

ADD REPLYlink written 3.5 years ago by Devon Ryan82k

Thanks a lot!

when running cuffmerge I got such warnings:

Warning: couldn't find fasta record for '2LHet'!
Warning: couldn't find fasta record for '2RHet'!
Warning: couldn't find fasta record for '3LHet'!
Warning: couldn't find fasta record for '3RHet'!
Warning: couldn't find fasta record for 'U'!
Warning: couldn't find fasta record for 'XHet'!
Warning: couldn't find fasta record for 'YHet'!
Warning: couldn't find fasta record for 'dmel_mitochondrion_genome'!

 

Did you see thing like these ?

Thanks!

 

 

 

ADD REPLYlink written 3.5 years ago by lin.pei2660

That suggests a mismatch between the files. I don't use drosophila for anything so I'm never used the files you're using.

ADD REPLYlink written 3.5 years ago by Devon Ryan82k
2
gravatar for Devon Ryan
3.5 years ago by
Devon Ryan82k
Freiburg, Germany
Devon Ryan82k wrote:

The purpose of cuffmerge isn't to produce FPKMs. What cufflinks does is (1) look for annotated and unannotated transcripts in a sample and (2) assign an FPKM value to them. The issue is, that different samples will typically result in somewhat different annotations, so the FPKM values aren't comparable (well, they wouldn't be anyway, but that's another issue). Consequently, what cuffmerge does is merge the annotations from all of the samples into a single master annotation that can then be used to estimate FPKM values for each of the samples.

I should also note that FPKM values can't reliably be estimated one sample at a time. The reason is that with a single sample, you have no way to control for library differences other than simply dividing by the library size, which is known to be quite unreliable. Consequently, cuffdiff (or cuffnorm in the updated pipeline, I think) has to re-estimate everything with the merged annotation and then perform normalization of the results before it can produce FPKM values.

BTW, unless you actually need to look for novel isoforms/genes, you're best off just not using cufflinks/cuffdiff. Those programs are very slow and not particularly flexible.

ADD COMMENTlink written 3.5 years ago by Devon Ryan82k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 697 users visited in the last hour