Question: After run Cufflinks - cuffdiff there is no information about isoforms, TSS, CDS...
0
gravatar for Paul
5.0 years ago by
Paul1.3k
European Union
Paul1.3k wrote:

Dear all,

I am working on my RNA-seq data and when I following procotcol "Differential gene and transcription expression analysis of RNA-seq experiment with TopHat nad Cufflinks" I have problem after cuffdiff my samples:

When I wanna check information about cuffdiff in R - CummeRBund - there are zero values :

> cuff
CuffSet instance with:
     12 samples
     25834 genes
     0 isoforms
     0 TSS
     0 CDS
     0 promoters
     0 splicing
     0 relCDS

Why do I have zeros? I have trouble with plotting some graphs also. 

 

Here is my syntax:

./cuffdiff -o all_samples_diff_prochazka -L 159,160,165,185,207,35,65,75,91,93,155,156 -b /home/iab/bundle/2.8/hg19/ucsc.hg19.fasta -u all_samples_merged_prochazka/merged.gtf -p 8 --library-type fr-unstranded -c 5 /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/159_tpout_hg19/159_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/160_tpout_hg19/160_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/165_tpout_hg19/165_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/185_tpout_hg19/185_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/207_tpout_hg19/207_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/35_tpout_hg19/35_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/65_tpout_hg19/65_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/75_tpout_hg19/75_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/91_tpout_hg19/91_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/93_tpout_hg19/93_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/155_tpout_hg19/155_accepted_hits.bam /home/iab/rna_analysis/tophat-2.0.11.Linux_x86_64/156_S7_tpout/accepted_hits.156_S7.bam

Thank you for any correction or ideas.

Paul.

rna-seq cufflinks cummerbund • 4.4k views
ADD COMMENTlink modified 4.9 years ago • written 5.0 years ago by Paul1.3k
2
gravatar for Paul
4.9 years ago by
Paul1.3k
European Union
Paul1.3k wrote:

Hi all,

finally the solution was easy - thanks to Loyal A. Goff, who helped me with probably stupid issue.

The issue here is that your sample names are integers and R will not allow integers to be used as column names in a data.frame.  The easiest solution is to re-run your cuffdiff analysis using labels for your samples that do not start with a numeric value (0-9).


So whne you naming your samples - do not use just numbers - because some output from cummeRbund will not works!

 

Paul.

ADD COMMENTlink written 4.9 years ago by Paul1.3k
2
gravatar for andrew.j.skelton73
5.0 years ago by
London
andrew.j.skelton735.6k wrote:

It's all good, it's a tough learning curve! So first and foremost it sounds as if it's your annotation that's giving you the trouble, so I'd download the igenomes Ensembl archive (linked previously), and take out the fasta file for the human genome, the bowtie2 index and the gene track (GTF file). That's all you'll need for this. 

Make sure you've got the latest binaries installed for the tuxedo package (I don't think you need to run tophat2 anymore, just tophat).

Cufflinks - if you're looking for novel isoforms and transcripts then you need --GTF-guide on, -G just maps your transcripts to the reference. 

Cuffmerge - The issue is probably because you've put in assembles.txt twice. That text file is the last parameter of the file and the program may get confused by any parameters afterwards. So take out the first assemblies.txt and that should work. 

ADD COMMENTlink written 5.0 years ago by andrew.j.skelton735.6k

Thank you so much Andrew for your response and help!! :-) I am very appreciate it! So I will try it again with the iGenome sources!

Paul. 

ADD REPLYlink written 5.0 years ago by Paul1.3k
1
gravatar for Martombo
5.0 years ago by
Martombo2.4k
Seville, ES
Martombo2.4k wrote:

from cufflinks manual:

Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use. These attributes are: tss_id, p_id. 

Note: If an arbitrary GTF/GFF3 file is used as input (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this: 

cuffcompare -s /path/to/genome_seqs.fa -CG -r annotation.gtf annotation.gtf

 

so make sure that your GTF file has got the attributes fields you need. if it doesn't, download the one suggested by andrew.j.skelton73 or produce your own.

ADD COMMENTlink written 5.0 years ago by Martombo2.4k
0
gravatar for andrew.j.skelton73
5.0 years ago by
London
andrew.j.skelton735.6k wrote:

Did you use the same reference genome throughout? Which one did you use? What species is this for?

ADD COMMENTlink written 5.0 years ago by andrew.j.skelton735.6k

Hi Andrew,

yeah I did use the same reference genome Hg19 from ucsc. I am also use merged all gtf reference files (you can see on my syntax above). I want compare expression level between two samples (one is healthy and one has cancer). Maybe found some novel isoforms and splicing and gene fusion. I am focusing on MLH1 gene (which is on 3 chromosome).

Paul. 

ADD REPLYlink written 5.0 years ago by Paul1.3k

Paul, If I were you, I'd run it against the Ensembl reference genome, especially for human, I've found that it's got much better annotation. You can also download the igenomes pack which contains everything you need for running the tuxedo pipeline. https://support.illumina.com/sequencing/sequencing_software/igenome.ilmn

Especially if you're looking for novel transcripts I'd use Ensembl. Feel free to put your tophat / cufflinks commands up for us to take a look at. How many replicates do you have?

 

ADD REPLYlink written 5.0 years ago by andrew.j.skelton735.6k

Thank you Andrew for sharing your experience. Actually I am pretty newbie in RNA-seq analysis. I was also try use reference GRCh37, because they have better naming of gene (ucsc refseq is just XLOC_number for gene naming and I am prefer gene name - MLH1). But after run TopHat and Cufflinks I had issue with cuffmerge - here is my error log:

[Mon Apr 28 13:55:49 2014] Beginning transcriptome assembly merge
-------------------------------------------

[Mon Apr 28 13:55:49 2014] Preparing output location merged_GRCh37_155_156/
[Mon Apr 28 13:56:04 2014] Converting GTF files to SAM
[13:56:04] Loading reference annotation.
Error: duplicate GFF ID 'ENST00000361547' encountered!
    [FAILED]
Error: could not execute gtf_to_sam

This I was not able to find solution for this bug :-( So I was continue with hg19. So probably I should download from iGenomes reference gene and gtf file and make all pipeline again.. What do you mean replicates? And where can i find it?

HERE IS MY PIPELINE FOR ucsc.hg19 reference genome and ucsc refgene gtf file:

./tophat2 -o tp_out --library-type fr-unstranded  -r 260 --fusion-search -G /home/refgene.gtf -p8 --b2-very-sensitive /home/ucsc.hg19.fasta  /home/R1_001.fastq /home/R2_001.fastq --rg-id Sample --rg-sample sample --rg-library rna-seq --rg-platform Illumina

./cufflinks -p 8 --library-type fr-unstranded -o cufflinks_out_gtf -G /home/refgene.gtf /home/tophat-/accepted_hits.bam

This I repeat for all samples.

./cuffmerge -o all_samples_merged -s /home/ucsc.hg19.fasta assemblies.txt -g /home/refgene.gtf -p 8 assemblies.txt

And finally cuffdiff.

./cuffdiff -o all_samples_diff -L 1,2,3,4,5,6,7,8,9,10 -b /home/ucsc.hg19.fasta -u all_samples_merged/merged.gtf -p 8 --library-type fr-unstranded -c 5 PATH/TO/ALL/MY/BAMS

And finally I run R - CummeRbund and have issue explained above . Sorry maybe for stupid questions.

Paul.

ADD REPLYlink written 5.0 years ago by Paul1.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1381 users visited in the last hour