Question: No differential gene expression after the Tuxedo protocol
20 months ago by vinayjrao (110), JNCASR, India
vinayjrao wrote:

Hi, I'm analyzing some RNA-Seq data using the old Tuxedo protocol (TopHat, Cufflinks, Cuffmerge and Cuffdiff). In the Cufflinks output (transcripts.gtf), every gene has an expression value (FPKM > 0), but after Cuffdiff this is no longer the case: many genes have an FPKM of zero, and as a result I get no differential expression. I have 4 different samples, so I expect to see at least some differential expression.
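To see why nothing is called, it can help to tally the `status` and `significant` columns of Cuffdiff's `gene_exp.diff`: genes whose status is not `OK` (for example `NOTEST` or `LOWDATA`) were not successfully tested, so they can never be called significant. The Python sketch below assumes the standard tab-separated `gene_exp.diff` header row; the function name is illustrative and not part of the Cufflinks suite.

```python
import csv
import sys
from collections import Counter


def summarize_gene_exp_diff(lines):
    """Tally Cuffdiff per-gene test status and significance calls.

    `lines` is any iterable of text lines from a gene_exp.diff file
    (tab-separated, header row first), e.g. an open file handle.
    """
    status_counts = Counter()
    n_significant = 0
    for row in csv.DictReader(lines, delimiter="\t"):
        # status is OK, NOTEST, LOWDATA, HIDATA or FAIL; only OK rows were tested successfully
        status_counts[row["status"]] += 1
        if row["significant"] == "yes":
            n_significant += 1
    return status_counts, n_significant


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_diff.py cuffdiff_out/gene_exp.diff
    with open(sys.argv[1], newline="") as handle:
        counts, n_sig = summarize_gene_exp_diff(handle)
    for status, n in counts.most_common():
        print(f"{status}\t{n}")
    print(f"significant=yes\t{n_sig}")
```

If most genes come back as `NOTEST` or `LOWDATA`, the problem is upstream of the statistics (depth, replicates, or quantification) rather than a lack of real differences.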

Thanks

Tags: rna-seq, cuffdiff • 951 views
modified 20 months ago by Buffo (1.5k) • written 20 months ago by vinayjrao (110)

Do you really want to use the tuxedo pipeline? It hasn't been anywhere near best practice for a number of years. What species is this?

written 20 months ago by Devon Ryan (89k)

I agree with @Devon. However, your samples may also have another issue. Can you check the following?

  1. Total number of reads. For an organism with a genome the size of human's, you will need at least 15-20 million reads per replicate.
  2. Number of biological replicates. You should have at least 3 biological replicates per condition if you are using Cuffdiff with default settings.
  3. Reference genome. Make sure you are aligning the reads to the correct organism.
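For the first check, one quick way to get the read count is to count FASTQ records. A minimal Python sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); for paired-end data, counting one mate file gives the number of fragments. The function names and the `MIN_READS` threshold (taken from the 15-20 million suggestion in point 1) are illustrative.

```python
import gzip
import sys

MIN_READS = 15_000_000  # lower end of the 15-20M per replicate suggested for human


def count_fastq_records(lines):
    """Count records in an iterable of FASTQ lines (4 lines per record)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; truncated FASTQ?")
    return n_lines // 4


def count_fastq_file(path):
    """Open a plain or gzipped FASTQ file and count its reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_reads.py sample1_R1.fastq.gz sample2_R1.fastq.gz ...
    for path in sys.argv[1:]:
        n = count_fastq_file(path)
        flag = "" if n >= MIN_READS else "  <-- below suggested depth"
        print(f"{path}\t{n}{flag}")
```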

Best,

written 20 months ago by Satyajeet Khare (1.3k)

I'm currently trying the HISAT2 protocol, but I haven't finished it yet, so I couldn't use it here. Regarding your questions: each sample has more than 20M reads, I have 2 biological replicates, and the reference genome is hg38, downloaded from iGenomes.

written 20 months ago by vinayjrao (110)

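Okay. What is the alignment percentage? Did you try loading the BAM files into a genome browser? How do the reads look?

It's not unusual for FPKM values in the Cufflinks output file to differ from those in the Cuffdiff output, since Cuffdiff re-estimates expression against the merged annotation and normalizes library sizes across all samples.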
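To illustrate why the two FPKM values can differ: FPKM scales a fragment count by transcript length and by a library-size normalizer, so the same count gives a different FPKM when the normalizer changes, and a transcript that receives no fragments in a sample gets exactly 0. A simplified Python sketch of FPKM = fragments × 10^9 / (length in bp × total mapped fragments); it omits the effective-length and multi-read corrections Cufflinks offers and is not Cuffdiff's actual estimation procedure.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Simplified: uses the raw transcript length instead of an effective
    length and ignores multi-mapping reads.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Same 200 fragments on a 2 kb transcript, two different library-size normalizers
print(fpkm(200, 2000, 20_000_000))  # 5.0
print(fpkm(200, 2000, 25_000_000))  # 4.0
# No fragments assigned to a transcript in a sample gives an FPKM of exactly 0
print(fpkm(0, 2000, 20_000_000))    # 0.0
```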

written 20 months ago by Satyajeet Khare (1.3k)

The alignment percentage is >90% in each case. However, I haven't tried loading the BAM files into a browser yet; I will try that now with IGV.

Thanks.

written 20 months ago by vinayjrao (110)

More up-to-date differential expression pipelines exist (DESeq2, limma-voom, kallisto + sleuth, etc.).

When you say you have 4 different samples, do you mean 4 different conditions you want to compare, with one sample in each, or 4 samples per condition?

written 20 months ago by VHahaut (1.1k)

Dear Radek, I have 2 cell lines and 2 animal models, both with a cancer and a normal data set. I want the cancer-vs-normal results for the cell lines and for the animal models separately; I will not be considering the other comparisons.

I'm not using the DESeq2 workflow because, for some reason, bedtools coverage keeps reporting 0 mapped reads. I got a lot of suggestions on how to fix it, but unfortunately nothing worked. It would be very helpful if you could share a complete pipeline with all the scripts.

Thanks.

written 20 months ago by vinayjrao (110)

If you can see the reads mapping (with IGV, for example) but bedtools is not working, you could use featureCounts.
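A frequent reason an overlap-based counter such as bedtools coverage reports 0 reads while the alignments look fine in IGV is that the BAM and the annotation name chromosomes differently (for example `chr1` versus `1`); featureCounts would then also assign nothing. A minimal Python check, assuming you saved the BAM header with `samtools view -H sample.bam > header.sam` and have the GTF or BED file used for counting; the script and function names are hypothetical.

```python
import sys


def bam_contigs(header_lines):
    """Reference sequence names (SN: tags) from SAM/BAM header @SQ lines."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def annotation_contigs(lines):
    """Sequence names from column 1 of a GTF/GFF/BED file."""
    names = set()
    for line in lines:
        # Skip blank lines, comments and UCSC track/browser lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def report_overlap(bam_names, ann_names):
    """Print and return the chromosome names shared by BAM and annotation."""
    shared = bam_names & ann_names
    print(f"BAM: {len(bam_names)}, annotation: {len(ann_names)}, shared: {len(shared)}")
    if not shared:
        print("No shared names, e.g.", sorted(bam_names)[:3], "vs", sorted(ann_names)[:3])
    return shared


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python check_contigs.py header.sam annotation.gtf
    with open(sys.argv[1]) as hdr, open(sys.argv[2]) as ann:
        report_overlap(bam_contigs(hdr), annotation_contigs(ann))
```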

If I had to use DESeq2 from scratch, I would start here: Bioconductor: Differential expression. It also includes a guide to creating your count matrix.
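featureCounts can take all BAM files in one run and writes one count column per BAM after six annotation columns (Geneid, Chr, Start, End, Strand, Length), so turning its output into a count matrix mostly means dropping those columns. A minimal Python sketch, assuming the default featureCounts output format; the function names are illustrative, and the DESeq2 analysis itself still happens in R as in the Bioconductor workflow.

```python
import csv
import os


def read_featurecounts(lines):
    """Parse featureCounts output into (sample_names, {gene_id: [counts]})."""
    # Skip the '# Program:featureCounts ...' comment line
    rows = csv.reader((ln for ln in lines if not ln.startswith("#")), delimiter="\t")
    header = next(rows)
    # Columns 1-6 are Geneid, Chr, Start, End, Strand, Length; the rest are one per BAM
    samples = [os.path.basename(name) for name in header[6:]]
    counts = {row[0]: [int(x) for x in row[6:]] for row in rows if row}
    return samples, counts


def write_count_matrix(samples, counts, handle):
    """Write a gene x sample count matrix as TSV, gene IDs in the first column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *samples])
    for gene_id, values in counts.items():
        writer.writerow([gene_id, *values])
```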

written 20 months ago by VHahaut (1.1k)

Maybe you want to read this: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. TopHat is obsolete software, and I think kallisto is only useful for very well-curated genomes (and annotations).

modified 20 months ago • written 20 months ago by Buffo (1.5k)