Question: RNA Seq analysis of rice samples
gravatar for alim.hcu
11 months ago by
alim.hcu10 wrote:

Dear I am analysing a RNA sequencing data of rice samples. Sequenced were mapped using tophat and most of sample had more than 90% mapping rate (although there were 50% multimapped reads). When I used cufflink for quantification of genes, i found more than 90% genes showed fpkm value less than zero. So i am getting very less number of differentially expressed genes. What may be the reason behind this?

sequencing rna-seq next-gen • 392 views
ADD COMMENTlink modified 11 months ago • written 11 months ago by alim.hcu10

Are you sure its fpkm < 0? Could you put a screenshot of the cufflinks file?

ADD REPLYlink written 11 months ago by c.chakraborty160 was typo error. .less than one..

ADD REPLYlink written 11 months ago by alim.hcu10

What are you samples specifically? If you're looking at a single tissue type, or a single cell type then you'll have a majority of genes turned off. Genes necessary in flowering won't be expressed if this is a leaf study, and most photosynthetic genes should be turned off in the roots. Intuitively I wouldn't be surprised if 50-70% of genes were turned off depending on the study design.

I'd also wonder about more specifics of your reads/libraries. What is your read depth? Can you generate a histogram of FPKM values? Better yet, if you generate a density plot does the distribution look bimodal? You can graph this in R with: plot(density(log10(fpkmValues))) Getting a better feel for the data will help you understand if this result is an error or if it actually makes biological sense.

ADD REPLYlink written 11 months ago by shawn.w.foley1.2k

RNA was isolated from 10 day old seedling plant. We have average 50 million reads per sample.

ADD REPLYlink written 11 months ago by alim.hcu10

Please show all commands that you have used. As a community full of volunteers, we do not have much time to be constantly asking new questions to extract information from you. You need to provide all information in your opening question.

ADD REPLYlink modified 11 months ago • written 11 months ago by Kevin Blighe56k

Have you run a PCA analysis with your data from the samples? It can tell you if the control and the treated samples are distinct enough for further differential gene expression analysis. Alternatively, you can try running STAR or hisat2 alignment followed by DESEq2 analysis. Cufflinks normalizes the reads coming from the transcripts wrt the length, therefore long transcripts with not so many reads are often represented with lower fpkm, and therefore you might lose the information about how much of the actual number of reads are coming from the transcripts. DESeq2 works on actual number of reads without the normalization to length, hence you lose less information.

ADD REPLYlink written 11 months ago by c.chakraborty160

Thanks for your kind reply....

As you suggested that i should proceed with HIsat2. Here i got following output from hisat2. I think the problem is with multimapped reads. Due to 49% multimapped reads, only 35% reads aligned uniquely, so less number of reads for quantification (as htseq-count quantify only uniquely mapped reads). Hence genes showing less fpkm value.

hisat2 -q -p 10 -x Oryza_sativa -1 /home/jkt/Software/TrimGalore-0.5.0/S1_JD_R1_val_1.fq.gz -2 /home/jkt/Software/TrimGalore-0.5.0/S1_JD_R2_val_2.fq.gz -S JD1_Rice.sam

36772695 reads; of these: 36772695 (100.00%) were paired; of these: 5696023 (15.49%) aligned concordantly 0 times 12980109 (35.30%) aligned concordantly exactly 1 time 18096563 (49.21%) aligned concordantly >1 times

5696023 pairs aligned concordantly 0 times; of these:
  1400885 (24.59%) aligned discordantly 1 time
4295138 pairs aligned 0 times concordantly or discordantly; of these:

  8590276 mates make up the pairs; of these:
    3388810 (39.45%) aligned 0 times
    1381239 (16.08%) aligned exactly 1 time
    3820227 (44.47%) aligned >1 times

95.39% overall alignment rate

ADD REPLYlink written 11 months ago by alim.hcu10

Cool - that seems like the problem (multi-mapped reads), as you imply.

ADD REPLYlink written 11 months ago by Kevin Blighe56k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2062 users visited in the last hour