Question: How can I merge RNA-Seq biological replicates?
5.9 years ago by
hana180 wrote:


I have 6 RNA-Seq samples (biological replicates). I would like to choose an FPKM cut-off to identify highly and lowly expressed genes. How can I merge all the biological replicates and then find the best FPKM cut-off threshold?

Can I use cuffmerge and then cuffdiff?

Thank you.

rna-seq • 5.8k views
modified 5.9 years ago by mikhail.shugay3.4k • written 5.9 years ago by hana180

My samples are all from the same condition.

written 5.9 years ago by hana180
5.9 years ago by
Czech Republic, Brno, CEITEC
mikhail.shugay3.4k wrote:

I would recommend using Cuffnorm to merge your samples (it also runs on BAM files). You can then plot histograms of transcript detection (number of samples in which a transcript is detected at a given threshold, divided by 6) for a set of specified FPKM thresholds. After that, you can define the low- and high-expression thresholds, for example as the threshold above (or below) which 95% of genes are expressed in 5-6 (or 0-1) samples.

I don't think it is a good idea to merge all the .bam files first and then perform the analysis, as you'll lose the information on expression variability between your biological samples. The rationale for using Cuffnorm is that it accounts for effects such as differences in sequencing depth between samples.
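The detection histogram described above could be sketched roughly as follows. This is only an illustration under assumptions: it treats the Cuffnorm genes.fpkm_table as a tab-separated file with a gene ID in the first column and one FPKM column per sample, it counts a gene as "detected" when FPKM >= threshold, and the toy table and threshold values are made up.

```python
import csv
import io
from collections import Counter


def detection_histogram(fpkm_table, thresholds):
    """Return ({threshold: Counter(k -> number of genes detected in k samples)}, n_samples)."""
    reader = csv.reader(fpkm_table, delimiter="\t")
    header = next(reader)          # assumed layout: gene id first, then one column per sample
    n_samples = len(header) - 1
    hist = {t: Counter() for t in thresholds}
    for row in reader:
        if len(row) != n_samples + 1:
            continue               # skip blank or malformed lines
        values = [float(v) for v in row[1:]]
        for t in thresholds:
            # "detected" is assumed to mean FPKM >= threshold
            hist[t][sum(v >= t for v in values)] += 1
    return hist, n_samples


# Toy table standing in for genes.fpkm_table (values invented for illustration).
toy = io.StringIO(
    "tracking_id\ts1\ts2\ts3\ts4\ts5\ts6\n"
    "geneA\t50\t48\t61\t55\t47\t52\n"
    "geneB\t0.2\t0\t0.1\t0\t0.3\t0\n"
    "geneC\t3\t0.5\t4\t0.8\t2\t0.9\n"
)
hist, n = detection_histogram(toy, thresholds=[1, 10])
for t, counts in hist.items():
    consistent_high = counts[n] + counts[n - 1]   # genes detected in 5-6 of 6 samples
    consistent_low = counts[0] + counts[1]        # genes detected in 0-1 of 6 samples
    print(t, dict(counts), consistent_high, consistent_low)
```

With a real table you would pass an open file handle instead of the toy string and scan a wider range of thresholds, looking for values at which most genes fall into the 5-6 or 0-1 bins.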

written 5.9 years ago by mikhail.shugay3.4k

I second the concern about merging biological replicates. The purpose of biological replicates is to measure the variability between your samples. You need to at least show that the variability between replicates is reasonably low before merging them. One way to do this is to calculate Pearson or Spearman correlation coefficients between replicates.
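As a rough sketch of that check, the snippet below computes Pearson and Spearman correlations between two replicates using only the Python standard library. The FPKM values are invented, and the log2(FPKM + 1) transform before Pearson is a common convention assumed here, not something prescribed in this thread.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented per-gene FPKM values for two replicates (same gene order in both).
rep1 = [50.0, 0.2, 3.0, 12.0, 0.0]
rep2 = [48.0, 0.0, 0.5, 15.0, 0.1]
log_rep1 = [math.log2(v + 1) for v in rep1]
log_rep2 = [math.log2(v + 1) for v in rep2]
print("Pearson (log2 FPKM+1):", round(pearson(log_rep1, log_rep2), 3))
print("Spearman:", round(spearman(rep1, rep2), 3))
```

In practice you would compute this for every pair of the six replicates and look for any replicate that correlates noticeably worse than the others.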

written 5.9 years ago by Jason900

Thank you for your comment. I have already run cuffnorm. Could you please tell me which file I should use to make the graph, the genes.fpkm_table file or the genes.count_table file, and how I can choose the FPKM thresholds based on this file?

I am very new to RNA-seq analysis and would very much appreciate any suggestions.

Thank you.

written 5.9 years ago by hana180
5.9 years ago by
Chirag Nepal2.2k wrote:

To merge replicate samples, use samtools merge:


Usage:   samtools merge [-nr] [-h inh.sam] <out.bam> <in1.bam> <in2.bam> [...]

Options: -n       sort by read names
         -r       attach RG tag (inferred from file names)
         -u       uncompressed BAM output
         -f       overwrite the output BAM if exist
         -1       compress level 1
         -R STR   merge file in the specified region STR [all]
         -h FILE  copy the header in FILE to <out.bam> [in1.bam]

Note: Samtools' merge does not reconstruct the @RG dictionary in the header. Users
      must provide the correct header with -h, or uses Picard which properly maintains
      the header dictionary in merging.
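For illustration, here is a small sketch that assembles a `samtools merge` command from the usage text above. The file names (one accepted_hits.bam per replicate, header.sam, merged.bam) are hypothetical, only the `-r` and `-h` options listed in the usage are used, and the command is printed rather than executed. Keep in mind the caveat raised elsewhere in this thread: merging BAM files discards the replicate-level variability.

```python
import shlex


def samtools_merge_command(out_bam, in_bams, header_sam=None, attach_rg=False):
    """Build the argument list for `samtools merge` following the usage text above."""
    if len(in_bams) < 2:
        raise ValueError("need at least two input BAM files to merge")
    cmd = ["samtools", "merge"]
    if attach_rg:
        cmd.append("-r")                 # attach RG tag inferred from file names
    if header_sam is not None:
        cmd += ["-h", header_sam]        # copy this header into the output BAM
    cmd.append(out_bam)
    cmd += list(in_bams)
    return cmd


# Hypothetical paths: one TopHat accepted_hits.bam per replicate.
inputs = [f"rep{i}/accepted_hits.bam" for i in range(1, 7)]
cmd = samtools_merge_command("merged.bam", inputs, header_sam="header.sam", attach_rg=True)
print(shlex.join(cmd))
# To actually run it (requires samtools on PATH): subprocess.run(cmd, check=True)
```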




written 5.9 years ago by Chirag Nepal2.2k

I have run TopHat and Cufflinks on each of my 6 samples. How do I get an 'average' gene expression level across these replicates? Can I run cuffmerge on the assembly files (transcripts.gtf), or do I first have to merge the accepted_hits.bam files of all replicates with samtools and then run Cufflinks on the merged file?


written 5.9 years ago by hana180