Question: Coefficient of variation
18 months ago by
nicoles10
nicoles10 wrote:

I am a newb and I come from a background of wet lab experience. Recently, we have started doing RNA-Seq. Originally, our bioinformatics core was going to handle the analysis, but then that person went on sabbatical, so I started using Galaxy to analyze our data. My PI has set parameters (based on the literature) to apply before proceeding with GO terms. One of the conditions is to include only genes with a CV of less than or equal to 0.5. Can I do this in Galaxy? If not, could someone please tell me how I could do so manually?

I went through Tophat, cufflinks, cuffcompare, cuffdiff based on a colleague's recommendation. I also have a separate workflow of htseq-count then DESeq2.

Any help will be greatly appreciated.

Thanks!

modified 18 months ago by Renesh1.6k • written 18 months ago by nicoles10

I would make the most of the opportunity to learn some bioinformatics analyses for yourself and your curriculum.

Firstly, the TopHat/Cufflinks pipeline is somewhat long-winded, but is still very powerful. If you are just looking for raw counts over your RNA-seq samples, then I would use Kallisto (http://www.nature.com/nbt/journal/v34/n5/full/nbt.3519.html?foxtrotcallback=true) followed by DESeq2.

For Kallisto, you'll need to download the program and also a reference transcriptome CDS in FASTA format. If you are interested in counts over all coding and non-coding transcripts, then use GENCODE's reference: https://www.gencodegenes.org/releases/current.html (see the sub-heading 'Fasta files'). I also wrote a wrapper for running Kallisto for colleagues when I worked at Queen Mary University of London:

Issues like CoV (coefficient of variation) will be dealt with by DESeq2 when it normalizes the data. DESeq2 now has built-in functions for reading in Kallisto data. If you need assistance with that, just reply here and I will be happy to assist.

Kevin

written 18 months ago by Kevin Blighe39k
18 months ago by
Renesh1.6k
United States
Renesh1.6k wrote:

CV calculations are necessary if you want to select stable and consistently expressed genes from your RNA-seq datasets. The calculation is straightforward: CV = SD / mean, i.e. the standard deviation divided by the mean. The CV tells you how variable a gene's expression is relative to its average level. Your PI is asking you to include only the genes that are stably expressed across replicates/experiments, i.e. those with a low CV (at most 0.5).

I am not sure whether Galaxy can do basic statistical calculations on tabular data. To calculate the CV, you can use a database such as PostgreSQL (psql) or a spreadsheet such as Excel. You can calculate the CV on the htseq-count raw data and then proceed to the DESeq package. Most gene expression packages calculate a dispersion, which accounts for the CV.
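If a scripting language is easier than a spreadsheet, the CV = SD / mean filter described above can be sketched in a few lines of Python. This is only a minimal illustration, not a validated pipeline: the gene names and counts are made up, the function names are hypothetical, and it assumes the sample standard deviation (n - 1 denominator) computed across whichever replicates you pass in. Check with your PI which samples and which SD definition the 0.5 cutoff refers to.

```python
import statistics

def coefficient_of_variation(values):
    """Return SD / mean for one gene, or None if the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return None  # CV is undefined when the mean is zero
    # statistics.stdev is the sample SD (n - 1 denominator); an assumption here
    return statistics.stdev(values) / mean

def filter_by_cv(table, max_cv=0.5):
    """Keep genes whose CV is defined and <= max_cv; return {gene: cv}."""
    kept = {}
    for gene, values in table.items():
        cv = coefficient_of_variation(values)
        if cv is not None and cv <= max_cv:
            kept[gene] = cv
    return kept

# Hypothetical counts for three replicates per gene (made-up numbers)
counts = {
    "geneA": [100, 110, 90],   # stable across replicates
    "geneB": [5, 50, 200],     # highly variable
    "geneC": [0, 0, 0],        # not expressed, CV undefined
}

stable = filter_by_cv(counts, max_cv=0.5)
print(stable)
```

With these made-up numbers only geneA passes the filter. The same functions work on FPKM values, since they only need a list of numbers per gene.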

written 18 months ago by Renesh1.6k

Thank you. I'll calculate it with the htseq-count data. Is it also acceptable to calculate the stdev and mean from the cufflinks FPKM? This is for my own understanding and for further explanation to my PI.

written 18 months ago by nicoles10

Yes, you can also calculate CV from FPKM. FPKM is also a normalized count.

written 18 months ago by Renesh1.6k
18 months ago by
nicoles10
nicoles10 wrote:

Thank you for replying, Kevin. I am trying to learn bioinformatics for myself and our lab. It is definitely an essential skill to have. After obtaining the raw counts for my RNA-Seq samples from Kallisto, can I then determine differentially expressed genes with DESeq2? Could I use DESeq2 through Galaxy after I obtain the counts in Kallisto? Thanks!

written 18 months ago by nicoles10

I hope that a tool like Galaxy accepts Kallisto-derived counts, or at least a custom matrix of counts. However, if the HT-seq option is already built into Galaxy, then you should stick to HT-seq. As far as I recall, you'll then have to align the reads to produce a BAM file, over which HT-seq counts transcript abundances (Kallisto and other modern tools don't require a BAM alignment).

There is a great thread here for RNA-seq and Galaxy, which you may have already seen: https://galaxyproject.org/tutorials/rb_rnaseq/
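For reference, a "custom matrix of counts" can be assembled from per-sample htseq-count output with a short script. The sketch below is only illustrative: it assumes the usual two-column, tab-separated htseq-count format (feature ID, count) with summary rows whose IDs start with `__` (such as `__no_feature`), and the sample names, gene names, counts and function names are all made up. Whether Galaxy's DESeq2 tool accepts such a matrix is something to check in Galaxy itself.

```python
def parse_htseq_counts(lines):
    """Parse htseq-count style lines into {feature_id: count}."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and the summary rows that start with "__"
        if len(fields) != 2 or fields[0].startswith("__"):
            continue
        counts[fields[0]] = int(fields[1])
    return counts

def build_count_matrix(per_sample_lines):
    """Return (sample_names, {gene: [count per sample]}) with zeros for gaps."""
    per_sample = {name: parse_htseq_counts(lines)
                  for name, lines in per_sample_lines.items()}
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    matrix = {gene: [per_sample[s].get(gene, 0) for s in samples]
              for gene in genes}
    return samples, matrix

# Made-up example input; with real files, pass open(path) instead of a list
sample_lines = {
    "ctrl_1": ["geneA\t100\n", "geneB\t5\n", "__no_feature\t42\n"],
    "treat_1": ["geneA\t90\n", "geneB\t200\n", "__no_feature\t17\n"],
}

samples, matrix = build_count_matrix(sample_lines)
print("gene", *samples, sep="\t")
for gene, row in matrix.items():
    print(gene, *row, sep="\t")
```

The printed table has one row per gene and one column per sample, which is the general shape DESeq2 expects for a count matrix.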

modified 18 months ago • written 18 months ago by Kevin Blighe39k

Yes, I did need the BAM files for htseq-count. As there will be more RNA-seq coming, I would like to know quicker methods of quantification. In the near future I'll find out whether Galaxy accepts the Kallisto counts. The tutorial has helped greatly.

written 18 months ago by nicoles10

Instead of relying on Galaxy, I would suggest using an HPC cluster or a workstation for quicker and more customized analysis.

modified 18 months ago • written 18 months ago by Renesh1.6k