6.8 years ago
wangdp123
Hi there,
I am trying to use RNA-Seq data from a large number of samples to draw a phylogenetic tree based on FPKM values, in order to show the relationships among all samples. (The samples include various species, tissues and treatments.)
Are there any elegant tools implemented for this purpose?
Many thanks,
Regards,
Tom
If you want to compare samples in the context of RNA-seq, have you thought about heatmaps or PCA? In the former case, you can also build a tree from a distance matrix. See here https://www.bioconductor.org/help/workflows/rnaseqGene/#exploratory-analysis-and-visualization. Note that the normalized units used there are not FPKMs, so you will need to normalize your raw counts as described in the tutorial.
Also keep in mind that it's not the same kind of tree that you can make, for example, from sequences using maximum likelihood (ML) or maximum parsimony methods :)
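To make the distance-matrix idea concrete, here is a minimal sketch in plain Python. It is not the workflow's own method (the workflow is in R); it assumes you already have normalized expression values per sample, with genes in the same order for every sample. The function name `upgma_newick`, the sample names and the numbers are made up for illustration. It log-transforms the values, computes Euclidean distances between samples, and joins the closest clusters by average linkage (UPGMA), returning the topology as a Newick string without branch lengths.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_newick(profiles):
    """Cluster samples by average linkage (UPGMA) and return a Newick string.

    profiles: dict mapping sample name -> list of normalized expression
    values, with genes in the same order for every sample (an assumption).
    Only the topology is returned; branch lengths are omitted.
    """
    # log2(x + 1) keeps a few highly expressed genes from dominating the distances
    logged = {n: [math.log2(v + 1.0) for v in vals] for n, vals in profiles.items()}
    size = {n: 1 for n in logged}  # cluster label -> number of samples in it
    dist = {
        frozenset(pair): euclidean(logged[pair[0]], logged[pair[1]])
        for pair in combinations(logged, 2)
    }
    while len(size) > 1:
        # pick the closest pair of clusters and merge them
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        new_size = size[a] + size[b]
        for c in size:
            if c in (a, b):
                continue
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # average linkage: size-weighted mean of the two old distances
            dist[frozenset((merged, c))] = (size[a] * d_ac + size[b] * d_bc) / new_size
        del dist[frozenset((a, b))]
        del size[a], size[b]
        size[merged] = new_size
    return next(iter(size)) + ";"


# Made-up toy data: four samples, four genes
example = {
    "liver_1": [100, 5, 0, 50],
    "liver_2": [110, 4, 1, 45],
    "brain_1": [2, 300, 80, 10],
    "brain_2": [3, 280, 90, 12],
}
print(upgma_newick(example))
```

The resulting string can be opened in any Newick viewer. It describes similarity of expression profiles, not evolutionary descent.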
Hi, thanks for this. I have looked through the log2, rlog and VST normalization methods mentioned in this workflow. But my data are from different species, so in this case I believe some calculation is needed to normalize for gene length. Any thoughts about this?
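As an illustration of what gene-length normalization could look like, here is a small Python sketch that converts raw counts to TPM (transcripts per million). TPM is a swapped-in choice here, one common length-normalized unit, and is not the FPKM from the original question or anything the workflow prescribes. The sketch assumes each sample has a raw count and a species-specific length for every gene, and that genes from different species have already been matched to shared one-to-one ortholog IDs; the function name `counts_to_tpm` and the gene names are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts: dict gene ID -> raw read count for this sample
    lengths_bp: dict gene ID -> gene (or transcript) length in base pairs,
        taken from the annotation of the species the sample comes from
    """
    # Step 1: reads per kilobase, which removes the gene-length effect
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Step 2: rescale so the values in the sample sum to one million
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {g: 0.0 for g in counts}
    return {g: value / scale for g, value in rpk.items()}


# Made-up example: geneB has more reads only because it is four times longer
sample_counts = {"geneA": 100, "geneB": 200}
sample_lengths = {"geneA": 1000, "geneB": 4000}
print(counts_to_tpm(sample_counts, sample_lengths))
```

Because the lengths are looked up per species, the same ortholog can have a different length in each sample's annotation, which is the situation the question is about.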
I think you are right about the normalization by gene length. But I am still wondering what your ultimate goal is. If your goal is to obtain a robust phylogeny, then it would be much better (and I'd say the only proper way) to reconstruct the phylogeny from assembled transcriptomes, i.e. based on sequence, like it is done for genomes, and not based on gene expression levels, because gene expression is a relative measure and depends on multiple biological and technical factors. But if you just want to do a basic quality control of your samples to see whether you get the expected grouping (like samples from the same tissue and treatment clustering together), then a simple PCA or heatmap (with a simple distance-matrix tree) should be enough.
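For the quality-control case, a PCA can be sketched in a few lines of Python with NumPy. This is only an illustration under assumptions: the input is a samples-by-genes matrix of already normalized values, and the function name `pca_scores` and the toy numbers are made up. It log-transforms the matrix, centers each gene, and takes the leading components from a singular value decomposition.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Return per-sample PCA scores and the fraction of variance explained.

    expr: array-like of shape (samples, genes) with normalized expression.
    Assumes the samples are not all identical (otherwise the variance is zero).
    """
    logged = np.log2(np.asarray(expr, dtype=float) + 1.0)
    centered = logged - logged.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Made-up toy data: rows are liver_1, liver_2, brain_1, brain_2
toy = [
    [100, 5, 0, 50],
    [110, 4, 1, 45],
    [2, 300, 80, 10],
    [3, 280, 90, 12],
]
scores, explained = pca_scores(toy)
print(scores)
print(explained)
```

Plotting the first two columns of `scores` and coloring points by tissue, treatment or species shows whether samples group as expected.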