Comparing 2 different sets of RNAseq data - correlation
6.1 years ago
Genosa ▴ 150

Hi, sorry if this is a really basic question, but I'm new to R and bioinformatics. After spending A LOT of time generating my RNA-seq data with the Tuxedo suite, I need to compare 2 datasets generated by different methods. I am not sure what the best method is, but in papers I often see correlation scatter plots used. Are there software packages or methods that can do this? My computer is basic (a laptop) and can crash when it opens too many large files.

RNA-Seq • 7.7k views
6.1 years ago
sysbiocoder ▴ 180

First, make a table of FPKM values. There are several packages available in R; I personally use "corrplot". Go through the link below, which explains the basics of creating a correlation plot: http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
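
A minimal sketch of the "table of FPKM values" step in base R, assuming each dataset boils down to a gene ID column and an FPKM column. The data frames and column names (`set_a`, `set_b`, `gene_id`, `FPKM`) are made up for illustration; real Tuxedo output would first need to be read in, for example with `read.delim()`.

```r
# Hypothetical toy data standing in for the two datasets
set_a <- data.frame(gene_id = c("g1", "g2", "g3", "g4", "g5"),
                    FPKM    = c(0.0, 5.2, 130.4, 12.1, 48.9))
set_b <- data.frame(gene_id = c("g2", "g3", "g4", "g5", "g6"),
                    FPKM    = c(4.8, 151.0, 9.7, 55.3, 2.2))

# Keep only genes present in both datasets, one row per gene
fpkm <- merge(set_a, set_b, by = "gene_id", suffixes = c("_A", "_B"))

# log2(FPKM + 1) compresses the long right tail; the +1 avoids log2(0)
fpkm$log_A <- log2(fpkm$FPKM_A + 1)
fpkm$log_B <- log2(fpkm$FPKM_B + 1)

# Scatter plot of one dataset against the other
plot(fpkm$log_A, fpkm$log_B,
     xlab = "Dataset A, log2(FPKM + 1)",
     ylab = "Dataset B, log2(FPKM + 1)",
     pch = 16)
abline(0, 1, lty = 2)  # y = x reference line
```

Only two small columns per dataset are held in memory here, which may help on a laptop that struggles with large files.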

Hi Sysbiocoder,

Thank you for the useful link. Of the correlation methods mentioned (Spearman vs. Pearson), which would be the method of choice for comparing 2 datasets of different sequencing depths (i.e. I also have MiSeq vs. HiSeq data from the same types of cells)?

The online resources I have read generally say "there is no correct method", but which would make better sense for this sort of comparison?

Thanks

Spearman rank correlation is a non-parametric measure of monotonic association between two variables; it works on ranks, so it also suits ordinal data. Pearson r correlation measures the linear relationship between two variables. The usual significance test for Pearson r assumes both variables are approximately normally distributed, although with a large sample size the test is fairly robust to departures from normality (central limit theorem). To my knowledge you can use Pearson r correlation here (experts, please correct me if I am wrong).

But before going for correlation analysis, you have to normalize your data (check the ComBat method in the sva R package, which adjusts for batch effects) to remove bias between the datasets, such as that due to sequencing depth.
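
To see how the two coefficients behave, here is a small base R sketch using `cor()`. The FPKM values and the vector names (`fpkm_miseq`, `fpkm_hiseq`) are invented for illustration only, and the log2(FPKM + 1) transform is a common convention that I am assuming here, not something prescribed above.

```r
# Made-up FPKM values for the same six genes in two runs (illustration only)
fpkm_miseq <- c(0.5, 3.1, 12.0, 45.2, 210.7, 1500.3)
fpkm_hiseq <- c(0.8, 0.4, 15.5, 39.9, 260.1, 1320.8)

# Log-transform to reduce the influence of a few very highly expressed genes
log_miseq <- log2(fpkm_miseq + 1)
log_hiseq <- log2(fpkm_hiseq + 1)

# Pearson measures linear association, so it changes with the transform
pearson_raw <- cor(fpkm_miseq, fpkm_hiseq, method = "pearson")
pearson_log <- cor(log_miseq, log_hiseq, method = "pearson")

# Spearman uses ranks only, so a monotonic transform leaves it unchanged
spearman_raw <- cor(fpkm_miseq, fpkm_hiseq, method = "spearman")
spearman_log <- cor(log_miseq, log_hiseq, method = "spearman")

print(c(pearson_raw = pearson_raw, pearson_log = pearson_log,
        spearman_raw = spearman_raw, spearman_log = spearman_log))
```

On raw FPKM, Pearson tends to be dominated by the handful of genes with the largest values, which is one reason it is often reported on log-transformed data, while Spearman gives the same answer either way.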

6.1 years ago
sysbiocoder ▴ 180

You can use an R package for that.

What form is your RNA-seq data in? Is it expression data?

6.1 years ago
Genosa ▴ 150

Hi Sysbiocoder,

Yes, my data is available as log2 FC or FPKM. May I know which R package I can use? Sorry, I am new to R, so I'll need some guidance on where to start.

6.1 years ago
sysbiocoder ▴ 180

Check this package for normalization:

http://bioconductor.org/packages/release/bioc/html/sva.html

For correlation analysis there are many packages available; you can use corrplot: https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html
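
As a rough sketch of how corrplot could be used on expression data: build a sample-by-sample correlation matrix with base R `cor()`, then hand it to `corrplot()`. The matrix values and the sample names (`miseq_1`, `hiseq_1`, and so on) are hypothetical, and the plotting call is skipped if corrplot is not installed.

```r
# Made-up log2(FPKM + 1) values: rows are genes, columns are samples
expr <- matrix(c(1.2, 1.0, 1.5, 1.1,
                 5.4, 5.9, 5.1, 6.0,
                 8.8, 9.1, 8.2, 9.4,
                 3.3, 2.9, 3.8, 3.0,
                 0.2, 0.0, 0.6, 0.1),
               ncol = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:5),
                               c("miseq_1", "miseq_2", "hiseq_1", "hiseq_2")))

# Pairwise correlation between samples (cor() works column by column)
cor_mat <- cor(expr, method = "spearman")

# Draw the matrix only if the corrplot package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "color", type = "upper")
}
```

The correlation matrix itself is small (samples by samples), so this step should be light on memory even when the underlying expression table is large.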