Question

Comparing 2 different sets of RNAseq data - correlation

7.3 years ago

Genosa ▴ 160

Hi, sorry if this is a really basic question, but I'm new to R and bioinformatics. After spending a lot of time generating my RNA-seq data with the Tuxedo suite, I need to compare two datasets generated by different methods. I am not sure which method is best, but in papers I often see correlation scatter plots used. Are there software packages or methods that can do this? My computer is a basic laptop and can crash when it opens too many large files.

Thank you for your advice and help.

RNA-Seq • 8.7k views

7.3 years ago

sysbiocoder ▴ 180

You can use an R package for that.

What form is your RNA-seq data in? Is it expression data?

7.3 years ago

Genosa ▴ 160

Hi Sysbiocoder,

Yes, my data is available as log2 FC or FPKM. May I know which R package I can use? Sorry, I am new to R, so I'll need some guidance on where to start.

7.3 years ago

sysbiocoder ▴ 180

Check this package for adjusting systematic differences (batch effects) between the two datasets:

http://bioconductor.org/packages/release/bioc/html/sva.html

For correlation analysis there are many packages available; you can use corrplot: https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html

score 2 · Accepted Answer · 2016-12-30

7.3 years ago

sysbiocoder ▴ 180

First, make a table of FPKM values with one row per gene and one column per dataset. There are several packages available in R for the next step; I personally use "corrplot". Go through the link below, which explains the basics of creating a correlation plot: http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
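As a rough illustration of the "make a table of FPKM values" step, here is a minimal sketch in Python (standard library only, rather than R). It assumes each dataset is a tab-delimited file with a gene identifier column and an FPKM column; the column names `tracking_id` and `FPKM`, the pseudocount of 1, and the toy values are assumptions for the example, so adjust them to whatever your own files contain. Only the two needed columns are kept in memory, which helps on a small laptop.

```python
import csv
import io
import math


def read_fpkm(handle, id_col="tracking_id", value_col="FPKM"):
    """Read one tab-delimited expression table into {gene_id: FPKM}.

    Only the identifier and FPKM columns are kept, so memory use stays small.
    The default column names are assumptions; change them to match your file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[value_col]) for row in reader}


def merge_fpkm(table_a, table_b, pseudocount=1.0):
    """Return (gene, log2(a + pseudocount), log2(b + pseudocount)) rows
    for genes present in both tables, sorted by gene identifier."""
    shared = sorted(set(table_a) & set(table_b))
    return [
        (gene,
         math.log2(table_a[gene] + pseudocount),
         math.log2(table_b[gene] + pseudocount))
        for gene in shared
    ]


# Toy in-memory tables standing in for the two real files (made-up values).
toy_a = io.StringIO("tracking_id\tFPKM\ngeneA\t3.0\ngeneB\t0.0\ngeneC\t15.0\n")
toy_b = io.StringIO("tracking_id\tFPKM\ngeneB\t1.0\ngeneC\t7.0\ngeneD\t2.0\n")

merged = merge_fpkm(read_fpkm(toy_a), read_fpkm(toy_b))
for gene, log_a, log_b in merged:
    print(gene, log_a, log_b)
```

With real data you would pass open file handles (for example `open("your_table.tsv")`, a hypothetical file name) instead of the `io.StringIO` objects, and write `merged` out with `csv.writer` so the combined table can be loaded into R for plotting.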

Hi Sysbiocoder,

Thank you for the useful link. Of the correlation methods mentioned (Spearman vs. Pearson), which would be the method of choice for comparing two datasets of different sequencing depths (i.e. I also have MiSeq vs. HiSeq data from the same types of cells)?

I have read online resources, but they generally say "there is no correct method". Which would make better sense for this sort of comparison?

Thanks

written 7.3 years ago by Genosa ▴ 160

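Spearman rank correlation is a non-parametric measure of monotonic association between two variables; it uses only the ranks of the values, so the data need only be ordinal. Pearson r correlation measures the strength of the linear relationship between two variables, and its significance test assumes that both variables are approximately normally distributed. With a large sample size the central limit theorem makes that test fairly robust to departures from normality, but it does not make the data themselves normal: FPKM values are strongly right-skewed, so they are usually log-transformed (for example log2(FPKM + 1)) before computing Pearson r. To my knowledge you can use Pearson r correlation on the log-transformed values (experts, please correct me if I am wrong).

But before running the correlation analysis, make sure your data are normalized. FPKM values are already scaled for sequencing depth and gene length; to adjust for remaining systematic differences between the two platforms or methods (batch effects), check the ComBat method in the sva R package.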
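To make the difference between the two coefficients concrete, here is a small Python sketch (standard library only) that computes both by hand. It is an illustration under stated assumptions, not anyone's actual pipeline: the toy FPKM vectors are made up, and the log2(FPKM + 1) transform is just one common choice. In R, the base function `cor(x, y, method = "pearson")` or `cor(x, y, method = "spearman")` does the same job.

```python
import math


def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up FPKM values for five genes in two datasets (same ordering of genes).
fpkm_a = [0.0, 1.0, 3.0, 7.0, 1023.0]
fpkm_b = [0.0, 2.0, 5.0, 20.0, 500.0]

log_a = [math.log2(v + 1.0) for v in fpkm_a]
log_b = [math.log2(v + 1.0) for v in fpkm_b]

print("Pearson on raw FPKM:      ", pearson(fpkm_a, fpkm_b))
print("Pearson on log2(FPKM + 1):", pearson(log_a, log_b))
print("Spearman (rank-based):    ", spearman(fpkm_a, fpkm_b))
```

Because Spearman depends only on ranks, it gives the same value for raw and log-transformed FPKM, whereas Pearson on raw FPKM tends to be dominated by the few most highly expressed genes.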

written 7.3 years ago by sysbiocoder ▴ 180
