As mentioned before, R is a programming language for statistical analysis. Because it is freely available and offers a wide range of statistical tests and plotting options, it is widely used to analyze bioinformatics data.
R in bioinformatics
For example, there are many libraries that can remove contamination, perform quality checks on FASTQ files, analyze next-generation sequencing (NGS) data, calculate gene expression, perform differential gene expression analysis (DESeq or edgeR), and generate heatmaps, histograms, line plots, Venn diagrams and other relevant plots. Similarly, microarray analyses can be done in R with packages such as limma, which calculates fold-change values after reducing the noise in the data. Limma can analyze both microarray and NGS data.
Many tools written in R can read files that are generated by various instruments and cannot be read directly as text, such as AB1 or BAM files.
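As a rough illustration of the fold-change idea, the sketch below computes log2 fold changes and per-gene p-values in base R. It is not how DESeq, edgeR or limma work internally; those packages use more careful statistical models. The expression matrix, gene names and sample names are all invented for the example.

```r
# Toy expression matrix: 4 genes x 6 samples (3 control, 3 treated).
# All values are invented for illustration.
expr <- matrix(
  c(10, 12, 11, 40, 44, 42,
    20, 21, 19, 20, 22, 18,
    80, 85, 78, 20, 22, 19,
     5,  6,  5,  6,  5,  6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3"))
)
group <- factor(rep(c("control", "treated"), each = 3))

# Log-transform the values; the +1 avoids taking log2(0).
log_expr <- log2(expr + 1)

# log2 fold change = mean(treated) - mean(control) on the log scale.
log2_fc <- rowMeans(log_expr[, group == "treated"]) -
  rowMeans(log_expr[, group == "control"])

# Per-gene Welch t-test, followed by Benjamini-Hochberg correction
# for multiple testing.
p_values <- apply(log_expr, 1, function(x) t.test(x ~ group)$p.value)
adj_p <- p.adjust(p_values, method = "BH")

# Collect the results and show the most significant genes first.
results <- data.frame(log2_fc, p_values, adj_p)
print(results[order(results$adj_p), ])
```

In this toy data, gene1 is higher in the treated samples and gene3 is lower, so their log2 fold changes are positive and negative respectively, while gene2 and gene4 stay close to zero.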
Many researchers use R to test for differences between samples and to calculate p-values. Some of the most widely used tests in bioinformatics are the t-test, Z-test, ANOVA, tests of normality, and other parametric and non-parametric tests. Machine learning in R is also used to classify and cluster biological data, and many papers use R to build classifiers for biological data.
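Several of the tests named above are available in base R through the stats package. The following is a minimal sketch on invented measurements; the group names and values are placeholders, and which test is appropriate depends on the real data.

```r
# Invented measurements for three sample groups
# (for example, the expression level of one gene).
control <- c(5.1, 4.9, 5.3, 5.0, 5.2, 4.8)
treat_a <- c(6.9, 7.2, 7.0, 7.4, 6.8, 7.1)
treat_b <- c(5.0, 5.4, 5.1, 4.7, 5.3, 5.2)

# Test of normality (Shapiro-Wilk): a large p-value means there is
# no evidence against normality.
normality <- shapiro.test(control)

# Parametric two-sample comparison (Welch's t-test is the default).
tt <- t.test(control, treat_a)

# Non-parametric alternative (Wilcoxon rank-sum test).
wt <- wilcox.test(control, treat_a)

# One-way ANOVA across all three groups.
values <- c(control, treat_a, treat_b)
groups <- factor(rep(c("control", "treat_a", "treat_b"), each = 6))
anova_fit <- aov(values ~ groups)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

cat("Shapiro-Wilk p:", normality$p.value, "\n")
cat("t-test p:      ", tt$p.value, "\n")
cat("Wilcoxon p:    ", wt$p.value, "\n")
cat("ANOVA p:       ", anova_p, "\n")
```

Because treat_a was made clearly higher than control, the t-test, the Wilcoxon test and the ANOVA all return small p-values here.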
Many studies have used R to build mathematical models that describe and predict how a dependent variable changes with one or more independent variables. Using R classification libraries, researchers can do text mining, which saves a lot of time in manual curation. R is also widely used to calculate pairwise and multiple correlations in order to find relationships between samples.
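The correlation and modelling steps described here can be sketched in base R as follows. The gene names and values are invented, and a simple linear model stands in for whatever model a real study would choose.

```r
# Invented expression values for three genes across eight samples.
expr <- data.frame(
  geneA = c(2.1, 3.0, 3.9, 5.2, 6.1, 6.8, 8.1, 9.0),
  geneB = c(4.0, 6.1, 8.2, 10.1, 12.3, 13.9, 16.2, 18.1),
  geneC = c(9.0, 7.8, 7.1, 6.0, 4.9, 4.2, 2.8, 2.1)
)

# Pairwise Pearson correlations between all genes.
cor_matrix <- cor(expr)

# Significance test for a single pair.
ab_test <- cor.test(expr$geneA, expr$geneB)

# Linear model with geneB as the dependent variable and geneA as
# the independent variable.
fit <- lm(geneB ~ geneA, data = expr)

# Multiple correlation: how well geneA and geneB together explain geneC.
multi_fit <- lm(geneC ~ geneA + geneB, data = expr)
multiple_r <- sqrt(summary(multi_fit)$r.squared)

# Predict geneB for a new, hypothetical geneA value.
predicted <- predict(fit, newdata = data.frame(geneA = 7.0))

print(round(cor_matrix, 3))
```

In this toy data geneB rises with geneA and geneC falls, so the first pair is strongly positively correlated and the second strongly negatively correlated.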
R is also used to create plots for publications. There is also a separate package, built on the R statistical programming language, with which users can carry out a wide range of bioinformatics data analyses. Packages that host a variety of tools can help analyze bioinformatics data such as microarray, differential gene expression, SNP, flow cytometry and PCR data. Using an R package, researchers can perform the analyses mentioned above and much more. For example, an R package can analyze NGS or microarray data end to end without much manual intervention.
One of the NCBI resources, Gene Expression Omnibus (GEO), uses R to analyze the microarray data available in the database online. It analyzes the data and maps probes to genes, making it easier for researchers without a bioinformatics background to perform their own analyses.
Many bioinformatics databases can be accessed and downloaded from R. These include Ensembl, which is accessed through biomaRt, and TCGA cancer data, which is accessed through TCGAbiolinks, along with many other web servers. R is also used to identify motifs in sequences and to perform mutation analysis, in which allele-specific expression can be calculated. R can be used to create HTML pages with built-in APIs that link a database to the front end with ease. This helps in setting up a bioinformatics web server with minimal effort using RStudio and R Shiny. R is also used to analyze data from flow cytometry, PCR and other low-throughput methods. Sequence alignments can also be done in R.
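As a small illustration of motif identification, the sketch below searches a few sequences for a motif written as a regular expression, using only base R. The sequences, their names, the TATA-box-like pattern and the helper function find_motif are all invented for the example; dedicated packages would normally be used for real motif analysis.

```r
# Invented DNA sequences; the names are placeholders.
sequences <- c(
  seq1 = "ATGCGTATAAAGGCTATAAATT",
  seq2 = "GGGCCCGGGCCC",
  seq3 = "TATAAATATAAA"
)

# Motif written as a regular expression: a TATA-box-like pattern.
motif <- "TATA[AT]A"

# Hypothetical helper: return the start position of every
# non-overlapping match, or an empty vector if the motif is absent.
# gregexpr() reports -1 when there is no match.
find_motif <- function(seq, pattern) {
  hits <- gregexpr(pattern, seq)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

positions <- lapply(sequences, find_motif, pattern = motif)
counts <- lengths(positions)

print(positions)
print(counts)
```

Note that gregexpr() reports non-overlapping matches only, so overlapping occurrences of a motif would need a different approach.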
There are many more applications of R in bioinformatics, as almost all data analysis in bioinformatics can be done using R packages.
Source of the Blog: https://en.novogene.com/resources/blog/hello-r-world-introduction-to-r/
- Roser, L. G., Agüero, F. & Sánchez, D. O. FastqCleaner: An interactive Bioconductor application for quality-control, filtering and trimming of FASTQ files. BMC Bioinformatics (2019) doi:10.1186/s12859-019-2961-8.
- Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. (2010) doi:10.1186/gb-2010-11-10-r106.
- Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (2009) doi:10.1093/bioinformatics/btp616.
- Trakhtenberg, E. F. et al. Cell types differ in global coordination of splicing and proportion of highly expressed genes. Sci. Rep. (2016) doi:10.1038/srep32249.
- Jha, A., Mehra, M. & Shankar, R. The regulatory epicenter of miRNAs. J. Biosci. 36, 621–638 (2011).
- Jha, A., Panzade, G., Pandey, R. & Shankar, R. A legion of potential regulatory sRNAs exists beyond the typical microRNAs microcosm. Nucleic Acids Res. 43, 8713–24 (2015).
- Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. (2015) doi:10.1093/nar/gkv007.
- Hill, J. T. et al. Poly peak parser: Method and software for identification of unknown indels using sanger sequencing of polymerase chain reaction products. Dev. Dyn. (2014) doi:10.1002/dvdy.24183.
- Ru, Y. et al. The multiMiR R package and database: Integration of microRNA-target interactions along with their disease and drug associations. Nucleic Acids Res. (2014) doi:10.1093/nar/gku631.
- Zhang, J. et al. MiRspongeR: An R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules. BMC Bioinformatics (2019) doi:10.1186/s12859-019-2861-y.
- Edgar, R. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
- Yao, Z. et al. Discriminative motif analysis of high-throughput dataset. Bioinformatics (2014) doi:10.1093/bioinformatics/btt615.
- Klinke, D. J. & Brundage, K. M. Scalable analysis of flow cytometry data using R/Bioconductor. Cytom. Part A (2009) doi:10.1002/cyto.a.20746.
- Ahmed, M. & Kim, D. R. pcr: An R package for quality assessment, analysis and testing of qPCR data. PeerJ (2018) doi:10.7717/peerj.4473.
- Bodenhofer, U., Bonatesta, E., Horejš-Kainrath, C. & Hochreiter, S. Msa: An R package for multiple sequence alignment. Bioinformatics (2015) doi:10.1093/bioinformatics/btv494.