Seeking a platform like R language for NGS data manipulation
6 months ago
field654 ▴ 30

Dear folks,

I wonder if there's any platform that allows me to manipulate NGS data by coding.

I have used NGS to verify experiments such as:

Hybridoma IgG sequencing
Quantifying error-prone PCR error occurrence

In those cases I could simply feed a tiny portion of the data into R and manipulate it in code.

While that saved me much time reading package manuals, the limitation is that R is usually single-threaded and weak at handling large data sets.

I wonder if there's any R counterpart that's more suitable for NGS data analysis.

Thank you so much.


NGS R data analysis • 394 views

the limitation is that R is usually single-threaded and weak at handling large data sets.

Not entirely true. As far as I know the R interpreter itself is still single-threaded, but plenty of packages use multiple threads, and so the multiple cores most modern machines have, because their backends are written in C/C++ (e.g., data.table). See here and here for discussions. This page might also be helpful.

The data set size issues are more a memory issue than an R issue. They can be alleviated by chunking the data, increasing available memory, or both. Alternatively, you could use something like the bigmemory package or interface with a proper SQL database.
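The chunking idea is not tied to any one language. Below is a minimal sketch in Python, assuming an uncompressed FASTQ file with exactly four lines per record; the function names `fastq_chunks` and `mean_read_length` are made up for illustration and do not come from any package. Only one chunk is held in memory at a time, so memory use depends on `chunk_size` rather than on file size.

```python
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def fastq_chunks(handle: TextIO, chunk_size: int = 10000) -> Iterator[List[Record]]:
    """Yield lists of up to chunk_size FASTQ records from an open text handle.

    Assumes the simple 4-lines-per-record layout (no wrapped sequences).
    """
    while True:
        # Pull at most chunk_size records (4 lines each) from the stream.
        lines = list(islice(handle, 4 * chunk_size))
        if not lines:
            return
        if len(lines) % 4 != 0:
            raise ValueError("truncated FASTQ: line count is not a multiple of 4")
        yield [
            (
                lines[i].rstrip("\n"),      # header line, starts with '@'
                lines[i + 1].rstrip("\n"),  # sequence
                lines[i + 3].rstrip("\n"),  # quality string
            )
            for i in range(0, len(lines), 4)
        ]


def mean_read_length(handle: TextIO, chunk_size: int = 10000) -> float:
    """Compute the mean read length while keeping only one chunk in memory."""
    n_reads = 0
    n_bases = 0
    for chunk in fastq_chunks(handle, chunk_size):
        n_reads += len(chunk)
        n_bases += sum(len(seq) for _, seq, _ in chunk)
    return n_bases / n_reads if n_reads else 0.0
```

For a gzipped file, the same functions should work on a handle opened with `gzip.open(path, "rt")`. Any per-chunk summary that can be accumulated (counts, sums, histograms) fits this pattern.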

If you really insist on using another language, I guess the only viable alternatives would be Julia or Python (or Perl if you are really old school), unless you want to attempt data manipulation in Rust or C.
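To give a feel for what multi-core work looks like in Python, here is a rough sketch using the standard library's `concurrent.futures`. It assumes reads that are already aligned to, and the same length as, a reference sequence, and it counts substitutions only (no indels), which is a simplification of real error-prone PCR analysis. The names `count_mismatches` and `parallel_mismatch_counts` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import List, Sequence


def count_mismatches(read: str, reference: str) -> int:
    """Count positions where read and reference differ.

    Positions are compared pairwise up to the shorter of the two strings;
    insertions and deletions are NOT handled (assumption of this sketch).
    """
    return sum(1 for a, b in zip(read, reference) if a != b)


def parallel_mismatch_counts(
    reads: Sequence[str],
    reference: str,
    workers: int = 4,
    chunksize: int = 1000,
) -> List[int]:
    """Score every read against the reference using several worker processes."""
    # Bind the reference so each worker call only needs the read itself.
    score = partial(count_mismatches, reference=reference)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches reads per task to reduce inter-process overhead.
        return list(pool.map(score, reads, chunksize=chunksize))
```

On platforms that start workers by spawning a fresh interpreter, `parallel_mismatch_counts` should be called from under an `if __name__ == "__main__":` guard in a script file. Separate processes are used here because threads in standard CPython generally do not run pure-Python CPU-bound code in parallel.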


Dear Sir, thank you very much for your advice. I've looked into multi-threaded computation, but I soon ran into a problem. Basically, I could create multiple clusters, but the CPUs refused to handle them in parallel. Instead, one was processed while the others waited. I posted a more detailed question somewhere else, which I thought was more appropriate for computing questions. I would be grateful if you could share some advice. Many thanks. Field


Much of NGS data analysis is CLI-driven, and the frameworks are available in multiple languages (Python, Groovy, etc.). R is mostly used for statistics and graphing, which are generally the final steps of an analysis. You can also buy commercial statistics software such as S-PLUS, SPSS, or SAS for better performance. There is also an alternative R distribution, Microsoft R Open (distributed through MRAN), which links against multithreaded math libraries and so can use multiple cores for linear algebra.

