Forum: NGS data analysis
3.7 years ago by
statfa520 wrote:


My idea is to work with CODEX, an R package for detecting CNVs in WES data. As far as I know, this package can be installed directly on Windows, Mac, or Linux. The input data for this package are the original mapped and sorted .bam files (together with their .bai files) and a single bed file (the exonic targets for all samples).

I have the .bam files, the .bai files and the bed file. Now my question is: do you think it is possible to carry out this work? I have no experience working with Linux, but I am a capable R user.
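
Before running any CNV caller, it can help to confirm that the inputs listed above are actually complete. The following is only a minimal sketch in Python (standard library only), not part of CODEX: the function name `check_inputs` is hypothetical, and it assumes that all .bam files sit in one directory, that each index is named either `sample.bam.bai` or `sample.bai`, and that the first three columns of the bed file are chromosome, start and end.

```python
from pathlib import Path


def check_inputs(bam_dir, bed_path):
    """Return a list of problems found; an empty list means the inputs look usable.

    Hypothetical helper for illustration only, not part of any CNV package.
    """
    problems = []
    bam_dir = Path(bam_dir)
    bed_path = Path(bed_path)

    bams = sorted(bam_dir.glob("*.bam"))
    if not bams:
        problems.append(f"no .bam files found in {bam_dir}")

    for bam in bams:
        # Assumption: the index is named sample.bam.bai or sample.bai
        candidates = [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]
        if not any(c.exists() for c in candidates):
            problems.append(f"missing index for {bam.name}")

    if not bed_path.is_file():
        problems.append(f"bed file not found: {bed_path}")
        return problems

    with bed_path.open() as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            # Skip blank lines and the header-style lines a bed file may carry
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            where = f"{bed_path.name} line {line_no}"
            if len(fields) < 3:
                problems.append(f"{where}: fewer than 3 columns")
                continue
            try:
                start, end = int(fields[1]), int(fields[2])
            except ValueError:
                problems.append(f"{where}: start/end are not integers")
                continue
            # An exon target is expected to have a positive length
            if start < 0 or end <= start:
                problems.append(f"{where}: invalid interval {start}-{end}")
    return problems
```

Calling `check_inputs("bams", "targets.bed")` (both paths are placeholders) and printing each returned message gives a quick list of missing index files or malformed target lines. It does not check that the .bam files are sorted or that chromosome names match between the .bam and bed files.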

Thank you

forum analysis ngs biostatistics • 1.4k views
ADD COMMENTlink modified 3.2 years ago • written 3.7 years ago by statfa520

First, respect and good luck! Second, what you did not mention is how much time you have. Do you know the aim of your thesis, or is it a vague task?

ADD REPLYlink written 3.7 years ago by H.Hasani810

Thank you. I have to finish it in 9 months.

ADD REPLYlink modified 3.2 years ago • written 3.7 years ago by statfa520

Please keep in mind that I do not know you, and whatever I'm going to say reflects my personal opinion: it worked for me, but it would not necessarily work for you. Therefore, I'm writing it as a comment, not as an answer :)

Well, take it from someone who has been there: 9 months is a pretty good amount of time to learn this stuff, and if you have a geneticist on your side, you are halfway there already. If you are terrified and have great doubts, give yourself a task that you think you can accomplish in a certain time, and see how you actually progress. For example, install the tools and get familiar with them. A good start would be to install Ubuntu/Linux; you know that you can install it on a USB stick and use it from there... break the ice with it and try to learn how it works. Once you start "working" with Ubuntu, start installing the tools one after another. Test yourself: how fast you are, how comfortable you are, how confident you are becoming. If none of this is happening, check your heart for how much you want this thing!! I myself am VERY stubborn ;) , so my advice would of course be to not let your fear win! At the same time, be realistic and have a backup plan before starting, in case it turns out not to be feasible.

Good luck & respect again!

ADD REPLYlink written 3.7 years ago by H.Hasani810

You are right. I tried to install samtools on Cygwin with the help of tutorials on the internet so that I could run it on Windows, but I couldn't. I will start working on Linux to see if I can install these tools there and work with them. I am very eager to follow this topic, not only because it seems so interesting to me but also because I have already spent a good deal of time learning about it. I hope it goes well. Thanks a lot for your assistance.

ADD REPLYlink modified 3.2 years ago • written 3.7 years ago by statfa520

If you can post this here, you can do NGS analysis for sure.

ADD REPLYlink written 3.7 years ago by Nari870

Oh, why do you say so? I hope so... let's see how it goes

ADD REPLYlink written 3.7 years ago by statfa520

I say so because I started the same way.

ADD REPLYlink written 3.7 years ago by Nari870

I am happy to hear it

ADD REPLYlink written 3.7 years ago by statfa520

Did you recently completely change the content of your first post? I don't see the connection between what is currently written and the reactions below.

ADD REPLYlink written 3.2 years ago by WouterDeCoster42k

@natalia was editing her past original posts and (unintentionally) bumping them up to the main page. She knows about this Biostars feature now.

ADD REPLYlink written 3.2 years ago by genomax75k
3.7 years ago by
Dr. Mabuse47k
Bergen, Norway
Dr. Mabuse47k wrote:

In my opinion, an education in statistics or computation gives you a solid foundation for the analysis of NGS data. An MSc might also be a good point in time to choose this topic. You have the option to develop your own perspective on the topic and learn about the origin of the data along the way, using analytical skills that align with your background in math and statistics. It will be helpful to know some biology relevant to the data type. Planning for a little bit of extra time to learn biology definitely also helps. Knowledge of Linux is a plus for running some tools, but R is mostly platform independent.

You will find a lot of questions and answers already here or on seqanswers and elsewhere online, and I am sure everyone here will be willing to answer new questions. One of the most important points to consider is to define a good topic together with your supervisor before starting the thesis.

This brings me to the few risks of the endeavor, which come from the fact that there is, as you state, nobody else working on the same topic. So you would possibly not be able to discuss much in detail with other researchers in your group. Writing to strangers on the internet is not always an easy drop-in replacement for personal communication and good supervision (not calling the experience of your supervisor into doubt), but I think it is possible in principle, and feasible. This might not necessarily hold for a PhD thesis. On the other hand, your qualification will be in high demand after you have finished.

Trying to come up with the most important contributing factors for a successful thesis, I think I would rank them:

  • Motivation-interest
  • Supervision
  • Group - environment
  • Time
  • Knowledge-skills (means you know something about the topic already)

As I see it, if the topic is very interesting to you, then it is very likely a feasible path. There are possibly some steep learning curves involved, but imho, a computational education prepares you better for NGS bioinformatics than a biological education. I hope this helps, good luck.

Disclaimer: ????predict.future

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Dr. Mabuse47k

Thank you very much for your advice. Yes, I have been advised before to improve my biology knowledge, and I am currently doing this. I will start working with Linux and try to install samtools there. My efforts to install samtools on Cygwin to run it on Windows weren't successful.

I will follow your advice. My motivation is high. I hope I can enrich other factors too. I will ask questions here and I hope others will help me.

Thank you so much for your support

ADD REPLYlink written 3.7 years ago by statfa520

Cygwin might become a burden, because it is the least commonly used 'unix' environment for testing and compilation. BioLinux, on the other hand, might save you from the compilation hassle for many applications. Based on Ubuntu 14.04 LTS, it is also reasonably user friendly (if it works with your hardware), see

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Dr. Mabuse47k

Thank you very much... I will try it for sure and I hope it works...

ADD REPLYlink written 3.7 years ago by statfa520