Question

How Complex Is It To Analyze NGS Data?

4

10.6 years ago

GouthamAtla 12k

Hi,

I would like to know how complex it is to analyze NGS data. Is it possible to learn NGS data analysis from online resources, or should one learn under the guidance of an expert? How do I get the core concepts of NGS data analysis? How do I configure parameters when using open-source tools (assembly, alignment, statistics, etc.)? I have a master's degree in bioinformatics, with Unix, Perl, and basic core Java skills. Any advice is appreciated.

ngs • 6.0k views

ADD COMMENT • link updated 14 months ago by Ram 43k • written 10.6 years ago by GouthamAtla 12k

4

This is a complex question. It depends on many factors: technology, experimental design, computer resources, organism, ... In some cases it is really straightforward; in others it is a pain in the b**. You can learn both ways, from online resources and from expert advice, but nothing guarantees that either will make you an expert ;)

ADD REPLY • link 10.6 years ago by JC 13k

3

Woah, Biostars' bot must have just put this to the top of the front page, and I was thinking "But Goutham can analyze bioinformatic data. Matter o' fact I thought he was pretty good at it. Why is he asking this?"

You've clearly learned a hell of a lot in the last few years man. Congratulations to you! :)

ADD REPLY • link 7.7 years ago by John 13k

3

Thanks for the appreciation. It's all about the passion to learn something that really interests you. People on Biostars definitely helped a lot.

I asked this question when I was facing a dilemma: stay in a well-paying corporate job that I was not really interested in, or move to a research group that does a lot of genomics but pays little. It was a risky decision for me. I moved to the research institute, and I am now happy to be a Marie Curie fellow.

ADD REPLY • link 7.7 years ago by GouthamAtla 12k

1

Dude, maybe accept both answers so the bot stops bumping the post? Nostalgia is great, but I guess we need to give the bot a sense of closure.

ADD REPLY • link 7.7 years ago by Ram 43k

1

Exact same thought in my head. @Geek_y has grown A LOT! I'm so happy and proud!

ADD REPLY • link 7.7 years ago by Ram 43k

4

10.6 years ago

Alex Paciorkowski 3.5k

If you have a master's degree in bioinformatics with Unix, Perl, and core Java skills, you can do this. How to get the core concepts? Like with anything else: read, go to talks, ask questions. There are many good sources of information here (search is your friend) and elsewhere online. I would recommend spending at least some time with someone who has worked with these data types, be it RNA-Seq or DNA, on real projects. There is still enough art and craft in this corner of science that learning some of the ropes from a mentor will save you trouble down the road.

Also, I can't emphasize enough the importance of working on projects with sound experimental design, where NGS is applied appropriately. I see projects that never really go anywhere for basically these reasons: the experimental hypotheses were under-formulated or really a stretch, the experiment was underpowered, or the sequencing approach used was never going to give an answer (single-end reads when paired-end should have been done). Some of these things will be out of your control, and some will be up to luck. But they can all cause problems for your analysis and lead to the impression that the analysis of these types of data is "hard". On the other hand, there are times when the experimental design is sharp, the capture and sequencing go without a hitch, and the analysis hits no bumps in the road, and, as JC says above, it's as straightforward as it can get.

Finally, I think it's important to get hands-on experience working at every stage of the analysis pipeline, from initial QC, cleanup, trimming, etc., all the way down to dealing with the called variants and annotation. Enjoy!

ADD COMMENT • link 10.6 years ago by Alex Paciorkowski 3.5k

score 5 · Accepted Answer · 2013-09-07

5

10.6 years ago

Pierre Lindenbaum 161k

download bwa, samtools and a reference genome.
generate a random set of reads using samtools/misc/wgsim
index the genome
align the reads and generate a sam output.
describe each column of the sam
generate a vcf from the sam using samtools/mpileup
describe each column of the vcf
use ensembl/vep to predict the consequences of the variations.
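
For the two "describe each column" steps, a small script can help as a study aid. The following is only a minimal sketch in Python, not part of the original answer: the column names follow the SAM and VCF specifications, while the helper names (`describe_record`, `is_reverse_strand`) and the two example records are hypothetical and made up for illustration.

```python
# Mandatory SAM alignment columns, in file order (names per the SAM specification).
SAM_COLUMNS = [
    ("QNAME", "read (query) name"),
    ("FLAG", "bitwise flag: paired, mapped, strand, ..."),
    ("RNAME", "reference sequence name"),
    ("POS", "1-based leftmost mapping position"),
    ("MAPQ", "mapping quality"),
    ("CIGAR", "alignment operations (match, insertion, deletion, ...)"),
    ("RNEXT", "reference name of the mate/next read"),
    ("PNEXT", "position of the mate/next read"),
    ("TLEN", "observed template length"),
    ("SEQ", "read sequence"),
    ("QUAL", "per-base qualities, ASCII-encoded Phred+33"),
]

# Fixed VCF columns, in file order (names per the VCF specification).
VCF_COLUMNS = [
    ("CHROM", "chromosome or contig name"),
    ("POS", "1-based position of the first reference base"),
    ("ID", "variant identifier, '.' if none"),
    ("REF", "reference allele"),
    ("ALT", "comma-separated alternate allele(s)"),
    ("QUAL", "Phred-scaled quality of the call"),
    ("FILTER", "PASS or the names of failed filters"),
    ("INFO", "semicolon-separated key=value annotations"),
]


def describe_record(line, columns):
    """Map the leading tab-separated fields of one record onto named columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(columns):
        raise ValueError(
            f"expected at least {len(columns)} fields, got {len(fields)}"
        )
    record = {name: value for (name, _desc), value in zip(columns, fields)}
    # Fields past the named columns: SAM optional tags, VCF FORMAT/sample columns.
    record["EXTRA"] = fields[len(columns):]
    return record


def is_reverse_strand(flag):
    """Return True if SAM FLAG bit 0x10 (read reverse-complemented) is set."""
    return bool(int(flag) & 0x10)


if __name__ == "__main__":
    # Hypothetical records, invented for illustration only.
    sam_line = "read1\t99\tchr1\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\tNM:i:0"
    vcf_line = "chr1\t105\t.\tC\tT\t50\tPASS\tDP=20"
    for line, columns in ((sam_line, SAM_COLUMNS), (vcf_line, VCF_COLUMNS)):
        record = describe_record(line, columns)
        for name, desc in columns:
            print(f"{name:6} {record[name]:12} {desc}")
        print()
```

Running it against a few lines of your own alignment and variant output, with header lines (starting with `@` in SAM and `#` in VCF) skipped first, is a quick way to check that you can explain every field.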