Question

Python, Biopython And The Analysis Of Next-Gen Sequencing

3

10.9 years ago

BioICoder ▴ 40

Hello all, I have a question about how to begin working with next-gen sequencing. I am conceptually familiar with next-gen sequencing and its practical importance nowadays. However, I have never worked with the data itself, mainly because I don't know the best way to start. Could you kindly give some guidelines on how to work with the data, from the raw data up to the analysis phase? In addition, I have read that a scripting language such as Python is very useful for analyzing next-gen data and automating the work. Which parts of Python should I learn in order to use it for this kind of data?

Any websites or books that may be helpful would be welcome as well. Thanks a lot.

next-gen python • 16k views

updated 10.9 years ago by swbarnes2 14k • written 10.9 years ago by BioICoder ▴ 40

0

I think this is a good answer and a good start. See also: how to learn about next generation gene sequencing

This book is a good place to begin orienting yourself: Bioinformatics for High Throughput Sequencing

There is also a workshop: Workshop on next generation sequencing data analysis

10.9 years ago by Medhat 9.7k

score 6 · Answer 1 · 2013-05-25

This is really an open-ended question, so it's difficult to know where to begin. I appreciate your drive to learn Python for data analysis, but it sounds like you need to settle (or at least tell us) what your research questions are before you decide which tools to use to answer them. The right tool usually depends on the type of data you have and how you want to analyze it.

I suggest devising a strategy for analyzing your data, with a detailed explanation of what you expect to test or find at each step of the way. Then you can start to figure out which tools to use to answer each question. There are a lot of bioinformatics tools to choose from (as you can see from looking through this forum).

Next, I would suggest taking a basic bioinformatics course (at least to learn about the tools and the types of questions you can answer with them) or working through a general textbook. If you're coming from a biology background and don't have much experience with computers, I would suggest the book Practical Computing for Biologists (which focuses on Python).

score 2 · Answer 2 · 2013-05-25

In general, the steps are:

1) Align your fastq of reads to a reference genome
2) Determine the variants between your sample and the reference genome
3) Determine which variants are the best candidates for contributing to your clinical phenotype

Step 1 is fairly straightforward. Several programs do it; bwa and bowtie are the most commonly used. Step 2 is trickier: you can try GATK, but samtools is simpler to understand. Step 3 is the hardest.
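To make steps 1 and 2 concrete, here is a rough sketch of how a Python script might assemble the command lines for them. It only builds and prints the commands; it does not run anything. The file names and the build_pipeline helper are made up for illustration. It also assumes a recent samtools/bcftools install, where the variant-calling commands live in the companion program bcftools, and exact options can differ between versions, so check each tool's help output before running anything.

```python
import shlex


def build_pipeline(reference, fastq, prefix):
    """Return shell commands for: index, align, sort, index BAM, call variants.

    Hypothetical helper for illustration; nothing is executed here.
    """
    q = shlex.quote  # protect paths that contain spaces or shell characters
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        # Step 1: build the bwa index, then align the reads to the reference
        f"bwa index {q(reference)}",
        f"bwa mem {q(reference)} {q(fastq)} > {q(sam)}",
        # Sort and index the alignments so downstream tools can use them
        f"samtools sort -o {q(bam)} {q(sam)}",
        f"samtools index {q(bam)}",
        # Step 2: pile up the reads and call variants (assumes bcftools)
        f"bcftools mpileup -f {q(reference)} {q(bam)} | "
        f"bcftools call -mv -o {q(vcf)}",
    ]


# Placeholder file names, not real data
for command in build_pipeline("ref.fa", "reads.fastq", "sample"):
    print(command)
```

Once the printed commands look right for your data, each string could be handed to subprocess.run(command, shell=True, check=True) so the script stops at the first failing step.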

So do research on those three steps, and try some things.
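As one small thing to try from the Python side, here is a minimal sketch that reads FASTQ records and reports the mean base quality per read, using only the standard library. It assumes the common four-lines-per-record layout and Phred+33 quality encoding, which may not match your files; read_fastq and mean_quality are illustrative names, not functions from any library. Biopython's Bio.SeqIO module has a more robust FASTQ parser once you are comfortable with the format.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) for each 4-line FASTQ record.

    Stops at end of file or at the first blank header line.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record near {header!r}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33
        yield header[1:], sequence, [ord(c) - 33 for c in quality]


def mean_quality(qualities):
    """Average Phred score of one read (0.0 for an empty read)."""
    return sum(qualities) / len(qualities) if qualities else 0.0


# Two made-up reads held in memory instead of a real file
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for name, seq, quals in read_fastq(example):
    print(name, len(seq), mean_quality(quals))
```

For a real file, replace the in-memory example with open("reads.fastq") (a placeholder name) and pass that handle to read_fastq.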