Question: Python, Biopython And The Analysis Of Next-Gen Sequencing
BioICoder wrote (6.3 years ago):

Hello all, I have a question about how to begin working with next-gen sequencing (NGS) data. I am conceptually familiar with next-gen sequencing and its practical importance nowadays. However, I have never worked with the data itself, mainly because I don't know the best way to start. Could you kindly give some guidelines on how to work with the data, from the raw reads up to the analysis phase? In addition, I have read that a scripting language such as Python is very useful for analyzing NGS data and automating the work. Which parts of Python should I learn in order to use it for this kind of data?

Any websites or books that may be helpful would be welcome as well. Thanks a lot.
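
As a rough illustration of the kind of Python that tends to be useful here (file handling, generators, basic string processing), below is a minimal sketch that reads FASTQ records and keeps reads with a reasonable mean base quality. It assumes the simple four-lines-per-record FASTQ layout and Phred+33 quality encoding; the example reads and the quality threshold of 20 are made up for illustration. Biopython offers similar parsing through `SeqIO.parse(handle, "fastq")`, which is usually the better choice on real data.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        # Keep only the read name, dropping the '@' and any description text.
        yield header[1:].split()[0], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score, assuming Phred+33 encoding (an assumption about the data)."""
    return mean(ord(ch) - offset for ch in quality)


# Two made-up reads, used only for illustration.
EXAMPLE_FASTQ = "@read1 sample\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n"

records = list(read_fastq(io.StringIO(EXAMPLE_FASTQ)))
# Illustrative threshold: keep reads whose mean quality is at least 20.
good_reads = [rid for rid, seq, qual in records if mean_phred(qual) >= 20]
print(good_reads)
```

To run it on a real file, replace the `io.StringIO(...)` handle with `open("reads.fastq")` (a hypothetical file name).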


I think this is a good answer and a place to start: "How to learn about next generation gene sequencing".

This book is a good way to orient yourself at the beginning: Bioinformatics for High Throughput Sequencing.

There is also a workshop: Workshop on next generation sequencing data analysis.

Comment by Medhat, written 6.3 years ago
Josh Herr (University of Nebraska) wrote (6.3 years ago):

This is really an open-ended question, so it's difficult to know where to begin answering it. I appreciate your drive to learn Python for data analysis, but it sounds like you really need to work out (or at least communicate to us) what your research questions are before you decide which tools to use to answer them. Most often the tool you use depends on the type of data you have and how you want to analyze it.

I suggest devising a strategy for analyzing your data, with a detailed explanation of what you expect to test or find at each step of the way. Then you can start to figure out which tools to use to answer each question. There are a lot of bioinformatics tools to choose from (as you can see from looking through this forum).

Next, I would suggest taking a basic bioinformatics course (at least to learn about the tools and the types of questions you can answer with them) or working through a general textbook. If you're coming from a biology background and don't have much experience with computers, I would suggest the book Practical Computing for Biologists (which focuses on Python).


Hi Josh, thanks for your great reply to my question about Python and next-generation sequencing. To narrow down my question: my research aims to study a number of autosomal recessive intellectual disabilities in the Middle East in order to identify a novel gene or genes. I have finished obtaining the raw data, which now needs to be annotated. I was thinking of storing the annotated data in tabular form in MySQL and then using queries to find the information I need. However, I also want to develop a pipeline in Python for the analysis phase. How would you learn to build such analysis pipelines? There is not much on the web in the way of tutorials, mostly because these pipelines are developed in-house. How can I create a basic pipeline in Python for the research described above? Is there some sort of guideline?

Thanks a lot, I appreciate it.

Reply by BioICoder, written 2.8 years ago
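
The idea in the reply above of keeping annotated variants in a table and querying them could be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table layout, column names, gene names, frequencies, and the 1% frequency cutoff are all hypothetical. Filtering for rare homozygous variants is one common first pass for a recessive condition, not a complete strategy (it ignores compound heterozygotes, for example).

```python
import sqlite3

# Hypothetical annotated variants:
# (chrom, pos, ref, alt, gene, genotype, population_freq)
ROWS = [
    ("1", 12345, "A", "G", "GENE_A", "1/1", 0.0001),
    ("1", 22222, "C", "T", "GENE_A", "0/1", 0.0002),
    ("2", 33333, "G", "A", "GENE_B", "1/1", 0.2500),
    ("7", 44444, "T", "C", "GENE_C", "1/1", None),  # not seen in the population data
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE variants (
           chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
           gene TEXT, genotype TEXT, population_freq REAL)"""
)
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)


def rare_homozygous(connection, max_freq=0.01):
    """Return homozygous-alternate variants that are rare or absent in the population."""
    query = """SELECT gene, chrom, pos, ref, alt
               FROM variants
               WHERE genotype = '1/1'
                 AND (population_freq IS NULL OR population_freq < ?)
               ORDER BY chrom, pos"""
    return connection.execute(query, (max_freq,)).fetchall()


candidates = rare_homozygous(conn)
for row in candidates:
    print(row)
```

The SQL itself is plain enough that it should carry over to MySQL with little change; only the connection code and parameter placeholder style depend on the database driver.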
swbarnes2 (United States) wrote (6.3 years ago):

In general, the steps are:

1) Align your fastq of reads to a reference genome
2) Determine the variants between your sample and the reference genome
3) Determine which variants are the best candidates for contributing to your clinical phenotype

Step 1 is fairly straightforward; several programs do it, and bwa and bowtie are the most commonly used. Step 2 is trickier: you can try GATK, but samtools is simpler to understand. Step 3 is the hardest.

So do research on those three steps, and try some things.
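
For steps 1 and 2, a Python "pipeline" is often little more than a script that runs the command-line tools in order. Here is a minimal sketch, assuming `bwa`, `samtools`, and `bcftools` are installed and the reference has already been indexed with `bwa index`. The file names are hypothetical, and the calling step uses `bcftools mpileup` and `bcftools call`, since in current releases the variant calling that goes with samtools is done by its companion program bcftools. By default the script only prints the commands (a dry run), so you can check them before running anything.

```python
import subprocess


def build_steps(reference, fastq, prefix):
    """Return (argv, stdout_path) pairs for a minimal align-and-call pipeline."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.pileup.bcf"
    vcf = f"{prefix}.vcf"
    return [
        # Step 1: align reads; bwa mem writes SAM to standard output.
        (["bwa", "mem", reference, fastq], sam),
        # Sort and index the alignments so downstream tools can use them.
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        # Step 2: compute genotype likelihoods, then call variants only (-v)
        # with the multiallelic caller (-m).
        (["bcftools", "mpileup", "-f", reference, "-Ou", "-o", bcf, bam], None),
        (["bcftools", "call", "-mv", "-o", vcf, bcf], None),
    ]


def run_steps(steps, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for argv, stdout_path in steps:
        if dry_run:
            print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
            continue
        if stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


# Hypothetical file names, for illustration only.
steps = build_steps("ref.fa", "sample.fastq", "sample")
run_steps(steps)  # dry run: prints the commands without executing them
```

Calling `run_steps(steps, dry_run=False)` would actually execute the tools. Step 3, deciding which variants matter, is not covered here and depends on annotation and on the research question.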
