Python, Biopython And The Analysis Of Next-Gen Sequencing
11.6 years ago
BioICoder ▴ 40

Hello all, I have a question about how to begin working with next-gen sequencing. I am conceptually familiar with next-gen sequencing and its practical importance nowadays. However, I have never worked with the data itself, basically because I don't know the best way to start. Could you kindly give some guidelines on how to work with the data, from the raw data up to the analysis phase? In addition, I have read that a scripting language such as Python is very useful for analyzing next-gen data and automating the work. Which parts of Python should I learn in order to use it for such data?

Any websites or books that may be helpful would be welcome as well. Thanks a lot.

next-gen python • 16k views

I think this is a good answer and a good place to start. See also: how to learn about next generation gene sequencing

This book is a good starting point to orient you: Bioinformatics for High Throughput Sequencing

There is also a workshop: Workshop on next generation sequencing data analysis

11.6 years ago
Josh Herr 5.8k

This is really an open-ended question, so it's difficult to know where to begin answering it. I appreciate your drive to learn Python for data analysis, but it sounds like you really need to address (or at least communicate to us) what your research questions are before you decide which tools to use to answer them. Most often, the type of tool you use depends on the type of data you have and how you want to analyze it.

I suggest trying to devise a strategy to analyze your data with a detailed explanation of what you expect to test/find at each step of the way. Then you can start to figure out which tools to use to answer each question. There are a lot of bioinformatic tools (as you can see from looking through this forum) to choose from.

Next, I would suggest taking a basic bioinformatics course (at least to learn about the tools and the types of questions you can answer with them) or working through a general textbook. If you're coming from a biology background and don't have much experience with computers, I would suggest the book Practical Computing for Biologists (which focuses on Python).


Hi Josh, thanks for your great reply to my question about Python and next-generation sequencing. To narrow down my question: my research aims to study a number of autosomal recessive intellectual disabilities in the Middle East in order to identify a novel gene or genes. I now have the raw data ready to be annotated. I was thinking of storing the annotated data in tabular format in MySQL and then using queries to find the information I need. However, I also want to develop a pipeline in Python for the analysis phase. How would you learn to build such analysis pipelines? There is not much on the web in the way of tutorials, mostly because these are in-house pipelines. How can I create a basic pipeline in Python for the research described above? Is there some sort of guideline?

Thanks a lot and I appreciate it.
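
The idea of keeping annotated variants in a table and querying it can be sketched as follows. This is only an illustration under assumptions: the `variants` table, its columns, the example rows and the 1% frequency cut-off are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a server. The SQL should carry over to MySQL largely unchanged, although MySQL drivers typically use `%s` rather than `?` as the placeholder.

```python
import sqlite3

# Hypothetical annotated variants:
# (sample, gene, chrom, pos, ref, alt, zygosity, population_freq)
ROWS = [
    ("patient1", "GENE_A", "chr1", 1000, "A", "G", "hom", 0.0001),
    ("patient1", "GENE_B", "chr2", 2000, "C", "T", "het", 0.0002),
    ("patient1", "GENE_C", "chrX", 3000, "G", "A", "hom", 0.0001),
    ("patient1", "GENE_D", "chr3", 4000, "T", "C", "hom", 0.2500),
    ("patient2", "GENE_A", "chr1", 1000, "A", "G", "hom", 0.0001),
]


def load_variants(conn, rows):
    """Create the (hypothetical) variants table and insert the rows."""
    conn.execute(
        """
        CREATE TABLE variants (
            sample TEXT, gene TEXT, chrom TEXT, pos INTEGER,
            ref TEXT, alt TEXT, zygosity TEXT, population_freq REAL
        )
        """
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def recessive_candidates(conn, max_freq=0.01):
    """Rare, homozygous, autosomal variants, most widely shared first."""
    query = """
        SELECT gene, chrom, pos, ref, alt, COUNT(DISTINCT sample) AS n_samples
        FROM variants
        WHERE zygosity = 'hom'
          AND chrom NOT IN ('chrX', 'chrY', 'chrM')
          AND population_freq <= ?
        GROUP BY gene, chrom, pos, ref, alt
        ORDER BY n_samples DESC, gene
    """
    return conn.execute(query, (max_freq,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_variants(conn, ROWS)
    for row in recessive_candidates(conn):
        print(row)
```

With the example rows, only the `GENE_A` variant passes all three filters: it is homozygous in both patients, autosomal and rare. A real analysis would need a much more careful model (affected versus unaffected relatives, compound heterozygotes, variant effect annotation), so treat the query as a starting point only.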

11.6 years ago

In general, the steps are:

1) Align your fastq of reads to a reference genome
2) Determine the variants between your sample and the reference genome
3) Determine which variants are the best candidates for contributing to your clinical phenotype

Step 1 is fairly straightforward; several programs do this, and bwa and bowtie are the most commonly used. Step 2 is trickier: you can try GATK, but samtools is simpler to understand. Step 3 is the hardest.

So do research on those three steps, and try some things.
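
The first two steps above can be sketched as a small Python driver that shells out to the command-line tools. This is a sketch under assumptions, not a tested pipeline: the file names, the `build_commands` and `run_pipeline` helpers and the `results` directory are hypothetical. It uses bwa for alignment and bcftools for variant calling, since the calling that used to ship with samtools now lives in bcftools. Flags can differ between versions, so check each tool's help output before running anything for real.

```python
import shlex
import subprocess


def build_commands(reference, fastq, sample, outdir="results"):
    """Return the alignment and variant-calling commands as argument lists."""
    sam = f"{outdir}/{sample}.sam"
    bam = f"{outdir}/{sample}.sorted.bam"
    bcf = f"{outdir}/{sample}.pileup.bcf"
    vcf = f"{outdir}/{sample}.vcf"
    return [
        # Step 1: index the reference, align the reads, then sort and index
        ["bwa", "index", reference],
        ["bwa", "mem", "-o", sam, reference, fastq],
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Step 2: pile up the aligned reads and call variants
        ["bcftools", "mpileup", "-f", reference, "-O", "u", "-o", bcf, bam],
        ["bcftools", "call", "-m", "-v", "-o", vcf, bcf],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands. outdir must exist before a real run.
    run_pipeline(build_commands("ref.fa", "sample.fastq", "patient1"))
```

Keeping the command construction separate from the execution makes it easy to inspect or log exactly what would run before committing to a long alignment job. Step 3 is deliberately left out, because it depends on the research question.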
