I have raw NGS data and would like to take a FASTQ file through a variant-calling workflow to a VCF file, and I would like to use Python for all of these steps. Which tools can I use to process my FASTQ file all the way to VCF and then annotate my variants? Thanks in advance. By the way, I need to use Python. That is my professor's order :/
You wouldn't do everything in Python; that would waste both CPU cycles and your programming time. Rather, you'd use something like Snakemake to build a convenient Python-based pipeline. It's quite likely that this is what your professor meant.
Perhaps Platypus is a solution. It is a variant caller written (partially) in Python: http://www.well.ox.ac.uk/platypus
I don't know what your position in this research is, but simply following your professor's orders is not a scientific approach; be critical and check alternatives. As far as I know, most people use GATK, so don't make it too hard on yourself by using something exotic.
I am totally in favor of what John is stating: even if the requirement is to learn Python and how to code in it, there is no point in reinventing existing tools. You can write a processing script in Python, but that takes time, and your professor should understand that. It will not be new, out-of-the-box work, just a processing workflow, and the major part will be subprocesses calling BWA, GATK, or other downstream variant annotation tools. Devon is correct about the waste of CPU cycles as well. In that case I would look for an existing Python-based processing framework that meets my requirements, test it, and show the results to the boss. That is how it will work: you have to learn in depth which tools you need, which ones you are using at each step of variant calling, and why you use them. That is more important than any processing script in any scripting language, unless your workplace strictly mandates a particular language. So take a look at the below link
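To illustrate the "subprocesses calling BWA, GATK or other tools" idea, here is a minimal sketch of a Python wrapper around one external step. The helper name run_step is made up for this example, and the demo command just calls the Python interpreter so the snippet runs without any bioinformatics tools installed; in a real pipeline you would pass the actual bwa, samtools or GATK command line instead.

```python
import subprocess
import sys


def run_step(name, cmd):
    """Run one external pipeline step (hypothetical helper) and return its stdout.

    `cmd` is a list of arguments, e.g. ["samtools", "index", "sample.bam"].
    """
    # Log the step to stderr so stdout stays free for the tool's own output
    print(f"[{name}] {' '.join(cmd)}", file=sys.stderr)
    # check=True raises CalledProcessError if the tool exits with a non-zero status
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in command (assumption): replace with a real tool invocation
    output = run_step("demo", [sys.executable, "-c", "print('ok')"])
    print(output.strip())
```

Failing on the first non-zero exit status matters here, because a silently failed alignment would otherwise feed an empty or truncated file into the variant caller.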
Thanks to you all. I know this professor situation is kind of weird. But as you suggest, I could call other tools as subprocesses, try an analysis, and show the results to my professor, which might convince him. So could you suggest tools that are written in Python so that I can start with them?
My workflow will first need an aligner, like BWA; then a tool for converting the BWA output to BAM and BAI format, like samtools; then bcftools to produce VCF; and lastly an annotator like SnpEff and ANNOVAR.
I hope you can help me as your previous answers did. In the meantime I will try your other links and ideas. Thanks again.
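The workflow described above (BWA, then samtools, then bcftools, then an annotator) could be laid out in Python roughly like this. It is only a sketch under some assumptions: the function name build_pipeline, the output file names, and the SnpEff database name "GRCh37.75" are placeholders chosen for the example; the reference is assumed to be already indexed with bwa index; and snpEff is assumed to be available as a wrapper command on the PATH (otherwise call it via java -jar snpEff.jar). The function only builds the shell commands, so you can inspect them before running anything.

```python
import shlex


def build_pipeline(ref, fastq, sample, snpeff_db="GRCh37.75", threads=4):
    """Return the shell commands for a single-end FASTQ -> annotated VCF run.

    All output names are derived from `sample` (illustrative naming only).
    """
    q = shlex.quote  # protect paths that contain spaces or shell metacharacters
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    ann = f"{sample}.ann.vcf"
    return [
        # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-")
        f"bwa mem -t {threads} {q(ref)} {q(fastq)} | samtools sort -o {q(bam)} -",
        # create the .bai index next to the sorted BAM
        f"samtools index {q(bam)}",
        # pile up reads against the reference and call variants into an uncompressed VCF
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Ov -o {q(vcf)}",
        # annotate the calls; the database name is an assumption
        f"snpEff {q(snpeff_db)} {q(vcf)} > {q(ann)}",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("ref.fa", "reads.fq", "sample1"):
        print(cmd)
```

Each string can then be executed with subprocess.run(cmd, shell=True, check=True), or copied into Snakemake rules once you are happy with the individual steps.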
Maybe have a look at this (scroll down for a BWA-samtools workflow example)
From the intro: Snakemake offers a definition language that is an extension of Python with syntax to define rules and workflow specific properties