Hi, I just finished calling SNPs and indels with GATK Mutect2 and now have some VCF files. I want to annotate them with Oncotator or ANNOVAR, but I do not know how to process them further after annotation, for example to produce figures, visualizations, and statistics. Are there any packages or software that can create publication-quality infographics and illustrations? Does anyone have experience with this?
I have 230 paired WES samples from tumor and adjacent tissue, along with all of their covariates.
Thanks
Shirley
Are there any packages that you would suggest? I'm really a beginner here. I have used edgeR and DESeq, which are only suitable for RNA-seq data.
If you want to visualize the VCFs you created, try IGV (http://www.broadinstitute.org/igv/).
You've basically hit the end of the automatic pipeline, where you can run one tool after another and get something meaningful without designing a bioinformatics experiment. To use your analogy, Mutect2 did the equivalent of edgeR (in a way), so now you're at the stage of analyzing the data, usually through statistical tests that depend completely on your hypothesis. It's true that edgeR generates images whereas I don't think Mutect2 does, but that's because expression data lends itself to some simple and informative visualizations, where SNV data doesn't necessarily. The good news is, this is where the blind running of a pipeline ends and the scientific thought begins!
If your experiment was just to generate SNVs for tumor/normal pairs, congratulations, you've completed that. This is the same type of information provided by TCGA for a number of different cancers. If you search the literature you'll see researchers analyzing this data in lots of different ways. You could see whether the questions they are asking match what you are interested in and try to use their analysis methodology, assuming your data meets whatever assumptions it makes.
If you absolutely need some further direction, you can see what researchers did in this paper: http://www.cell.com/cell/abstract/S0092-8674(12)01022-7?_returnURL=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867412010227%3Fshowall%3Dtrue (this is smokers/non-smokers, but it is the same principle as tumor/normal). You could also look into a gene ontology type of analysis, looking at the types of genes mutated (which you should have in your annotations) and just creating a table and a pie chart of what percentage falls in each ontology group. You might already have the ontologies in your annotations, but you'll need to extract this information (or map your coordinates to a database from here: http://geneontology.org/). I don't know of any tool that just automates this for you (there are some technical reasons why this would be hard with VCF files in particular).
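To make the table step a bit more concrete, here is a minimal Python sketch, not a finished method. It assumes you have exported your annotations to a tab-delimited table with a gene-symbol column; I've called it `Gene.refGene` here as ANNOVAR's refGene annotation does, but check your own header. The `gene_to_category` dictionary and the toy rows are made-up placeholders. In practice you would build that mapping from the Gene Ontology annotations, and since a real gene usually belongs to many GO terms, you would first have to decide how to collapse them to one group per gene.

```python
import csv
import io
from collections import Counter


def category_percentages(rows, gene_column, gene_to_category):
    """Return {category: percent of distinct mutated genes in that category}."""
    # Count each mutated gene once, however many variants it carries.
    genes = {row[gene_column] for row in rows if row.get(gene_column)}
    # Genes missing from the mapping are grouped under "unassigned".
    counts = Counter(gene_to_category.get(gene, "unassigned") for gene in genes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy stand-in for an annotated variant table (placeholder values only).
example_tsv = (
    "Chr\tStart\tGene.refGene\n"
    "chr17\t100\tTP53\n"
    "chr17\t200\tTP53\n"
    "chr12\t300\tKRAS\n"
    "chr7\t400\tEGFR\n"
    "chr2\t500\tTTN\n"
)

# Hypothetical one-category-per-gene mapping, for illustration only.
gene_to_category = {
    "TP53": "apoptotic process",
    "KRAS": "signal transduction",
    "EGFR": "signal transduction",
}

rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
percentages = category_percentages(rows, "Gene.refGene", gene_to_category)

for category, percent in sorted(percentages.items()):
    print(f"{category}\t{percent:.1f}%")
```

For a real file you would replace the `io.StringIO(...)` part with an opened file handle. The resulting dictionary is the table, and its values and keys can be handed to any plotting library to draw the pie chart (for example matplotlib's `pie`).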
Good luck!
Many thanks, Jack!!!
No problem, I'm happy to help if you have any problems with the next steps.