Hello everyone! I am using a pipeline which performs tumor-normal matched analysis. As output, it gives the VCF and Excel files only for somatic variants, excluding germline.
As I need those germline variants in the file for further annotation, I have written a script that adds the missing germline variants to the Excel file.
What I need now is to create a new VCF file from the data in that Excel file. Is that possible?
I have been trying VariantAnnotation from Bioconductor, and pysam and PyVCF in Python. I couldn't find how to create a VCF from scratch; they all seem to read only already existing VCFs. Any help, ideas or tips are more than welcome!
If you are starting with VCF and would like to end with a VCF, why are you creating the intermediate Excel files?
Hi Igor! Well, there are several reasons. The main one is that the modifications that need to be made to the VCF are considerable, and manipulating/editing VCF files in R hasn't been easy. Among other things, the output VCF files do not pass vcf-validator because the header and INFO fields are written incorrectly. Also, there are several further calculations I do on these variants, such as homopolymer info, GC content and others, which are easier to do from vectors/data frames than with the VCF object that VariantAnnotation generates. Finally, since I work with colleagues who are not familiar with informatics/programming at all, they need those Excel files to see the results clearly, so the script was initially written for them. Writing a new script for this specific annotation would take a lot of time and effort that I believe is unnecessary if a VCF can be generated from the Excel file, considering that all the previous analysis already has a script set up.
Hello daianagan,
I agree with the others here that using vcf from beginning to end would be the cleaner way. So we could try to find such a way, if you tell us more (with examples) about your goal, input and desired output.
If you really, really need a way to convert excel to vcf, you must show us what this excel file looks like. In any case, the starting point will be to save this file as a csv, because parsing a real excel file is a nightmare.
fin swimmer
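To make the csv route above concrete, here is a minimal sketch of writing a VCF from scratch using only the Python standard library. It assumes the Excel sheet was saved as csv with columns named CHROM, POS, REF, ALT and ORIGIN; those column names, the ORIGIN tag and the function name csv_to_vcf are made up for illustration and are not taken from the actual file. A real file would most likely also need contig header lines and sample/genotype columns to satisfy a validator.

```python
import csv
import io


def csv_to_vcf(csv_text):
    """Turn CSV text (CHROM, POS, REF, ALT, optional ORIGIN) into minimal VCF text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        origin = row.get("ORIGIN") or ""
        records.append((row["CHROM"], int(row["POS"]), row["REF"], row["ALT"], origin))

    # Plain string sort on chromosome, numeric sort on position.
    # Assumption: a real pipeline would follow the contig order of the reference.
    records.sort(key=lambda r: (r[0], r[1]))

    lines = [
        "##fileformat=VCFv4.2",
        # ORIGIN is a hypothetical INFO tag used only in this sketch.
        '##INFO=<ID=ORIGIN,Number=1,Type=String,Description="somatic or germline">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt, origin in records:
        info = f"ORIGIN={origin}" if origin else "."
        # Unknown ID, QUAL and FILTER are written as "." (the VCF missing value).
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info]))
    return "\n".join(lines) + "\n"


example = (
    "CHROM,POS,REF,ALT,ORIGIN\n"
    "chr1,12345,A,G,germline\n"
    "chr1,200,C,T,somatic\n"
)
print(csv_to_vcf(example), end="")
```

Going through plain csv text keeps Excel's own file format out of the picture, but it is still worth checking that Excel has not reformatted any column (gene names, positions) before exporting.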
OP, as finswimmer says, converting Excel data back to VCF is a nightmare because of all the idiosyncratic and indescribable random problems that show up because Excel thinks it understands our data better than we do.
If your Excel-o-phile folks would like to visualize variants using Excel, they are welcome to, but they should communicate changes to you in a verbose fashion and you should translate those changes to reproducible bcftools/vcftools/bedops/bedtools/whatever-tool commands that you can document and reuse.
In addition to that, try to get those Excel-o-phile folks hooked on a GUI for VCF parsing. Those exist :)
Hi daianagan,
There is no need to delete questions, especially not after people have tried to help you.
Cheers,
Wouter
Sorry, I didn't mean to delete it; I meant to close it, since the main question ("is it possible", meaning "is there a straightforward way of doing so") has already been answered. Thank you!
What is the source of the germline variants? Another text file?
Yes! It is the output of the single analysis of the normal sample.
You can annotate the VCF of somatic variants using another file. I guess the germline text file has matching information. With that information, you can annotate your VCF with the germline variants from the text file. Refer to the bcftools annotate function. If you could post an example of the format of your germline file and the corresponding VCF record, that would be helpful.
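As a rough sketch of that idea: the bcftools documentation describes annotating from a tab-delimited file that has been bgzip-compressed and tabix-indexed, with a command along the lines of bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,POS,REF,ALT,TAG file.vcf. The Python below only prepares such a table. The input keys (chrom, pos, ref, alt), the GERMLINE tag and the function name are assumptions, since the real germline file format has not been posted. Note that, as far as I know, this adds tags to records already present in the VCF rather than inserting new records.

```python
import io


def write_annotation_table(germline_rows, handle):
    """Write CHROM, POS, REF, ALT, GERMLINE as tab-delimited lines, sorted by position."""
    # Hypothetical input: one dict per variant parsed from the germline text file.
    ordered = sorted(germline_rows, key=lambda r: (r["chrom"], int(r["pos"])))
    for r in ordered:
        # The last column is the value of a hypothetical INFO/GERMLINE tag
        # (1 = variant seen in the normal-sample analysis).
        fields = [r["chrom"], str(r["pos"]), r["ref"], r["alt"], "1"]
        handle.write("\t".join(fields) + "\n")


rows = [
    {"chrom": "chr2", "pos": "50", "ref": "G", "alt": "A"},
    {"chrom": "chr1", "pos": "300", "ref": "C", "alt": "T"},
    {"chrom": "chr1", "pos": "20", "ref": "A", "alt": "G"},
]
buffer = io.StringIO()
write_annotation_table(rows, buffer)
print(buffer.getvalue(), end="")
```

The header file passed with -h would then need a matching line such as ##INFO=<ID=GERMLINE,Number=1,Type=Integer,Description="Seen in normal sample">, again with GERMLINE being just a placeholder name.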