Question: Generate VCF file from csv/excel file
0
daianagan10 wrote 2.5 years ago:

Hello everyone! I am using a pipeline which performs tumor-normal matched analysis. As output, it gives the VCF and Excel files only for somatic variants, excluding germline.

As I need those germline variants in the file for further annotation, I have written a script that adds the missing germline variants to the Excel file.

What I need now is to create a new VCF file from the data in that Excel file. Is that possible?

I have been trying VariantAnnotation from Bioconductor, and pysam and PyVCF in Python. I couldn't find how to create a VCF from scratch; they all seem to read VCFs that already exist. Any help, ideas, or tips are more than welcome!

modified 2.5 years ago • written 2.5 years ago by daianagan10
4

If you are starting with VCF and would like to end with a VCF, why are you creating the intermediate Excel files?

modified 2.5 years ago • written 2.5 years ago by igor12k
1

Hi Igor! Well, there are several reasons. The main one is that the VCF needs considerable modification, and manipulating/editing VCF files in R hasn't been easy. Among other things, the output VCF files do not pass vcf-validator because the header and INFO fields are written incorrectly. Also, I perform several further calculations on these variants, such as homopolymer information, GC content and others, which are easier to do on vectors/data frames than on the VCF object that VariantAnnotation generates. Finally, since I work with colleagues who are not familiar with informatics/programming at all, they need those Excel files to see the results clearly, so the script was initially written for them. Writing a new script for this specific annotation would take a lot of time and effort that I believe is unnecessary if a VCF can be generated from the Excel file, considering that all the previous analysis already has a script set up.

modified 2.5 years ago • written 2.5 years ago by daianagan10
1

Hello daianagan,

I agree with the others here that using VCF from beginning to end would be the cleaner way. We could try to find such a way if you tell us more (with examples) about your goal, input, and desired output.

If you really, really need a way to convert Excel to VCF, you must show us what this Excel file looks like. In any case, the starting point will be to save the file as CSV, because parsing a real Excel file is a nightmare.

fin swimmer

written 2.5 years ago by finswimmer14k
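
The CSV route suggested above could look roughly like the following. This is only a minimal sketch using the Python standard library, not a validated converter. It assumes the Excel sheet was saved as CSV with columns named `CHROM`, `POS`, `REF` and `ALT` (hypothetical names, since the real layout was never posted), and it writes a sites-only VCF with no INFO or sample columns. The function name `csv_to_vcf` is likewise made up for illustration. Rows are rejected when Excel has reformatted a position (for example as `1.5E+06`) or when an allele contains unexpected characters; symbolic alleles such as `<DEL>` are not handled.

```python
import csv
import io
import re

# Hypothetical column names; rename to match the real CSV export.
REQUIRED = ("CHROM", "POS", "REF", "ALT")
REF_RE = re.compile(r"[ACGTN]+")
ALT_RE = re.compile(r"[ACGTN*]+(,[ACGTN*]+)*")  # comma-separated ALT alleles


def csv_to_vcf(csv_handle, vcf_handle):
    """Write a minimal sites-only VCF from CSV rows; return the record count."""
    reader = csv.DictReader(csv_handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing CSV columns: {missing}")

    records = []
    chrom_order = {}  # chromosomes keep their order of first appearance
    for n, row in enumerate(reader, start=2):  # row 1 is the CSV header
        chrom = row["CHROM"].strip()
        pos = int(row["POS"])  # raises ValueError on 1.5E+06 or 1,500,000
        ref = row["REF"].strip().upper()
        alt = row["ALT"].strip().upper()
        if (not chrom or pos < 1
                or not REF_RE.fullmatch(ref) or not ALT_RE.fullmatch(alt)):
            raise ValueError(f"CSV row {n}: invalid record {row}")
        chrom_order.setdefault(chrom, len(chrom_order))
        records.append((chrom, pos, ref, alt))

    # Positions must be sorted within each chromosome.
    records.sort(key=lambda r: (chrom_order[r[0]], r[1]))

    vcf_handle.write("##fileformat=VCFv4.2\n")
    vcf_handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for chrom, pos, ref, alt in records:
        vcf_handle.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")
    return len(records)


# Usage with in-memory data; real code would pass open file handles instead.
demo_csv = io.StringIO("CHROM,POS,REF,ALT\nchr1,12345,A,G\n")
demo_vcf = io.StringIO()
csv_to_vcf(demo_csv, demo_vcf)
print(demo_vcf.getvalue())
```

A file produced this way still lacks contig and INFO header lines, so it is worth running it through a VCF validator before passing it to an annotation tool.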
2

OP, like finswimmer says, converting Excel data back to VCF is a nightmare because of all the idiosyncratic and indescribable random problems that show up when Excel thinks it understands our data better than we do.

If your Excel-o-phile folks would like to visualize variants using Excel, they are welcome to, but they should communicate changes to you in a verbose fashion and you should translate those changes to reproducible bcftools/vcftools/bedops/bedtools/whatever-tool commands that you can document and reuse.

written 2.5 years ago by _r_am32k
1

In addition to that, try to get those Excel-o-phile folks hooked on a GUI for VCF parsing. Those exist :)

written 2.5 years ago by WouterDeCoster45k
1

Finally, since I work with colleagues not familiar with informatics/programming whatsoever, they need those Excel files to see it clearly

written 2.5 years ago by Pierre Lindenbaum132k
2

Hi daianagan,

There is no need to delete questions, especially not after people have tried to help you.

Cheers,
Wouter

written 2.5 years ago by WouterDeCoster45k

Sorry, I didn't mean to delete it; I meant to close it, since the main question ("is it possible", meaning is there a straightforward way of doing so) has already been answered. Thank you!

modified 2.5 years ago • written 2.5 years ago by daianagan10
1

What is the source of the germline variants? Another text file?

written 2.5 years ago by cpad011214k

Yes! It is the output of the single analysis of the normal sample.

written 2.5 years ago by daianagan10
1

You can annotate the VCF containing the somatic variants using another file. I guess your germline text file contains matching information. With that information, you can annotate your VCF with the germline variants from the text file. Refer to the bcftools annotate command. If you can post an example of the germline file format and the corresponding VCF record, that would be helpful.

written 2.5 years ago by cpad011214k
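
To make the "matching information" idea concrete, here is a small Python sketch of the matching step only. It is not how bcftools annotate is implemented and is not a replacement for it. It assumes the germline text file has already been parsed into a set of `(CHROM, POS, REF, ALT)` tuples, and it marks matching VCF records with an INFO flag called `GERMLINE`. Both the function name `flag_germline` and the tag name are hypothetical. ALT is compared as a whole string, so multi-allelic records only match when the entire ALT field is identical.

```python
def flag_germline(vcf_lines, germline_keys, tag="GERMLINE"):
    """Return VCF lines with an INFO flag added to records found in germline_keys."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#CHROM"):
            # Declare the new flag in the header, just before the column line.
            out.append(
                f'##INFO=<ID={tag},Number=0,Type=Flag,'
                f'Description="Also present in germline table">'
            )
            out.append(line)
        elif line.startswith("#"):
            out.append(line)
        else:
            fields = line.split("\t")
            key = (fields[0], int(fields[1]), fields[3], fields[4])
            if key in germline_keys:
                # INFO is the 8th column; "." means it is currently empty.
                fields[7] = tag if fields[7] == "." else f"{fields[7]};{tag}"
            out.append("\t".join(fields))
    return out


# Usage with in-memory data; the germline keys would come from the text file.
demo_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\tDP=10",
]
demo_keys = {("chr1", 100, "A", "G")}
print("\n".join(flag_germline(demo_vcf, demo_keys)))
```

Note that this only marks records already present in the VCF. Germline variants that are absent from the somatic VCF would still have to be added as new records.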
0
daianagan10 wrote 2.5 years ago:

Thank you everyone for your help. Apparently there is no straightforward way of doing this, which was my original question. I will investigate how to adapt my scripts to avoid using Excel, and try to find those GUI parsers for the other members of my group. Thank you!

written 2.5 years ago by daianagan10

Create an IPython notebook template to compensate for the lack of a GUI.

written 2.5 years ago by Arup Ghosh2.7k