Question: Generate VCF file from csv/excel file
18 months ago, daianagan wrote:

Hello everyone! I am using a pipeline which performs tumor-normal matched analysis. As output, it gives the VCF and Excel files only for somatic variants, excluding germline.

As I need those germline variants in the file for further annotation, I have written a script that adds the missing germline variants to the Excel file.

What I need now is to create a new VCF file from the data in that Excel file. Is that possible?

I have been trying VariantAnnotation from Bioconductor, and pysam and PyVCF in Python. I couldn't find how to create a VCF from scratch with them; as far as I can tell, they only read existing VCFs. Any help, ideas, or tips are more than welcome!


If you are starting with VCF and would like to end with a VCF, why are you creating the intermediate Excel files?

ADD REPLYlink modified 18 months ago • written 18 months ago by igor9.2k

Hi Igor! Well, there are several reasons. The main one is that the VCF needs considerable modification, and manipulating/editing VCF files in R hasn't been easy. For one thing, the output VCF files do not pass vcf-validator because the header and INFO fields are written incorrectly. I also run several further calculations on the variants, such as homopolymer info, GC content and others, which are easier to do on vectors/data frames than on the VCF object that VariantAnnotation generates. Finally, since I work with colleagues who are not familiar with informatics/programming at all, they need those Excel files to see the results clearly, so the script was initially written for them. Writing a new script for this specific annotation would take a lot of time and effort that I believe is unnecessary if a VCF can be generated from the Excel file, considering that all the previous analysis already has a script set up.

ADD REPLYlink modified 18 months ago • written 18 months ago by daianagan10

Hello daianagan,

I agree with the others here that working with VCF from beginning to end would be the cleaner approach. We could try to find such a way if you tell us more (with examples) about your goal, input, and desired output.

If you really, really need a way to convert Excel to VCF, you must show us what this Excel file looks like. In any case, the starting point will be to save the file as CSV, because parsing a real Excel file is a nightmare.

fin swimmer

ADD REPLYlink written 18 months ago by finswimmer13k
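As a rough illustration of the CSV route mentioned above, here is a minimal Python sketch using only the standard library. It assumes the spreadsheet was exported as CSV with columns named CHROM, POS, REF and ALT; those column names and the function name `csv_to_vcf` are hypothetical, since the actual Excel layout was never posted. It writes a sites-only VCF with placeholder values for ID, QUAL, FILTER and INFO. A file meant to pass a validator would likely also need `##contig` lines and definitions for any INFO fields carried over.

```python
import csv
import io

# Hypothetical column names; adjust them to match the real spreadsheet export.
REQUIRED = ("CHROM", "POS", "REF", "ALT")


def csv_to_vcf(csv_text):
    """Convert CSV text with CHROM, POS, REF, ALT columns into minimal VCF text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    records = [
        (row["CHROM"], int(row["POS"]), row["REF"], row["ALT"])
        for row in reader
    ]

    # Keep chromosomes in order of first appearance; sort positions within each.
    chrom_order = {}
    for chrom, _pos, _ref, _alt in records:
        chrom_order.setdefault(chrom, len(chrom_order))
    records.sort(key=lambda r: (chrom_order[r[0]], r[1]))

    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt in records:
        # "." marks a missing value in the ID, QUAL, FILTER and INFO columns.
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "CHROM,POS,REF,ALT\nchr1,200,A,T\nchr1,100,G,C\n"
    print(csv_to_vcf(example), end="")
```

Exporting to CSV first sidesteps Excel's binary format, but it does not undo any value mangling Excel may already have applied, so the exported columns are worth checking before conversion.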

OP, as finswimmer says, converting Excel data back to VCF is a nightmare because of all the idiosyncratic and unpredictable problems that show up when Excel thinks it understands our data better than we do.

If your Excel-o-phile folks would like to visualize variants using Excel, they are welcome to, but they should communicate changes to you in a verbose fashion and you should translate those changes to reproducible bcftools/vcftools/bedops/bedtools/whatever-tool commands that you can document and reuse.

ADD REPLYlink written 18 months ago by RamRS25k

In addition to that, try to get those Excel-o-phile folks hooked on a GUI for VCF parsing. Those exist :)

ADD REPLYlink written 18 months ago by WouterDeCoster42k

Finally, since I work with colleagues not familiar with informatics/programming whatsoever, they need those Excel files to see it clearly

ADD REPLYlink written 18 months ago by Pierre Lindenbaum125k

Hi daianagan,

There is no need to delete questions, especially not after people have tried to help you.

Cheers,
Wouter

ADD REPLYlink written 18 months ago by WouterDeCoster42k

Sorry, I didn't mean to delete it. I meant to close it, since the main question ("is it possible?", meaning "is there a straightforward way of doing so?") has already been answered. Thank you!

ADD REPLYlink modified 18 months ago • written 18 months ago by daianagan10

What is the source of the germline variants? Another text file?

ADD REPLYlink written 18 months ago by cpad011212k

Yes! It is the output of the single analysis of the normal sample.

ADD REPLYlink written 18 months ago by daianagan10

You can annotate the VCF of somatic variants using another file. I assume the germline text file contains matching information; with that, you can annotate your VCF with the germline variants from the text file. See the bcftools annotate command. If you can post an example of the germline file format and the corresponding VCF record, that would be helpful.

ADD REPLYlink written 18 months ago by cpad011212k
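To make the bcftools annotate suggestion more concrete, here is a hedged Python sketch that prepares the two small text inputs such a run needs: a tab-delimited annotation table and a header line defining the new INFO tag. The tag name `NORMAL_CALL`, the file names, and the function `build_annotation_inputs` are all hypothetical, since the germline file format was never posted. The annotation table would still have to be compressed with bgzip and indexed with tabix before use. Note that this only labels records already present in the somatic VCF; it does not add new records.

```python
def build_annotation_inputs(germline_rows, tag="NORMAL_CALL"):
    """Return (tab_text, header_text, command) for a bcftools annotate run.

    germline_rows: iterable of (chrom, pos, ref, alt) tuples.
    tag: hypothetical INFO tag used to label variants also seen in the normal.
    """
    # Deduplicate, then sort by chromosome and position so the table can be indexed.
    rows = sorted(set(germline_rows), key=lambda r: (r[0], r[1]))
    tab_text = "".join(f"{c}\t{p}\t{r}\t{a}\tyes\n" for c, p, r, a in rows)

    # The new INFO tag must be declared in the header of the output VCF.
    header_text = (
        f"##INFO=<ID={tag},Number=1,Type=String,"
        'Description="Variant also called in the normal-only analysis">\n'
    )

    # File names are placeholders. germline.tab.gz is the table above after
    # bgzip compression and tabix indexing (e.g. tabix -s1 -b2 -e2).
    command = [
        "bcftools", "annotate",
        "-a", "germline.tab.gz",
        "-h", "germline.hdr",
        "-c", f"CHROM,POS,REF,ALT,{tag}",
        "-o", "annotated.vcf",
        "somatic.vcf.gz",
    ]
    return tab_text, header_text, command


if __name__ == "__main__":
    tab, hdr, cmd = build_annotation_inputs([("chr1", 200, "A", "T"), ("chr1", 100, "G", "C")])
    print(tab, hdr, " ".join(cmd), sep="\n")
```

The function only builds text and an argument list; nothing is executed, so the command can be reviewed and documented before being run.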
18 months ago, daianagan wrote:

Thank you everyone for your help. Apparently there is no straightforward way of doing this, which was my original question. I will look further into how to adapt my scripts to avoid using Excel, and try to find those GUI parsers for the other members of my group. Thank you!


Create an IPython notebook template to stand in for a GUI.

ADD REPLYlink written 18 months ago by arup1.9k