Get gVCF file from genome coordinates
1
0
Entering edit mode
5.9 years ago
Paul ★ 1.4k

Dear all,

I have bed file like:

Chr \t start \t stop


And I woud like to add to fourth column information about reference allele from hg19, so output for example should look like:

chr \t start \t stop \t A/A
chr \t start \t stop \t C/C
chr \t start \t stop \t T/T
chr \t start \t stop \t A/A


So create something very similar to gVCF. Reason is, that I need to annotate this output file in VEP.

gVCF coorinates hg19 • 1.4k views
0
Entering edit mode

Do you know a bit of Python? I have a script which almost does what you want.

0
Entering edit mode

Thank you for reply. I am scripting more in bash and awk.. But maybe I would understand..

0
Entering edit mode

How big is your file? Wondering whether to do everything in memory or stream.

0
Entering edit mode

1
Entering edit mode
5.8 years ago

This should do the trick, let me know if it doesn't :) The script uses as arguments i) the file to add reference nucleotides to ii) a directory containing single chromosome fasta files named like chr1.fa. Obviously you can change this

Let me know if something is not as it should be.

0
Entering edit mode

# WouterDeCoster Thank you for very nice code. I am trying to run thi code but still it failed. My syntax:

python anotate.py input.tsv  path/to/chr/directory


And I have error message:

File "anotate.py", line 57 addrefnucl(key) ^ IndentationError: expected an indented block

Please if you have any idea what I am doing wrong. Thank you so much.

0
Entering edit mode

There is a tab error, maybe what I wrote or copy paste. I'm at a conference and will look into this asap.

0
Entering edit mode

If that's the complete error message, you did something weird while copying the file. Could you check that the final line is properly indented with a tab (as in my post above). If there is a longer error message, I would like to see that.