Raising a KeyError when I know the key is in both files
17 months ago
hemr3 ▴ 10

Hi all,

I was hoping to get some help working out why this Python script (installed from https://ppp.readthedocs.io/en/latest/PPP_pages/Utilities/vcf_bed_to_seq.html#) is not working for me.

The program is intended to convert SNP data into sequence data, using a VCF or BED file with a reference FASTA file. As there is no reference Neanderthal FASTA file, the human one is used.

The command I am running is:

vcf_bed_to_seq.py --vcf neanderthal_file.vcf --model-file out.model --modelname 1Pop --fasta-reference GCF_000001405.25_GRCh37.p13_genomic.fna.gz --region 3:49828647-49848193

This raises the error:

KeyError: "sequence '3' not present"

To deal with this, I changed the chromosome 3 header in both the VCF and the FASTA file to the same name, chromosome3. I ran the following command on each file (adjusting the pattern to match that file's original chromosome header):

sed -i 's/chromosome 3/chromosome3/g' new_neanderthal_file.vcf

However, it still raises the same error, now with the new name:

KeyError: "sequence 'chromosome3' not present"
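One way to check which sequence names the FASTA file actually exposes is sketched below. It assumes the tool looks sequences up by the first whitespace-delimited token of each `>` header line, as FASTA indexers such as `samtools faidx` do. I have not confirmed that vcf_bed_to_seq.py works this way. Under that assumption, a hypothetical header like `>SEQ_ID some description chromosome3` would be keyed as `SEQ_ID`, not as `chromosome3`. The function name `fasta_sequence_names` is made up for illustration.

```python
import gzip


def fasta_sequence_names(path):
    """Return the first whitespace-delimited token of every '>' header line.

    Assumption: this token is the key used to look up a sequence.
    Works on plain or gzip-compressed FASTA files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # An empty header line yields an empty name.
                names.append(fields[0] if fields else "")
    return names
```

Calling `fasta_sequence_names("GCF_000001405.25_GRCh37.p13_genomic.fna.gz")` and comparing the result with the chromosome given to `--region` would show whether the name in the KeyError is really one of the keys.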

All the other inputs (the model file and the model name) are correct, so I know those aren't the issue. The error is always raised at the same lines (492, 487, 303), in that order. This happens whether I use a .vcf or a .bed file.

The GRCh37 assembly is being used because this is what the original Neanderthal sequence was aligned to.
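The VCF side can be checked in the same spirit. The sketch below lists the distinct values in the first (CHROM) column of the data lines, so they can be compared with the chromosome in `--region` and with the FASTA sequence names. The function name `vcf_chromosomes` is hypothetical, and the code assumes a tab-delimited VCF, either plain or gzip-compressed.

```python
import gzip


def vcf_chromosomes(path):
    """Return the distinct CHROM values of a VCF, in order of first appearance."""
    opener = gzip.open if path.endswith(".gz") else open
    seen = set()
    ordered = []
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the column header, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                ordered.append(chrom)
    return ordered
```

For example, `vcf_chromosomes("neanderthal_file.vcf")` returns the names as the VCF records spell them, which is what the region string has to match.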

Could anyone help?

fasta python VCF • 300 views