Liftover vcf file from hs37d5 assembly to b37 assembly
10 weeks ago
nhaus ▴ 60

Hello,

I have a VCF file of variants that was generated with the GATK variant calling workflow, using the hs37d5 assembly as the reference. The problem is that all GATK reference resources use the b37 assembly, and if I simply use them, my script fails because for some variants (less than 0.1%) there is a mismatch between the b37 and hs37d5 reference genomes. So my idea was to remap the variants in the VCF file to b37. I planned on using something like CrossMap, but no chain files are available for my reference assemblies.

Does anyone have an idea how I can remap the variants from my hs37d5 vcf file to the b37 assembly without the use of chain files, or any other suggestions?

I would greatly appreciate them!

Cheers

vcf liftover assembly • 288 views

Don't you just have to rename the chromosomes (if needed) and discard the chromosomes that are not present in the other reference?
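A minimal sketch of that rename-and-discard idea, in plain Python. The function name `filter_vcf_lines`, the `keep_contigs` set, and the `rename` mapping are all hypothetical names chosen for illustration; the contig names you actually keep would have to come from the target reference's sequence dictionary or `.fai` index. It only touches `##contig` header lines and the CHROM column, so it does nothing about sequence differences.

```python
def filter_vcf_lines(lines, keep_contigs, rename=None):
    """Yield VCF lines restricted to keep_contigs, optionally renaming contigs.

    lines        -- iterable of VCF text lines (e.g. an open file handle)
    keep_contigs -- set of contig names present in the target reference
    rename       -- optional dict mapping old contig names to new ones
    """
    rename = rename or {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<ID="):
            # Header contig lines: keep only contigs that survive the filter.
            contig = line[len("##contig=<ID="):].split(",")[0].rstrip(">")
            new = rename.get(contig, contig)
            if new in keep_contigs:
                yield line.replace("ID=" + contig, "ID=" + new, 1)
            continue
        if line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        fields[0] = rename.get(fields[0], fields[0])
        if fields[0] in keep_contigs:
            yield "\t".join(fields)
```

Usage would look something like `for out in filter_vcf_lines(open("in.vcf"), {"1", "2", "MT"}): print(out)`, where the file name and contig set are placeholders.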


Unfortunately not... Very rarely, they also differ in the nucleotide sequence. Because I am working with WGS data, these events do occur and cause my script to crash, since the "REF" in my VCF file does not match the "REF" of the genome assembly I provide.

My idea was to use a simple Python script to change the REF nucleotides wherever a mismatch occurs, but it feels kinda wrong to manually change nucleotides...
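Before changing anything, it may be safer to just list the mismatching records first. Here is a rough sketch in plain Python, assuming the target FASTA fits in memory (for a full human genome an indexed FASTA reader would be more practical). `read_fasta` and `find_ref_mismatches` are hypothetical helper names, not part of any library, and the script only reports mismatches, it does not rewrite REF.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of {contig_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Contig name is the first word after '>'.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def find_ref_mismatches(vcf_lines, genome):
    """Yield (chrom, pos, vcf_ref, fasta_ref) for records whose REF disagrees.

    fasta_ref is None when the contig is missing from the genome dict.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1  # VCF POS is 1-based
        seq = genome.get(chrom)
        expected = seq[start:start + len(ref)] if seq is not None else None
        # Compare case-insensitively, since FASTA files may be soft-masked.
        if expected is None or expected.upper() != ref.upper():
            yield chrom, int(pos), ref, expected
```

The reported records could then be dropped or inspected by hand, which avoids silently editing REF alleles.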


Unfortunately not... Very rarely,

https://cloud.google.com/life-sciences/docs/resources/public-datasets/reference-genomes

hs37d5 : Includes data from GRCh37, the rCRS mitochondrial sequence, Human herpesvirus 4 type 1 and the concatenated decoy sequences.

b37: includes data from GRCh37, the rCRS mitochondrial sequence, and the Human herpesvirus 4 type 1.


Could you elaborate on what you mean? I also thought that they share the same sequence for the autosomes, but there are definitely some positions where they differ (at least in the versions that I am using). I also found this on the GATK page:

For b37:

These alterations largely consist of contig name changes, however there are known sequence differences on some contigs as well.
