Liftover vcf file from hs37d5 assembly to b37 assembly
10 weeks ago
nhaus ▴ 60

Hello,

I have a VCF file of variants that was generated using the GATK variant calling workflow, with hs37d5 as the reference assembly. The problem is that all GATK reference resources use the b37 assembly, and if I simply use them, my script fails because for some variants (less than 0.1%) there is a mismatch between the b37 and hs37d5 reference genomes. So my idea was to remap the variants in the VCF file to b37. I planned on using something like CrossMap, but no chain files are available for this pair of assemblies.

Does anyone have an idea how I can remap the variants from my hs37d5 VCF file to the b37 assembly without the use of chain files, or any other suggestions?

I would greatly appreciate them!

Cheers

vcf liftover assembly • 288 views

Don't you just have to rename the chromosomes (if needed) and discard the chromosomes that are not present in the other reference?
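The rename-and-discard approach suggested here can be sketched in plain Python. This is a minimal illustration, not a tested pipeline step: the function name `filter_vcf_contigs`, the `target_contigs` set, and the `rename` mapping are all hypothetical, and the set of contig names would have to come from the target reference itself (for example its `.fai` index or sequence dictionary).

```python
import re

# Matches header lines such as: ##contig=<ID=1,length=...>
CONTIG_HEADER = re.compile(r"^##contig=<ID=([^,>]+)")


def filter_vcf_contigs(vcf_lines, target_contigs, rename=None):
    """Yield VCF lines whose contig exists in the target reference.

    vcf_lines      -- iterable of VCF text lines (header and records)
    target_contigs -- set of contig names present in the target reference
    rename         -- optional dict mapping source contig names to target names
    """
    rename = rename or {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        header = CONTIG_HEADER.match(line)
        if header:
            old = header.group(1)
            new = rename.get(old, old)
            # Drop ##contig lines for contigs absent from the target.
            if new in target_contigs:
                yield line.replace("ID=" + old, "ID=" + new, 1)
            continue
        if line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        new = rename.get(fields[0], fields[0])
        if new in target_contigs:
            fields[0] = new
            yield "\t".join(fields)
```

Note that this only handles contig names. It does not address positions where the two references differ in sequence, which is the problem described in the reply below.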


Unfortunately not... Very rarely, they also differ in the nucleotide sequence. Because I am working with WGS data, these events do occur and cause my script to crash, because the "REF" in my VCF file does not match the "REF" of the genome assembly I provide.

My idea was to use a simple Python script to manually change the REF nucleotides where a mismatch occurs, but it feels kinda wrong to manually change nucleotides...
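A less invasive first step than rewriting REF alleles would be to list the records whose REF does not match the target reference, so they can be inspected or set aside. The sketch below assumes the reference is available as a plain dict of contig name to sequence, which is only practical for small test data. For a whole genome, an indexed FASTA reader such as `pysam.FastaFile` with its `fetch` method would be the usual substitute. The function name `find_ref_mismatches` is hypothetical.

```python
def find_ref_mismatches(vcf_lines, reference):
    """Return records whose REF allele disagrees with the reference.

    vcf_lines -- iterable of VCF text lines
    reference -- dict mapping contig name to its sequence (assumed input)

    Each result is a tuple (chrom, pos, vcf_ref, reference_bases);
    reference_bases is None when the contig is missing from the reference.
    """
    mismatches = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, ref, None))
            continue
        start = pos - 1  # VCF POS is 1-based; Python slices are 0-based
        observed = seq[start:start + len(ref)]
        # Compare case-insensitively, since references may be soft-masked.
        if observed.upper() != ref.upper():
            mismatches.append((chrom, pos, ref, observed))
    return mismatches
```

Whether the flagged records are then dropped or corrected is a separate decision. The sketch only reports them.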


Unfortunately not... Very rarely,

hs37d5: includes data from GRCh37, the rCRS mitochondrial sequence, Human herpesvirus 4 type 1, and the concatenated decoy sequences.

b37: includes data from GRCh37, the rCRS mitochondrial sequence, and Human herpesvirus 4 type 1.


Could you elaborate on what you mean? I also thought that they share the same sequence for the autosomes, but there are definitely some positions where they differ (at least in the copies that I am using). I also found this on the GATK page:

For b37:

These alterations largely consist of contig name changes, however there are known sequence differences on some contigs as well.
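To see where two copies of a contig actually differ, the sequences can be compared position by position. The following is a small sketch under the assumption that the contig has the same length in both references, so that coordinates line up. The function name `differing_positions` is hypothetical, and the contig sequences would need to be loaded from the two FASTA files first.

```python
def differing_positions(seq_a, seq_b):
    """Return 1-based positions at which two equal-length sequences differ.

    The comparison ignores case, so soft-masked (lowercase) bases are not
    reported as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError(
            "sequences differ in length; positional comparison is not meaningful"
        )
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]
```

The resulting positions can be intersected with the variant positions in the VCF to check whether the failing records fall exactly on these sites.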
