Liftover vcf file from hs37d5 assembly to b37 assembly
2.9 years ago
nhaus ▴ 300

Hello,

I have a VCF file of variants that were generated with the GATK variant calling workflow, using the hs37d5 assembly as the reference. The problem is that all GATK reference resources use the b37 assembly, and if I simply use them my script fails, because for some variants (less than 0.1%) the b37 and hs37d5 reference genomes do not match. So my idea was to remap the variants in the VCF file to b37. I planned on using something like CrossMap, but no chain files are available for my reference assemblies.

Does anyone have an idea how I can remap the variants from my hs37d5 vcf file to the b37 assembly without the use of chain files, or any other suggestions?

I would greatly appreciate them!

Cheers

vcf liftover assembly • 1.8k views

Don't you just have to rename the chromosomes (if needed) and discard the chromosomes that are not present in the other reference?
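
A minimal sketch of that rename-and-discard step in plain Python, for illustration only. The function name `filter_vcf_contigs`, the `target_contigs` set and the optional `rename` mapping are hypothetical; the contig names of the target reference would have to be taken from its own `.fai` or `.dict` file. It does not touch REF alleles, so it only helps when the two references differ in contig content rather than in sequence.

```python
import re

# Matches header lines such as ##contig=<ID=1,length=249250621>
CONTIG_HEADER = re.compile(r"^##contig=<ID=([^,>]+)")


def filter_vcf_contigs(vcf_lines, target_contigs, rename=None):
    """Yield VCF lines restricted to contigs present in the target reference.

    vcf_lines      -- iterable of VCF text lines (header and records)
    target_contigs -- set of contig names found in the target reference
    rename         -- optional dict mapping old contig names to new ones
    """
    rename = rename or {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        match = CONTIG_HEADER.match(line)
        if match:
            # Keep a ##contig header only if the (renamed) contig survives.
            old = match.group(1)
            new = rename.get(old, old)
            if new in target_contigs:
                yield line.replace("ID=" + old, "ID=" + new, 1)
            continue
        if line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        fields[0] = rename.get(fields[0], fields[0])
        if fields[0] in target_contigs:
            yield "\t".join(fields)
```

Usage would be something like `filter_vcf_contigs(open("input.vcf"), contigs_from_fai)`, writing each yielded line to a new file (both names here are placeholders).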


Unfortunately not... Very rarely, they also differ in the nucleotide sequence. Because I am working with WGS data, these events do occur and cause my script to crash, because the "REF" in my VCF file does not match the "REF" of the genome assembly I provide.

My idea was to use a simple Python script to manually change the REF nucleotides where a mismatch occurs, but it feels kinda wrong to manually change nucleotides...
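
Before changing anything, it may help to first list exactly which records disagree with the target reference. The sketch below is only an illustration of that check: `find_ref_mismatches` is a hypothetical helper, and it assumes the reference is available as a plain dict of contig name to sequence string, which is only practical for toy data. For a whole genome, an indexed FASTA reader would be needed instead.

```python
def find_ref_mismatches(vcf_lines, reference):
    """Return VCF records whose REF allele disagrees with the reference.

    vcf_lines -- iterable of VCF text lines (header lines are skipped)
    reference -- dict mapping contig name to its sequence string
                 (an assumption for this sketch, not how a real genome is held)

    Each result is (chrom, pos, vcf_ref, reference_bases); reference_bases
    is None when the contig is missing from the reference.
    """
    mismatches = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1  # VCF POS is 1-based, Python slices are 0-based
        seq = reference.get(chrom)
        expected = seq[start:start + len(ref)] if seq is not None else None
        # Compare case-insensitively, since references may be soft-masked.
        if expected is None or expected.upper() != ref.upper():
            mismatches.append((chrom, int(pos), ref, expected))
    return mismatches
```

With such a list in hand, the mismatching records (reportedly under 0.1% here) could be inspected or dropped rather than having their REF bases overwritten, which changes the meaning of the call.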


"Unfortunately not... Very rarely,"

https://cloud.google.com/life-sciences/docs/resources/public-datasets/reference-genomes

hs37d5: includes data from GRCh37, the rCRS mitochondrial sequence, Human herpesvirus 4 type 1 and the concatenated decoy sequences.

b37: includes data from GRCh37, the rCRS mitochondrial sequence, and the Human herpesvirus 4 type 1.


Could you elaborate on what you mean? I also thought that they share the same sequence for the autosomes, but there are definitely some positions where they differ (at least in the versions that I am using). I also found this on the GATK page:

For b37:

These alterations largely consist of contig name changes, however there are known sequence differences on some contigs as well.


I am in a similar predicament and am trying to make my own liftover for hs37d5 to b37. I'm wondering if you had any luck resolving your issue. Here are the steps I'm planning to follow - UCSC Liftover Instructions
