Question: CrossMap VCF liftover (GRCh37 > GRCh38): END info field for indel calls not lifted over
9 months ago by
United States / Saint Louis / Washington University in Saint Louis
mrodrigues.fernanda wrote:

Dear list,

I used CrossMap to lift over VCF files from GRCh37 to GRCh38 coordinates. Here is the command:

python CrossMap-0.2.8/bin/CrossMap.py vcf GRCh37_to_GRCh38.chain.gz sample1520.vcf.gz Homo_sapiens.GRCh38.dna.primary_assembly.fa sample1520.N.grch38.vcf

I noticed that for indel calls that carry an END coordinate in the VCF INFO field, that coordinate is not lifted over, which is causing problems in some downstream analyses (see the example variant below).

2       240692368       .       AG      A       .       PASS    AC=1;AF=0.5;AN=2;DEL=.;END=241631786;HOMLEN=0;SVLEN=-1;set=indel        GT:AD   0/1:20,6
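The mismatch in this record can be checked with a minimal Python sketch (not from the original post). It assumes that, for a precise deletion with explicit REF and ALT alleles, END should be the last reference base covered by REF, i.e. POS + len(REF) - 1. `parse_info` is a hypothetical helper, and the record is split on whitespace because the tabs were lost when it was pasted here.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without '=' map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


# The example record from the post (POS already lifted, END not).
record = ("2 240692368 . AG A . PASS "
          "AC=1;AF=0.5;AN=2;DEL=.;END=241631786;HOMLEN=0;SVLEN=-1;set=indel "
          "GT:AD 0/1:20,6")

fields = record.split()
pos = int(fields[1])
ref = fields[3]
end = int(parse_info(fields[7])["END"])

# Assumption: for a precise indel, END is the last base of the REF allele.
expected_end = pos + len(ref) - 1

print(end - pos)       # 939418, far too large for a 1 bp deletion
print(expected_end)    # 240692369
```

A gap of nearly a megabase between POS and END for a 1 bp deletion suggests that the two values refer to different assemblies.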

Is there a way to lift over that field as well? Is there a liftover tool that handles this? I also tried the UCSC liftOver tool, but the same issue occurs.

Any help is appreciated. Thanks!

Fernanda

liftover crossmap • 529 views
modified 9 months ago by Brice Sarver • written 9 months ago by mrodrigues.fernanda

Are you using the chain file from UCSC or your own? Also, have you checked to see whether the region you're trying to convert coordinates to is represented in the alignment used to generate your chain file and/or is not a region that is misassembled in 37 relative to 38 (i.e., perhaps this region is most accurately characterized on a patch scaffold and not the chromosome in 37)?

written 9 months ago by Brice Sarver

I have downloaded the chain file appropriate for my case directly from the CrossMap website. I see your point, but I don't think that is the problem, since the problem is observed in all indel calls where an END field is present in the INFO field. The END field is never converted, only the variant position, which makes me think CrossMap does not process that information?

Comparing the output with the GRCh37 VCF file used as input, the END values are unchanged, while the variant positions are successfully converted.

modified 9 months ago • written 9 months ago by mrodrigues.fernanda

Sorry, I completely misread the question. You're trying to lift over a coordinate that's in the INFO field, not in the POS field (i.e., the second column)? As far as I know, these tools don't look at the INFO field at all, since coordinates aren't usually stored there. However, you could extract the END values into a BED file, lift over the BED file, and then re-insert the converted values into the VCF.
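The extraction step could look roughly like the Python sketch below. It is only an illustration, not something CrossMap provides: `end_to_bed` is a hypothetical helper, and it assumes a tab-delimited GRCh37 VCF in which every record has a unique value in the ID column (the example record has ".", so IDs would have to be assigned before liftover). The ID is written to the BED name column so the lifted END can be matched back to its record later.

```python
def end_to_bed(vcf_lines):
    """Yield one BED line per VCF record that has an END value in INFO."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, var_id, info = cols[0], cols[2], cols[7]
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
                # BED is 0-based and half-open, so the single 1-based
                # base `end` becomes the interval [end - 1, end).
                yield f"{chrom}\t{end - 1}\t{end}\t{var_id}"
                break


# Usage sketch (file names are placeholders):
# with open("ends.grch37.bed", "w") as out:
#     with open("sample.grch37.vcf") as vcf:
#         for bed_line in end_to_bed(vcf):
#             out.write(bed_line + "\n")
```

The resulting BED file can then be converted with the same chain file, for example with CrossMap's `bed` mode.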

modified 9 months ago • written 9 months ago by Brice Sarver

Thank you for your response, Brice! Yes, for indels called by Pindel, the stop coordinate of the variant is added to the INFO field. Because that coordinate is not lifted over, it was causing issues when running the converted VCF through tools like GATK's VariantEval.

What you suggested sounds like a feasible solution. I will try that.

Thank you!

written 9 months ago by mrodrigues.fernanda

Great! I'll go ahead and copy/paste my comment below so it can be accepted as an answer.

written 9 months ago by Brice Sarver
9 months ago by
Brice Sarver
United States
Brice Sarver wrote:

Copy/paste from above:

Sorry, I completely misread the question. You're trying to lift over a coordinate that's in the INFO field, not in the POS field (i.e., the second column)? As far as I know, these tools don't look at the INFO field at all, since coordinates aren't usually stored there. However, you could extract the END values into a BED file, lift over the BED file, and then re-insert the converted values into the VCF.
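One possible way to do the re-insertion step is sketched below in Python, under assumptions that go beyond the answer itself. It assumes the lifted BED file carries a record key in its fourth (name) column and that the same key is present in the ID column of the lifted VCF. `load_lifted_ends` and `replace_end` are hypothetical helper names. An END value is only replaced when the lifted interval is on the same chromosome as the record.

```python
def load_lifted_ends(bed_lines):
    """Map each BED name (4th column) to (chrom, lifted 1-based END)."""
    ends = {}
    for line in bed_lines:
        cols = line.rstrip("\n").split("\t")
        # BED end is exclusive, so it equals the 1-based position
        # of the last base in the interval.
        ends[cols[3]] = (cols[0], int(cols[2]))
    return ends


def replace_end(vcf_line, ends):
    """Return the VCF line with END swapped for its lifted value, if known."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    cols = vcf_line.rstrip("\n").split("\t")
    hit = ends.get(cols[2])  # look up by the ID column
    if hit is not None and hit[0] == cols[0]:
        items = [
            f"END={hit[1]}" if item.startswith("END=") else item
            for item in cols[7].split(";")
        ]
        cols[7] = ";".join(items)
    return "\t".join(cols) + "\n"
```

Records whose END could not be lifted, or whose END landed on a different chromosome, keep their old value in this sketch, so they should be reviewed or filtered separately before running downstream tools.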

written 9 months ago by Brice Sarver