Question: LiftOver a VCF file
2.4 years ago, Susmita Mandal (Bangalore) wrote:

Hey all,

I have been trying to lift over a VCF file from GRCm38 to NCBIM37. I have tried the UCSC LiftOver tool, the Ensembl API, CrossMap and Picard. None of them lifts the file over completely: they either fail outright or reject some variants. In Picard LiftoverVcf in particular, the rejected variants are those that have NoTarget in them, and I have no idea why. The reference FASTA file I am using is Mus_musculus.NCBIM37.61.dna.toplevel.fa, and the liftover chain file is GRCm38_to_NCBIM37.chain.gz.

The vcf file is from:

ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/129S1_SvImJ.mgp.v5.snps.dbSNP142.vcf

Any leads will be helpful. Thanks in advance.

Best,

Susmita

Tags: picard, crossmap, liftovervcf, vcf

"Especially in Picard LiftoverVcf, the rejected variants are those that have NoTarget in them"

What is 'NoTarget'?

written 2.4 years ago by Pierre Lindenbaum

I have no idea. It's something like this:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129S1_SvImJ
    1 3000023 . C A 109 NoTarget CSQ=A||||intergenic_variant||||||||;DP=6;DP4=0,0,6,0 GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI 1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1

modified 2.4 years ago • written 2.4 years ago by Susmita Mandal

It's the FILTER column, and it should be defined in the VCF header.

written 2.4 years ago by Pierre Lindenbaum
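To see which FILTER values a rejected-variants file actually contains and whether the header declares them, a small script along these lines may help. This is only a sketch: the function name `summarize_filters`, the header description text and the second demo record are made up for illustration, and the real file would be read with `open()` or `gzip.open(path, "rt")` instead of the in-memory demo.

```python
import io
import re
from collections import Counter

def summarize_filters(lines):
    """Return (declared, counts): FILTER IDs declared in the header,
    and how often each FILTER value occurs in the records."""
    declared = {}
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FILTER="):
            m_id = re.search(r"ID=([^,>]+)", line)
            m_desc = re.search(r'Description="([^"]*)"', line)
            if m_id:
                declared[m_id.group(1)] = m_desc.group(1) if m_desc else ""
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) > 6:
                # FILTER is the 7th column; several filters are ';'-separated.
                for name in fields[6].split(";"):
                    counts[name] += 1
    return declared, counts

# Tiny illustrative input (not real data apart from the first position).
demo = """##fileformat=VCFv4.2
##FILTER=<ID=NoTarget,Description="hypothetical description text">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t3000023\t.\tC\tA\t109\tNoTarget\tDP=6
1\t3000185\t.\tG\tT\t80\tNoTarget\tDP=4
"""
declared, counts = summarize_filters(io.StringIO(demo))
print(declared)
print(counts.most_common())
```

Any FILTER value that shows up in `counts` but not in `declared` is one the header does not explain.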

I just want the VCF to be lifted over!

written 2.4 years ago by Susmita Mandal

Searching online would help (keywords on Google: picard liftover notarget): https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/vcf/LiftoverVcf.java

    /**
     * Filter name to use when a target cannot be lifted over.
     */
    public static final String FILTER_NO_TARGET = "NoTarget";

modified 2.4 years ago • written 2.4 years ago by cpad0112

NoTarget is not the main issue. The issue is why the VCF is not getting lifted over completely. Is there any tool that can do it?

written 2.4 years ago by Susmita Mandal

The VCF from the link posted in the OP is huge; even the gzipped VCF is ~200 MB (on http://crispor.tefor.net/genomes/mm10/orig/). It would help if you could post a few example records that fail to lift over between the builds, together with the headers. In general, there are always discrepancies between builds: some records get merged and some get dropped. However, the percentage is small between consecutive builds.

modified 2.4 years ago • written 2.4 years ago by cpad0112
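A short excerpt is usually enough for others to look at. As a rough sketch (the function name `sample_vcf` and the demo lines are made up for illustration), the header plus the first few records of a rejected-variants file could be pulled out like this; for a real file, pass an open file handle, or `gzip.open(path, "rt")` for a gzipped one, instead of the demo text.

```python
import io

def sample_vcf(lines, n_records=5):
    """Yield every header line, then only the first n_records data lines."""
    kept = 0
    for line in lines:
        if line.startswith("#"):
            yield line
        elif kept < n_records:
            kept += 1
            yield line
        else:
            break  # stop reading once enough records have been collected

# Illustrative input only; positions after the first are invented.
demo = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t3000023\t.\tC\tA\t109\tNoTarget\tDP=6\n"
    "1\t3000185\t.\tG\tT\t80\tNoTarget\tDP=4\n"
    "1\t3000234\t.\tG\tA\t75\tNoTarget\tDP=5\n"
)
excerpt = list(sample_vcf(io.StringIO(demo), n_records=2))
print("".join(excerpt), end="")
```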

I don't think I can copy that many lines here.

written 2.4 years ago by Susmita Mandal
2.4 years ago, Emily_Ensembl (EMBL-EBI) wrote:

Genome assemblies do not 100% map to one another. Newer assemblies will have novel regions that were not found in the older assemblies, and older assemblies will have incorrectly assembled regions that cannot easily be mapped across to the correctly assembled regions. If the variants were called on loci in GRCm38 that did not have coverage in NCBIM37, then there will be no mapping.
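A toy illustration of why some positions have no mapping: liftover tools work from aligned, ungapped blocks between the two assemblies, and a position that falls between blocks has nowhere to go. The sketch below is a deliberate simplification (same strand, 0-based coordinates, one chromosome, invented numbers) and the names `chain_blocks` and `lift` are hypothetical; it is not how any particular tool is implemented.

```python
def chain_blocks(src_start, dst_start, rows):
    """Expand (size, src_gap, dst_gap) rows into (src, dst, size) blocks.

    Each row is an ungapped aligned block of `size` bases, followed by
    unaligned gaps in the source and destination assemblies.
    """
    blocks = []
    src, dst = src_start, dst_start
    for size, src_gap, dst_gap in rows:
        blocks.append((src, dst, size))
        src += size + src_gap
        dst += size + dst_gap
    return blocks

def lift(pos, blocks):
    """Map a source position to the destination, or None if unaligned."""
    for src, dst, size in blocks:
        if src <= pos < src + size:
            return dst + (pos - src)
    return None  # position lies in a gap or outside the aligned region

# Invented example: three aligned blocks with a 10-base source-only gap
# after the first block and a 5-base destination-only gap after the second.
blocks = chain_blocks(100, 500, [(50, 10, 0), (40, 0, 5), (30, 0, 0)])

for pos in (120, 155, 165, 210, 300):
    print(pos, "->", lift(pos, blocks))
```

Here positions 155 and 300 come back as `None`: the first sits in sequence present only in the source assembly, the second lies outside the aligned region altogether. Variants at such positions are the ones a liftover tool has to reject.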

Why do you want to map your VCFs back to an old assembly? Would it not be better to map your other data forward onto the newer assembly?

Or use VCFs that were called against that build/assembly.

written 2.4 years ago by cpad0112

I took your advice. Earlier I couldn't find VCFs of 129S1/Sv mapped onto mm9, but last night I found some files: a tab-delimited file with the columns #CHROM POS REF 129S1/Sv, plus a .tbi file. I have no idea how to get a VCF file from these two. Any ideas?

modified 2.4 years ago • written 2.4 years ago by Susmita Mandal
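The .tbi file is only a tabix index for the table, so the conversion needs just the tab-delimited file. One possible sketch follows, resting on several assumptions that should be checked against the real file: the fourth column holds the strain's allele as a single base, rows where it equals REF (or is not A/C/G/T) are not variants, and the inbred strain is written as homozygous 1/1. The function name `table_to_vcf`, the sample name default and the demo rows are illustrative only.

```python
import io

def table_to_vcf(lines, sample="129S1_SvImJ"):
    """Convert '#CHROM POS REF <strain allele>' rows into minimal VCF lines."""
    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT", sample]),
    ]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the column header
        chrom, pos, ref, allele = line.split("\t")[:4]
        ref, allele = ref.upper(), allele.upper()
        # Assumption: same base as REF, or an ambiguous code, is not a SNP call.
        if allele == ref or allele not in {"A", "C", "G", "T"}:
            continue
        # Assumption: inbred strain, so the genotype is homozygous alternate.
        out.append("\t".join([chrom, pos, ".", ref, allele,
                              ".", ".", ".", "GT", "1/1"]))
    return out

# Invented rows in the layout described above.
demo = (
    "#CHROM\tPOS\tREF\t129S1/Sv\n"
    "1\t3000023\tC\tA\n"
    "1\t3000100\tG\tG\n"
    "1\t3000200\tT\tN\n"
)
vcf = table_to_vcf(io.StringIO(demo))
print("\n".join(vcf))
```

If the table is compressed (which the presence of a .tbi suggests), it can be read with `gzip.open(path, "rt")` and passed in the same way.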

I am creating custom in silico parental genomes, and for that I need VCFs from both parents mapped onto the same reference genome.

written 2.4 years ago by Susmita Mandal