Question: How to confirm whether a VCF file is normalized and left-aligned?
nobody0 wrote:

Hi Team,

As I'm a newbie my question might be very lame, so please bear with me ...

I was told that if we find a multi-allelic entry in our VCF file, then the file is not normalized, and thus not left-aligned? (A multi-allelic entry has multiple values in the ALT column; in our case T,A and A,T.)

I found the entries below (with some values modified or removed to anonymize the data) in the VCF files I downloaded from UKBB.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE_ID
chrMM 5060XXXX      chrMM_5060XXXX_G_T;chrMM_5060XXXX_G_A   G       T,A     61      PASS    AF=0.04752,2e-06;AQ=61,38;AC=0,0;AN=2   GT:DP:AD:GQ:PL:RNC      0/0:14:14,0,0:42:0,...
chrMM 5060YYYY      chrMM_5060YYYY_G_A;chrMM_5060YYYY_G_T   G       A,T     49      PASS    AF=5e-06,2e-06;AQ=49,38;AC=0,0;AN=2     GT:DP:AD:GQ:PL:RNC      0/0:16:16,0,0:48:0,...

Does this mean the VCF file is not normalized ?

I went down this rabbit hole because the VEP tool didn't return any annotation, and the "internet" told me one possible reason could be that the files are not normalized/left-aligned ...

Could someone please suggest an approach for checking whether VCF files are normalized?

Thanks again team ...

ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by nobody0

Multi-allelic records do not mean a VCF is non-normalized, although splitting multi-allelics and representing variants in left-aligned, parsimonious coordinates frequently go together.
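To list which records are multi-allelic, independently of whether they are normalized, a minimal Python sketch could look like this. It assumes a plain-text, tab-delimited VCF; the function name and the two sample records are made up for illustration.

```python
def is_multiallelic(vcf_line):
    """Return True if a VCF data line lists more than one ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]  # ALT is the fifth column of a VCF data line
    return "," in alt


# Hypothetical records, not taken from any real dataset.
records = [
    "chr1\t100\t.\tG\tT,A\t61\tPASS\t.",
    "chr1\t200\t.\tG\tA\t49\tPASS\t.",
]
flags = [is_multiallelic(r) for r in records]
print(flags)
```

A True here only says the record has several ALT alleles; it says nothing about left alignment.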

Checking whether a VCF is normalized is not worth doing as a separate step; you'd be better off just running a normalization tool. I'd recommend vt. I'd definitely recommend at least splitting multi-allelic entries before annotation. With VEP, run the online version on an entry that you think should be annotated, and compare the command line shown there with the command you're running to debug the lack of annotation.

While bcftools norm gives the impression that left-aligning variants and splitting multi-allelic entries are part of the same process, they are not. Most SNVs won't be affected by left alignment, for example.
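To make the left-alignment part concrete, here is a rough Python sketch of the usual trim-and-shift procedure for a single biallelic variant. It is an illustration under stated assumptions, not the code vt or bcftools actually run: the reference is a made-up toy string, positions are 1-based, and REF is not checked against the reference.

```python
def normalize(pos, ref, alt, seq):
    """Left-align and trim one biallelic variant.

    pos is 1-based; seq is the reference sequence as a plain string.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leftmost bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def is_normalized(pos, ref, alt, seq):
    """A variant is normalized if normalizing it changes nothing."""
    return normalize(pos, ref, alt, seq) == (pos, ref, alt)


toy_ref = "GGGCACACACAGGG"  # hypothetical reference with a CA repeat
deletion = normalize(8, "CAC", "C", toy_ref)  # deletion inside the repeat
snv = normalize(5, "A", "T", toy_ref)         # SNV, nothing to shift
print(deletion, snv)
```

The deletion inside the repeat gets shifted to the leftmost equivalent position, while the SNV comes back unchanged, which is why most SNVs are untouched by this step.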

I recommend vt because it retains a record of the changes it makes in-file, serving as a log. It adds INFO/OLD_MULTIALLELIC and INFO/OLD_VARIANT entries so you can filter down to variants that were changed by your operations. See https://genome.sph.umich.edu/wiki/Vt#Decompose and https://genome.sph.umich.edu/wiki/Vt#Normalization
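As a rough illustration of that filtering, the Python sketch below flags records whose INFO column carries either of those two tags. The function name and the sample records (including the tag value) are made up; only the tag names come from the vt documentation linked above.

```python
VT_TAGS = ("OLD_MULTIALLELIC", "OLD_VARIANT")


def changed_by_vt(vcf_line):
    """Return True if the INFO column carries one of vt's OLD_* tags."""
    info = vcf_line.rstrip("\n").split("\t")[7]  # INFO is the eighth column
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}
    return any(tag in keys for tag in VT_TAGS)


# Hypothetical records; the OLD_VARIANT value is illustrative only.
records = [
    "chr1\t3\t.\tGCA\tG\t50\tPASS\tAC=1;OLD_VARIANT=chr1:8:CAC/C",
    "chr1\t5\t.\tA\tT\t50\tPASS\tAC=1",
]
touched = [r for r in records if changed_by_vt(r)]
print(len(touched))
```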

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Ram32k

Thanks @_r_ram for your enlightening response, much appreciated ...

I tried installing the vt tool but am seeing this error: https://github.com/atks/vt/issues/113

I can't use Conda in my environment .. :(

But then what is life without some struggle ? :D

ADD REPLYlink written 6 weeks ago by nobody0

Please don't add answers unless you're answering the top level question. Use Add Comment or Add Reply instead. Now that that's out of the way, what kind of machine are you working on - a local machine such as a laptop/desktop or a HPC cluster? If it's the latter, contact your sysadmin.

Why can you not use the conda workaround? It does not need super user privileges.

ADD REPLYlink written 6 weeks ago by Ram32k

Thanks again @_r_am, I see your point about adding a reply or comment! I will be more cautious next time ... It's a cluster, and the sysadmin doesn't allow usage of Conda ...

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by nobody0

sysadmin doesn't allow usage of Conda

Are you sure of that? Maybe you are confusing conda with Docker; the latter requires admin privileges to install and is sometimes frowned upon by sysadmins. I don't mean to encourage sneaky behavior, but have you tried installing conda? If so, what errors did you encounter?

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by dariober11k

Thanks dariober for the comment ... I didn't even attempt to run conda, as our admin says: "We request our users to please not install Anaconda on the clusters". I started the normalization step via bcftools ... it takes care of the decomposition scenario. I will surely try out the vt tool when the issue is resolved ... it seems like a good weapon to have in one's arsenal.

ADD REPLYlink written 6 weeks ago by nobody0

bcftools also does left alignment. If your sysadmin won't allow conda, ask them to install vt. They should be open to doing that.

Plus, does your sysadmin have a problem with conda or Anaconda? Anaconda is a bulky package; you can work with Miniconda, which is a much slimmer tool. Plus, if your sysadmin is being this much of a pain because they are trying to control what binaries get run on the cluster, that's just them being unreasonable. Ask them why they don't want conda, and whether their problem lies with conda or Anaconda.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Ram32k

I guess it's more about "control" than any other reason ... I will ask if vt can be installed, and also if Miniconda can be installed ...
But I'm very surprised by the issue with vt ... I checked the current version of the code on GitHub, and the Makefile refers to a file that doesn't exist in the codebase ... but probably I'm missing something, as I wouldn't expect the author of such an awesome tool to commit without building ... or maybe he/she used Conda to test the build ... Regardless, I thank you both for taking the time to respond ... I'm already in love with the Biostar community :)

Have a great one team !!!

ADD REPLYlink written 5 weeks ago by nobody0

Again about conda and your sysadmins... Recently I've become quite a fan of conda, and while it has its problems and critics, I'm in no way going back to when I was installing stuff in various /bin/ directories - a total mess when working on several projects across various servers and years!

If I were you, I would investigate further with IT to see if they have a valid reason to refuse conda (not to be confused with Anaconda!) and see if you can resolve it. If they do have a valid reason, I'd like to know what that is. One of the nice things about conda is that, in contrast to Docker, it's all self-contained within the user space, so you shouldn't even be able to annoy other users, and if something goes wrong you can just delete the conda environment and start again.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by dariober11k

author of such an awesome tool to commit without building

Everyone is human, and people can make mistakes. But sure, the community would have caught on. What are you referring to when you say there is a make target that doesn't exist? I have a feeling that maybe you're misreading the make file.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Ram32k

The file lib/utils.c doesn't exist in the lib folder ... https://github.com/atks/vt/tree/master/lib/libdeflate/lib

but it is referenced in the Makefile of libdeflate:

LIB_SRC := lib/deflate_decompress.c lib/utils.c \
           $(wildcard lib/*/cpu_features.c)

and we get the following error while building

cd lib/libdeflate; make || exit 1; 
make[1]: Entering directory '/data/sigven/software/vt/lib/libdeflate'
CC lib/deflate_decompress.o
make[1]: *** No rule to make target 'lib/utils.c', needed by 'lib/utils.o'. Stop.           <--- 
make[1]: Leaving directory '/data/sigven/software/vt/lib/libdeflate'
make: *** [Makefile:151: lib/libdeflate/libdeflate.a] Error 1
ADD REPLYlink written 5 weeks ago by nobody0

It could be an untested change, I think. I can't see a different explanation, but if conda is not an option, try switching to an earlier commit and building it. Maybe this one's a stable working commit: https://github.com/atks/vt/tree/88da43649b5a39ddfc00d8a8f4d494fad50d5eec

See this SO answer on how to switch to a custom commit: https://stackoverflow.com/a/7832839/1394178

ADD REPLYlink written 5 weeks ago by Ram32k