Question: Remove variants that do not map to human genome
3 months ago, john.michel.rouhana wrote:

I received an hg38 VCF file whose variants were imputed with 1000 Genomes. I've encountered several issues with the VCF: REF alleles that do not match the reference genome, ALT alleles that do not appear to be reported anywhere in the literature, and, most recently, variants that flat-out do not map to the human genome (variants on chr19 at positions above 100 million bp, when the whole chromosome is only in the 50 million bp range).

I've worked out hack-y solutions to most of the issues that I've encountered, but this latest one has been an issue for me. I only detected these variants when I ran VEP and it flagged them as not mapping to the genome. As such, I'm more or less removing these variants one at a time using grep -v. I'd like a solution where I can just remove any variants from the vcf that appear to map to regions that do not exist in the human genome. Bonus points if the solution also encompasses some of the other issues I mentioned, although I think I've already found solutions to those. Is there anything out there that does this?


Hello john.michel.rouhana!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/8629/remove-variants-that-do-not-map-to-human-genome

This is typically not recommended as it uses the finite time of volunteers in both communities.

Reply written 3 months ago by WouterDeCoster (modified 3 months ago)

I wasn't aware of that; thank you for pointing it out. I thought it would make the most sense to post in both places. Thanks for the etiquette lesson. Is there any way to remove my post here?

Reply written 3 months ago by john.michel.rouhana (modified 3 months ago)

You have already received an answer, so we would restore the post even if it were deleted, out of respect for the user who invested time in answering. Don't worry, leave the question here, but in future please avoid cross-posting: many users are active in both communities, and this avoids duplicated effort ;-)

Reply written 3 months ago by ATpoint

The least you could do is link the two posts to each other, so that contributors on forum A can see that someone has already replied on forum B.

Reply written 3 months ago by WouterDeCoster
3 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

(not tested)

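bcftools norm -c x -f reference.fa input.vcf.gz -o output.vcf

where reference.fa (the hg38 FASTA that REF alleles are checked against) and the VCF file names are placeholders, with option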

-c, --check-ref <e|w|x|s>         check REF alleles and exit (e), warn (w), exclude (x), or set (s) bad sites [e]
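For the specific problem in the question (positions beyond the end of a chromosome), a small standalone filter is another option. The following is only a minimal sketch, not something posted in this thread: it assumes an uncompressed, tab-delimited VCF and a samtools-style .fai index for the same hg38 reference, and the function names read_contig_lengths and drop_out_of_range are made up for illustration. It keeps header lines and drops any record whose contig is missing from the index or whose REF allele would extend past the contig end. It does not check that the REF bases themselves match the reference.

```python
from typing import Dict, Iterable, Iterator


def read_contig_lengths(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Parse a .fai index: column 1 is the contig name, column 2 its length in bp."""
    lengths: Dict[str, int] = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def drop_out_of_range(vcf_lines: Iterable[str], lengths: Dict[str, int]) -> Iterator[str]:
    """Yield header lines and the records that fit inside a known contig."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines pass through unchanged.
            yield line
            continue
        if not line.strip():
            continue
        # Columns: CHROM, POS, ID, REF, then everything else.
        fields = line.split("\t", 4)
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        end = pos + len(ref) - 1  # last reference base covered by REF (1-based)
        # Assumption: records on contigs absent from the index are dropped too.
        if chrom in lengths and pos >= 1 and end <= lengths[chrom]:
            yield line
```

To use it, pass the open .fai file to read_contig_lengths, then write every line yielded by drop_out_of_range for the open VCF to a new file. A bgzipped VCF would need to be decompressed first, or opened with gzip.open in text mode.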