Using LiftOver to change genomic build
13 months ago
Karan ▴ 10

Hi, all -

Two questions about using LiftOver:

  1. The .bed file changes after using LiftOver. Correct me if I'm wrong, but I can just use the .bim and .fam file from before LiftOver as those do not change?
  2. I have used LiftOver to migrate the POPRES dataset from hg18 to hg38. Initially, I received an error saying I needed three fields for liftOver to work. To put my .bed file into UCSC format, I ran the following awk script:

    grep -v '^#' POPRES_Genotypes_QC2_v2_VCF.vcf | awk -F '\t' '{print $1,$2,$2,$3}' > output1.ucsc.bed

When I run LiftOver afterwards, though, all my records end up in the unlifted file, and none are actually lifted. What could be the issue?

hg18 genome-build LiftOver hg38 • 2.1k views
13 months ago

If you are lifting over a VCF, the simplest way to do it is to use BCFtools +liftover that can be obtained here. You can run:

    bcftools +liftover \
      POPRES_Genotypes_QC2_v2_VCF.vcf \
      -- \
      -s hg18.fa \
      -f hg38.fa \
      -c hg18ToHg38.over.chain.gz

Note that this will recover a few more SNPs than UCSC liftOver, because the .bed file approach will land some SNPs in gaps of the chain file. BCFtools +liftover handles small gaps in the chain file, so more SNPs are recovered. This is particularly relevant when SNP differences between genome builds are the exact reason for small gaps in the chain file. Furthermore, BCFtools +liftover will handle VCF tags such as INFO/AC and INFO/AF, which need to be updated when a reference⇆alternate allele swap is required.

13 months ago
barslmn ★ 2.1k

Liftover is the process of mapping positions from one assembly version to another for the same organism. So, it can change both the positions and the chromosomes of your variants.

The .bed file changes after using LiftOver. Correct me if I'm wrong, but I can just use the .bim and .fam file from before LiftOver as those do not change?

You should take a look at the PLINK documentation for the file formats, especially the .bed format, since it is different from the UCSC BED format: https://www.cog-genomics.org/plink/2.0/formats#bed . See what is in these files and which fields are different after you did the liftover.
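To make the point about the .bim file concrete: it stores chromosome and base-pair position for every variant, so it does not stay the same across builds. Below is a minimal sketch, assuming the lifted UCSC BED file has four columns (chrom, start, end, variant ID) and that the variant IDs match those in the .bim. The function name and all file prefixes are hypothetical; the PLINK 1.9 flags in the trailing comment (`--extract`, `--update-chr`, `--update-map`, `--make-bed`) exist, but check their documentation before relying on this exact invocation.

```shell
# Build PLINK update files from a lifted UCSC BED file.
# Assumed columns: chrom, start, end, variant ID.
lifted_bed_to_plink_maps() {
  lifted_bed=$1   # BED file written by liftOver (hypothetical name)
  prefix=$2       # prefix for the files created here (hypothetical)

  # variant ID -> new base-pair position (BED end equals the 1-based position)
  awk -v OFS='\t' '{ print $4, $3 }' "$lifted_bed" > "$prefix.update_map.txt"

  # variant ID -> new chromosome code, with the UCSC "chr" prefix removed
  awk -v OFS='\t' '{ c = $1; sub(/^chr/, "", c); print $4, c }' \
    "$lifted_bed" > "$prefix.update_chr.txt"

  # IDs of the variants that were lifted successfully
  awk '{ print $4 }' "$lifted_bed" > "$prefix.keep_ids.txt"
}

# Possible follow-up, untested here:
#   plink --bfile popres_hg18 \
#     --extract popres.keep_ids.txt \
#     --update-chr popres.update_chr.txt \
#     --update-map popres.update_map.txt \
#     --make-bed --out popres_hg38
```

Variants that land on alternate or unplaced contigs in the new build may need to be filtered out first, since PLINK may not accept those chromosome names.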

I have used LiftOver to migrate the POPRES dataset from hg18 to hg38. Initially, I received an error saying I needed three fields for liftOver to work. To put my .bed file into UCSC format, I ran the following awk script:

    grep -v '^#' POPRES_Genotypes_QC2_v2_VCF.vcf | awk -F '\t' '{print $1,$2,$2,$3}' > output1.ucsc.bed

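liftOver expects a range that covers at least one base, so you should extend your positions: your command writes the same value for start and end, which is a zero-length interval in UCSC BED (start is 0-based, end is exclusive). Keep in mind, though, that liftOver is not recommended for converting SNP positions.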
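As an illustration of extending the positions, here is a minimal sketch of the VCF-to-BED step. It writes tab-separated output, uses start = POS - 1 and end = POS, and adds a "chr" prefix when it is missing. The prefix step is an assumption: UCSC chain files use names like chr1, and a mismatch in chromosome names is another possible reason for records ending up unlifted. The function name is hypothetical.

```shell
# Convert VCF records to UCSC BED: chrom, 0-based start, exclusive end, ID.
vcf_to_ucsc_bed() {
  awk -F '\t' -v OFS='\t' '
    /^#/ { next }                                # skip VCF header lines
    {
      chrom = $1
      if (chrom !~ /^chr/) chrom = "chr" chrom   # assumption: chain file uses chr1, chr2, ...
      print chrom, $2 - 1, $2, $3                # VCF POS is 1-based; BED start is 0-based
    }' "$1"
}

# Usage (file names taken from the question):
#   vcf_to_ucsc_bed POPRES_Genotypes_QC2_v2_VCF.vcf > output1.ucsc.bed
```

Mitochondrial records may need separate handling, since a VCF contig named MT would become chrMT here while UCSC uses chrM.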

13 months ago

Not relevant for SNPs, but I wanted to mention Liftoff for completeness. It lifts over GTF/GFF3 annotation files without needing any chain files. It works well in my experience.

https://github.com/agshumate/Liftoff
