How to perform liftover from 38 to 37 in R?
0
0
Entering edit mode
10 months ago
DN99 ▴ 20

I have some gwas summary statistics in GRCh38 that I want to lift to GRCh37. I am trying to liftover in R using this code:

library(tidyverse)
library(magrittr)
library(data.table)
library(rtracklayer)
library(GenomicRanges)

rm(list=ls())

gwas_data <- fread("/gwas_sumstats_allchr.txt")
chain_file <- "/chain_files/hg38ToHg19.over.chain"
chain <- import.chain(chain_file)

# Convert to GRanges object (assuming GENPOS is 1-based)
gwas_ranges <- GRanges(
  seqnames = Rle(gwas_data$CHROM),
  ranges = IRanges(start = gwas_data$GENPOS - 1, end = gwas_data$GENPOS) # Convert to 0-based start
)

gwas_ranges$seqnames <- paste0("chr", gwas_ranges$seqnames)

# Perform the liftover
liftover_output <- liftOver(gwas_ranges, chain)

# Extract successful liftover
liftover_success <- liftover_output$`output`

# Convert the results back to a data frame (convert positions back to 1-based)
result <- data.frame(
  CHROM = as.character(seqnames(liftover_success)),
  GENPOS = start(liftover_success) + 1  # Convert back to 1-based position
)

# Print or save the result
print(head(result))

However this doesn't work and I am getting:

> liftover_output <- liftOver(gwas_ranges, chain)
Discarding unchained sequences: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1, 20, 21, 22, 2, 3, 4, 5, 6, 7, 8, 9

My chain file seems fine on checking it, and my gwas data is indeed in GRCh38 - what else could be going wrong?

My gwas_ranges looks like:

> head(gwas_ranges)
GRanges object with 6 ranges and 1 metadata column:
seqnames      ranges strand |    seqnames
<Rle>   <IRanges>  <Rle> | <character>
[1]       10 10708-10709      * |         chr
[2]       10 10904-10905      * |         chr
[3]       10 10942-10943      * |         chr
[4]       10 12112-12113      * |         chr
[5]       10 13850-13851      * |         chr
[6]       10 14537-14538      * |         chr
-------   seqinfo: 22 sequences from an unspecified genome; no seqlengths

I also tried doing the liftover in python using hail. This ran without error, but the liftover of SNPs on chromosome 10 position 1234 would lift to chromosome 18 12369 . Here's my hail attempt:

import hail as hl
hl.stop()
hl.init()

# Load the reference genomes and add the liftover chain file
rg38 = hl.get_reference("GRCh38")
rg37 = hl.get_reference("GRCh37")
local_chain_file = "/references_grch38_to_grch37.over.chain.gz"
rg38.add_liftover(local_chain_file, rg37)

# Import GWAS data
gwas = hl.import_table('/gwas_sumstats_allchr.txt', 
                       types={'CHROM': 'str', 'GENPOS': 'int'})

# Standardize chromosome names in your GWAS data to match GRCh38
gwas = gwas.annotate(CHROM=hl.if_else(hl.literal('chr').contains(gwas.CHROM), gwas.CHROM, hl.literal('chr') + gwas.CHROM))

# Perform the liftover from GRCh38 to GRCh37
gwas = gwas.annotate(
    locus_grch38=hl.locus(gwas.CHROM, gwas.GENPOS, reference_genome='GRCh38'),
    locus_grch37=hl.liftover(hl.locus(gwas.CHROM, gwas.GENPOS, reference_genome='GRCh38'), 'GRCh37')
)

# Add new columns for CHROM_38 and GENPOS_38
gwas = gwas.annotate(
    CHROM_38=hl.if_else(hl.is_defined(gwas.locus_grch37), gwas.locus_grch37.contig, hl.null('str')),
    GENPOS_38=hl.if_else(hl.is_defined(gwas.locus_grch37), gwas.locus_grch37.position, hl.null('int'))
)

gwas = gwas.drop('locus_grch38', 'locus_grch37')

# Export the results
gwas.export('/gwas_sumstats_allchr_GRCh37.txt')

Here's some of the output I get from the hail run:

#Head of gwas_sumstats_GRCh38txt:
CHROM   GENPOS  ID  ALLELE0 ALLELE1 A1FREQ  INFO    N   TEST    BETA    SE  CHISQ   LOG10P  p   MAF
10  10709   10:10709:G:A    G   A   0.0140284   0.420904    56210   ADD 0.143464    0.0384744   13.904  3.71581 0.000192393324862147    0.0140284
10  10905   10:10905:G:A    G   A   0.0352519   0.418554    56210   ADD -0.000990946    0.0246439   0.0016169   0.0141582   0.967925206925404   0.0352519
10  10943   10:10943:G:C    G   C   0.0653494   0.469298    56210   ADD -0.0130032  0.0173743   0.56013 0.342745    0.454208230910946   0.0653494
10  12113   10:12113:AC:A   AC  A   0.0390721   0.431839    56210   ADD 0.0386709   0.0230572   2.81293 1.02915 0.0935082652260488  0.0390721

#Head of gwas_sumstats_GRCh37.txt:
CHROM   GENPOS  ID  A1FREQ  INFO    N   TEST    BETA    SE  CHISQ   LOG10P  p   MAF original_locus
18  10905   18:10905:G:A    0.0140284   0.420904    56210   ADD 0.143464    0.0384744   13.904  3.71581 0.000192393324862147    0.0140284   chr10:10709
18  11258   18:11258:G:A    0.0352519   0.418554    56210   ADD -0.000990946    0.0246439   0.0016169   0.0141582   0.967925206925404   0.0352519   chr10:10905
18  11296   18:11296:G:C    0.0653494   0.469298    56210   ADD -0.0130032  0.0173743   0.56013 0.342745    0.454208230910946   0.0653494   chr10:10943
18  12467   18:12467:AC:A   0.0390721   0.431839    56210   ADD 0.0386709   0.0230572   2.81293 1.02915 0.0935082652260488  0.0390721   chr10:12113
R genomics hail liftover gwas • 1.2k views
ADD COMMENT
0
Entering edit mode

check the chromosomes in your file are the same than those in the chain file (eg '1' vs 'chr1')

ADD REPLY
0
Entering edit mode

Thank you for your reply, I made sure that everything is with "chr" to align with the chain file and that's what I'm trying the above code with and getting the above error. I've just tried the opposite and making the chain be '1' like my sum stats but that gives the same error.

ADD REPLY

Login before adding your answer.

Traffic: 1970 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6