Entering edit mode
10 months ago
DN99
▴
20
I have some gwas summary statistics in GRCh38 that I want to lift to GRCh37. I am trying to liftover in R using this code:
library(tidyverse)
library(magrittr)
library(data.table)
library(rtracklayer)
library(GenomicRanges)
rm(list=ls())
gwas_data <- fread("/gwas_sumstats_allchr.txt")
chain_file <- "/chain_files/hg38ToHg19.over.chain"
chain <- import.chain(chain_file)
# Convert to GRanges object (assuming GENPOS is 1-based)
gwas_ranges <- GRanges(
seqnames = Rle(gwas_data$CHROM),
ranges = IRanges(start = gwas_data$GENPOS - 1, end = gwas_data$GENPOS) # Convert to 0-based start
)
gwas_ranges$seqnames <- paste0("chr", gwas_ranges$seqnames)
# Perform the liftover
liftover_output <- liftOver(gwas_ranges, chain)
# Extract successful liftover
liftover_success <- liftover_output$`output`
# Convert the results back to a data frame (convert positions back to 1-based)
result <- data.frame(
CHROM = as.character(seqnames(liftover_success)),
GENPOS = start(liftover_success) + 1 # Convert back to 1-based position
)
# Print or save the result
print(head(result))
However this doesn't work and I am getting:
> liftover_output <- liftOver(gwas_ranges, chain)
Discarding unchained sequences: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1, 20, 21, 22, 2, 3, 4, 5, 6, 7, 8, 9
My chain file seems fine on checking it, and my gwas data is indeed in GRCh38 - what else could be going wrong?
My gwas_ranges looks like:
> head(gwas_ranges)
GRanges object with 6 ranges and 1 metadata column:
seqnames ranges strand | seqnames
<Rle> <IRanges> <Rle> | <character>
[1] 10 10708-10709 * | chr
[2] 10 10904-10905 * | chr
[3] 10 10942-10943 * | chr
[4] 10 12112-12113 * | chr
[5] 10 13850-13851 * | chr
[6] 10 14537-14538 * | chr
------- seqinfo: 22 sequences from an unspecified genome; no seqlengths
I also tried doing the liftover in python using hail. This ran without error, but the liftover of SNPs on chromosome 10 position 1234 would lift to chromosome 18 12369 . Here's my hail attempt:
import hail as hl
hl.stop()
hl.init()
# Load the reference genomes and add the liftover chain file
rg38 = hl.get_reference("GRCh38")
rg37 = hl.get_reference("GRCh37")
local_chain_file = "/references_grch38_to_grch37.over.chain.gz"
rg38.add_liftover(local_chain_file, rg37)
# Import GWAS data
gwas = hl.import_table('/gwas_sumstats_allchr.txt',
types={'CHROM': 'str', 'GENPOS': 'int'})
# Standardize chromosome names in your GWAS data to match GRCh38
gwas = gwas.annotate(CHROM=hl.if_else(hl.literal('chr').contains(gwas.CHROM), gwas.CHROM, hl.literal('chr') + gwas.CHROM))
# Perform the liftover from GRCh38 to GRCh37
gwas = gwas.annotate(
locus_grch38=hl.locus(gwas.CHROM, gwas.GENPOS, reference_genome='GRCh38'),
locus_grch37=hl.liftover(hl.locus(gwas.CHROM, gwas.GENPOS, reference_genome='GRCh38'), 'GRCh37')
)
# Add new columns for CHROM_38 and GENPOS_38
gwas = gwas.annotate(
CHROM_38=hl.if_else(hl.is_defined(gwas.locus_grch37), gwas.locus_grch37.contig, hl.null('str')),
GENPOS_38=hl.if_else(hl.is_defined(gwas.locus_grch37), gwas.locus_grch37.position, hl.null('int'))
)
gwas = gwas.drop('locus_grch38', 'locus_grch37')
# Export the results
gwas.export('/gwas_sumstats_allchr_GRCh37.txt')
Here's some of the output I get from the hail run:
#Head of gwas_sumstats_GRCh38txt:
CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P p MAF
10 10709 10:10709:G:A G A 0.0140284 0.420904 56210 ADD 0.143464 0.0384744 13.904 3.71581 0.000192393324862147 0.0140284
10 10905 10:10905:G:A G A 0.0352519 0.418554 56210 ADD -0.000990946 0.0246439 0.0016169 0.0141582 0.967925206925404 0.0352519
10 10943 10:10943:G:C G C 0.0653494 0.469298 56210 ADD -0.0130032 0.0173743 0.56013 0.342745 0.454208230910946 0.0653494
10 12113 10:12113:AC:A AC A 0.0390721 0.431839 56210 ADD 0.0386709 0.0230572 2.81293 1.02915 0.0935082652260488 0.0390721
#Head of gwas_sumstats_GRCh37.txt:
CHROM GENPOS ID A1FREQ INFO N TEST BETA SE CHISQ LOG10P p MAF original_locus
18 10905 18:10905:G:A 0.0140284 0.420904 56210 ADD 0.143464 0.0384744 13.904 3.71581 0.000192393324862147 0.0140284 chr10:10709
18 11258 18:11258:G:A 0.0352519 0.418554 56210 ADD -0.000990946 0.0246439 0.0016169 0.0141582 0.967925206925404 0.0352519 chr10:10905
18 11296 18:11296:G:C 0.0653494 0.469298 56210 ADD -0.0130032 0.0173743 0.56013 0.342745 0.454208230910946 0.0653494 chr10:10943
18 12467 18:12467:AC:A 0.0390721 0.431839 56210 ADD 0.0386709 0.0230572 2.81293 1.02915 0.0935082652260488 0.0390721 chr10:12113
check the chromosomes in your file are the same than those in the chain file (eg
'1'
vs'chr1'
)Thank you for your reply, I made sure that everything is with "chr" to align with the chain file and that's what I'm trying the above code with and getting the above error. I've just tried the opposite and making the chain be '1' like my sum stats but that gives the same error.