convert genome coordinates from b36 to hg19
1
0
Entering edit mode
2.3 years ago
@anna • 0

Hi there,

I have some old BED and VCF files that were created using the b36 genome reference. Is there any way to update the genome coordinates to GRCh37/hg19?

Many thanks in advance, Anna

coordinates genome b36 hg19 • 1.5k views

UCSC liftOver for the BED files and Picard LiftoverVcf for the VCF files.


I tried Picard LiftoverVcf, but it gives me an empty output file (it only has the header lines), and all the variants end up in rejected_variants.vcf:

java -jar picard.jar LiftoverVcf I=chr11.vcf O=chr11b.vcf CHAIN=hg18ToHg19.over.chain REJECT=rejected_variants.vcf R=ucsc.hg19.fasta


Check the chromosome nomenclature (chr1 vs 1). The names in the VCF have to match the ones used in the chain file and the reference FASTA.
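In case it helps, here is a minimal Python sketch of one way to make the names match. It assumes the VCF uses bare names (1, 11, MT) while the chain file and reference use UCSC-style names (chr1, chr11, chrM). The function names `to_ucsc` and `add_chr_prefix` are made up for this example, and PLINK-style numeric codes for the sex chromosomes are not handled.

```python
import re

# Matches the ID inside a ##contig header line, e.g. ##contig=<ID=11,length=...>
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def to_ucsc(name):
    """Map a bare contig name to a UCSC-style one (assumed convention)."""
    if name.startswith("chr"):
        return name  # already prefixed, leave untouched
    return "chrM" if name == "MT" else "chr" + name


def add_chr_prefix(lines):
    """Yield VCF lines with contig names rewritten in headers and records."""
    for line in lines:
        if line.startswith("##contig="):
            yield _CONTIG_ID.sub(lambda m: m.group(1) + to_ucsc(m.group(2)), line)
        elif line.startswith("#"):
            yield line  # other header lines pass through unchanged
        else:
            chrom, sep, rest = line.partition("\t")
            yield to_ucsc(chrom) + sep + rest
```

You would read the uncompressed VCF line by line, pass the lines through `add_chr_prefix`, and write the result to a new file before running the liftover.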


Yes, that was it. Silly mistake, sorry.

Still, the output file has 237,905 lines and the rejected_variants file has 436,954 lines, which seems like a lot. Is that to be expected?

It also gives me this warning: WARNING LiftoverVcf 137518 variants with a swapped REF/ALT were identified, but were not recovered.
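To see why variants are being rejected, it can help to tally the reasons in the reject file. The sketch below assumes that the rejection reason is recorded in the FILTER column (column 7) of rejected_variants.vcf; the function name `tally_reject_reasons` is made up for this example. Note that Picard documents a RECOVER_SWAPPED_REF_ALT option for the swapped REF/ALT case mentioned in the warning, with caveats described in its documentation.

```python
from collections import Counter


def tally_reject_reasons(lines):
    """Count FILTER values over the data lines of a reject VCF.

    Assumes the rejection reason sits in the FILTER column (column 7);
    several reasons separated by ';' are counted individually.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        for reason in fields[6].split(";"):
            counts[reason] += 1
    return counts
```

For example, `tally_reject_reasons(open("rejected_variants.vcf"))` returns a Counter, and `.most_common()` on it lists the reasons from most to least frequent.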

2.3 years ago
vkkodali_ncbi ★ 3.7k

The NCBI Remap service can be used for this. Specifically, you can use this link with the source (NCBI36/hg18) and target (GRCh37/hg19) assemblies selected.


I have also tried this link, but when I upload the resulting file to the Michigan Imputation Server, it gives the following error:

Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file

But my VCF does have the #CHROM header:

line 44: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4023 (...)

line 45: 10 111955 rs7909677 A G . PASS PR;REMAP_ALIGN=FP GT 0/0 (...)


Did you paste the contents of the VCF into the Remap web interface or upload a file? I am wondering if it has something to do with using spaces vs tabs as delimiters. An example file, if possible, would be helpful to diagnose the issue.
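A quick way to test the delimiter idea is to look at the #CHROM line directly. This is only a sketch: the function name `check_chrom_header` and the wording of the messages are made up for this example, and it checks nothing beyond the first eight fixed VCF columns.

```python
# The eight fixed columns every VCF header line must start with.
REQUIRED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_chrom_header(lines):
    """Report whether the #CHROM header line is tab-delimited."""
    for number, line in enumerate(lines, start=1):
        if not line.startswith("#CHROM"):
            continue
        if line.rstrip("\r\n").split("\t")[:8] == REQUIRED:
            return f"line {number}: header is tab-delimited"
        # Splitting on any whitespace recovers the columns if spaces were used.
        if line.split()[:8] == REQUIRED:
            return f"line {number}: header uses spaces instead of tabs"
        return f"line {number}: header columns are not the expected VCF columns"
    return "no #CHROM header line found"
```

Running it on the file (for example `check_chrom_header(open("remapped.vcf"))`, where the file name is a placeholder) tells you whether the header line is the problem or whether to look elsewhere in the file.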


I uploaded the file. Can you please tell me how to upload the file here?

##fileformat=VCFv4.2                                    
##FILTER=<ID=PASS,Description="All filters passed">                                 
##fileDate=20220228                                 
##source=PLINKv1.90                                 
##contig=<ID=0,length=2147483645>                                   
##contig=<ID=1,length=247137335>                                    
##contig=<ID=2,length=242697434>                                    
##contig=<ID=3,length=199340831>                                    
##contig=<ID=4,length=191167889>                                    
##contig=<ID=5,length=180625440>                                    
##contig=<ID=6,length=170747903>                                    
##contig=<ID=7,length=158809727>                                    
##contig=<ID=8,length=146264219>                                    
##contig=<ID=9,length=140191297>                                    
##contig=<ID=10,length=135237858>                                   
##contig=<ID=11,length=134449983>                                   
##contig=<ID=12,length=132209175>                                   
##contig=<ID=13,length=114125099>                                   
##contig=<ID=14,length=106356483>                                   
##contig=<ID=15,length=100217561>                                   
##contig=<ID=16,length=88690777>                                    
##contig=<ID=17,length=78643089>                                    
##contig=<ID=18,length=76116030>                                    
##contig=<ID=19,length=63786939>                                    
##contig=<ID=20,length=62382908>                                    
##contig=<ID=21,length=46909249>                                    
##contig=<ID=22,length=49565873>                                    
##contig=<ID=23,length=154578240>                                   
##contig=<ID=24,length=27167582>                                    
##contig=<ID=25,length=154881767>                                   
##contig=<ID=26,length=15609>                                   
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">                                 
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">                                    
##bcftools_filterVersion=1.10.2+htslib-1.10.2                                   
##bcftools_filterCommand=filter -r 10 EPI.vcf.gz; Date=Mon Feb 28 12:17:19 2022                                 
##INFO=<ID=REMAP_ALIGN,Number=1,Type=String,Description="Alignment type used for remapping (FP=first pass, SP=second pass)">                                    
##INFO=<ID=REF_EDIT,Number=0,Type=Flag,Description="REF base modified during remapping due to either left shifting or difference in REF base between source and target assemblies.">                                    
##NCBI_remap_source_assm="GCF_000001405.12"                                 
##NCBI_remap_target_assm="GCF_000001405.13"                                 
##NCBI_remap_align_date="2014-09-23 20:19:00"                                   
##NCBI_remap_run_date="2022-02-28T08:24:12"                                 
##NCBI_remap_batch_id="86373"                                   
##NCBI_remap_align_parameters=<minratio=0.5,maxratio=2,multiloc=Y,mergefrag=N>                                  
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  4023
10  111955  rs7909677   A   G   .   PASS    PR;REMAP_ALIGN=FP   GT  0/0

Is it ok like this, or should I upload the file another way?


You can't upload files here but you could post a test section on pastebin.com or upload the raw data file to github.com and then paste a link for that page here.


While doing that, I noticed that the end of the file had a few lines on other contigs (e.g., HSCHRUN_RANDOM_CTG15). After removing them, the imputation worked.
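For anyone hitting the same problem, a minimal sketch of that clean-up step in Python is below. The names `keep_contigs` and `AUTOSOMES` are made up for this example, and the allowed set (bare names 1 to 22) is an assumption; adjust it to whatever your downstream tool accepts. Header lines, including any ##contig lines, are passed through unchanged.

```python
# Assumed set of accepted contig names: autosomes with bare numeric names.
AUTOSOMES = {str(n) for n in range(1, 23)}


def keep_contigs(lines, allowed):
    """Yield header lines plus only those records whose CHROM is in `allowed`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep every header line as-is
        elif line.split("\t", 1)[0] in allowed:
            yield line  # record sits on an accepted contig
```

Writing the output of `keep_contigs(open("remapped.vcf"), AUTOSOMES)` to a new file (the input name is a placeholder) drops records on contigs such as HSCHRUN_RANDOM_CTG15.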

Again, my original file with only chr11 had 33,421 lines and my remapped file has 12,055. Is it to be expected that so many variants are lost?
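Line counts include header lines, so a more direct way to see what was lost is to compare variant IDs between the two files. This is a rough sketch, assuming the variants carry rsIDs in the ID column and that IDs are unique; records with a missing ID (".") are ignored. The function names are made up for this example.

```python
def variant_ids(lines):
    """Collect the ID column (column 3) of all VCF records, skipping '.' IDs."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3 and fields[2] != ".":
            ids.add(fields[2])
    return ids


def lost_ids(original_lines, remapped_lines):
    """Return IDs present in the original file but absent after remapping."""
    return variant_ids(original_lines) - variant_ids(remapped_lines)
```

Looking up a handful of the lost IDs by hand can show whether they failed to map at all or ended up on a different contig.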
