Question: NCBI Genome Remapping Service - clinical remap
Asked 2.1 years ago by Sudhir Jadhao:

Dear All,

I am converting LRG coding-sequence (c.) coordinates to hg19 genomic coordinates.

The NCBI Genome Remapping Service (clinical remap) gives two types of output (shown in the attached image):



Output 1. Download full mapping report

CHROM           START        END          BASES
NC_000009.11    136131070    136131072    CCT

Output 2. Download Annotation Data (VCF format)

CHROM           POS          ID    REF     ALT
NC_000009.11    136131068    .     TGAG    T

My question: why do the coordinates and the bases differ between the two outputs for the same input?
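Part of the answer can be sketched in a few lines of Python. ABO lies on the minus strand (the VEP output further down reports STRAND -1), so deleting AGG on the transcript removes its reverse complement, CCT, from the genome, which is what the mapping report shows at 136131070-136131072. VCF, however, writes a deletion together with one padding (anchor) base in front of it, so a correct VCF record starts one position earlier, at 136131069. This explains a one-base offset, but not the TGAG allele NCBI reports at 136131068. The reference bases used below (C G C C T at 136131068-136131072) are an assumption inferred from the VCF lines later in this thread, not looked up in GRCh37, and `revcomp` and `deletion_to_vcf` are hypothetical helpers.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (transcript strand -> opposite strand)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def deletion_to_vcf(window: str, window_start: int, del_start: int, del_end: int):
    """Write a deletion of the 1-based, inclusive interval [del_start, del_end]
    as a VCF (POS, REF, ALT) triple.

    `window` holds reference bases beginning at 1-based `window_start` and must
    include the base just before the deletion, which VCF uses as padding base.
    """
    pos = del_start - 1                    # anchor (padding) base position
    first = pos - window_start             # 0-based index of the anchor in window
    last = del_end - window_start + 1      # exclusive end of the deleted bases
    if first < 0 or last > len(window):
        raise ValueError("reference window does not cover anchor + deletion")
    ref = window[first:last]               # anchor + deleted bases
    alt = window[first]                    # only the anchor base remains
    return pos, ref, alt


# Assumed GRCh37 chr9 bases 136131068-136131072, inferred from the VCF lines
# posted in this thread (not looked up independently).
WINDOW, WINDOW_START = "CGCCT", 136131068

print(revcomp("AGG"))                      # c.1046_1048delAGG on the minus strand
print(deletion_to_vcf(WINDOW, WINDOW_START, 136131070, 136131072))
```

With the assumed window this prints `CCT` and `(136131069, 'GCCT', 'G')`, i.e. the same deletion as the mapping report, written in VCF style.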


Hi Sudhir,

I have adjusted the image for you using the formatting options. Please elaborate on the question and add appropriate tags; tags help your post get quick attention from fellow members.

Thanks, Vijay

(reply by lakhujanivijay, 2.1 years ago)

Thank you, Vijay, for the suggestions.

I have updated the question.

(reply by Sudhir Jadhao, 2.1 years ago)

Hi There,

The content in the attached image is unreadable. Could you either update the image or paste the raw data here?

(reply by Sej Modha, 2.1 years ago)

I have added the raw data.

(reply by Sudhir Jadhao, 2.1 years ago)

It looks like the NCBI record for this accession has been updated to NC_000009.12. Could you check whether the discrepancies still exist in the updated version of this record?

(reply by Sej Modha, 2.1 years ago)

But both outputs are from the same assembly version: NC_000009.11 (chromosome 9 of GRCh37/hg19).

(reply by Sudhir Jadhao, 2.1 years ago)

Hello Sudhir,

some comments and questions on this:

  • What is the aim of using this "Remapping Service"?
  • You said you want to convert LRG coordinates to hg19 coordinates. But "NM_020469.2" isn't an LRG number; it is a RefSeq accession number. The corresponding LRG would be LRG_792.
  • The ABO annotation seems to be under heavy development. In hg19 this gene isn't annotated as protein-coding! Why do you want to use hg19?
  • Have a look at the LRG site for a comparison between the different assemblies: there are many differences. I don't know what NCBI is doing behind the scenes in its remapping service, but it seems to me that remapping is very difficult for this gene.

All these comments and questions bring us back to the first point: What is your goal? Maybe we can find a better way.

fin swimmer

(reply by finswimmer, 2.1 years ago)

Thank you, fin, for your reply!

I have population genomics data in VCF files (hg19), and I want to extract blood group variants from them.

I hope this helps.

(reply by Sudhir Jadhao, 2.1 years ago)

So you have an XY problem?!

What does this task have to do with LRG transcripts, remapping, variants given in HGVS notation, ...?

I would suggest you go back to the start and explain:

  • what data you have (show us an example!)
  • what the goal of the investigation of this data is
  • what your desired output should look like (show us an example!)

fin swimmer

(reply by finswimmer, 2.1 years ago)

I have experimentally validated blood group SNPs from the ISBT database. These variants are only available as cDNA (c.) positions on RefSeq transcripts, so to compare them with my hg19 VCF I want to convert them to hg19 coordinates.

(reply by Sudhir Jadhao, 2.1 years ago)
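Once the ISBT variants have hg19 coordinates, comparing them with the population VCF is a keyed lookup. A minimal sketch, assuming both sides already use the same allele representation (left-aligned indels, padding base included; `bcftools norm -f <reference.fa>` can normalize a VCF) and that chromosome names differ only by a `chr` prefix or a RefSeq accession. The accession table, the function names and the two sample VCF records are hypothetical.

```python
# Hypothetical lookup table: only the accession from this thread is filled in.
# NC_000009.11 is chromosome 9 of GRCh37/hg19; add further entries as needed.
ACCESSION_TO_CHROM = {"NC_000009.11": "9"}


def normalize_chrom(name: str) -> str:
    """Map 'NC_000009.11', 'chr9' and '9' to the same name ('9')."""
    name = ACCESSION_TO_CHROM.get(name, name)
    return name[3:] if name.lower().startswith("chr") else name


def variant_key(chrom: str, pos, ref: str, alt: str) -> tuple:
    return (normalize_chrom(chrom), int(pos), ref.upper(), alt.upper())


def find_in_vcf(lines, wanted: set):
    """Yield VCF data lines where (CHROM, POS, REF, any ALT) is in `wanted`."""
    for line in lines:
        if line.startswith("#"):
            continue                                   # meta and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        if any(variant_key(chrom, pos, ref, alt) in wanted for alt in alts.split(",")):
            yield line


# Example: the expected hg19 record for NM_020469.2:c.1046_1048delAGG.
wanted = {variant_key("NC_000009.11", 136131069, "GCCT", "G")}
vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr9\t136131069\t.\tGCCT\tG\t.\t.\t.\n",         # made-up sample record
    "9\t136131068\t.\tCGCC\tC\t.\t.\t.\n",           # made-up sample record
]
hits = list(find_in_vcf(vcf_lines, wanted))
print(len(hits))
```

For a bgzipped VCF the lines can come from `gzip.open(path, "rt")` instead of a list. Only the first sample record matches; the second is a different deletion, not another spelling of the same one.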

Ensembl's VEP can do this.



You can export the result as VCF:

##VEP="v92" time="2018-06-27 13:16:38" cache="/nfs/public/release/ensweb-data/latest/tools/grch37/e92/vep/cache/homo_sapiens/92_GRCh37" db="homo_sapiens_core_92_37@hh-mysql-ens-grch37-web" 1000genomes="phase3" COSMIC="81" ClinVar="201706" ESP="20141103" HGMD-PUBLIC="20164" assembly="GRCh37.p13" dbSNP="150" gencode="GENCODE 19" genebuild="2011-04" gnomAD="170228" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|TSL|APPRIS|SIFT|PolyPhen|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
9   136131068   NM_020469.2:c.1046_1048delAGG   CGCC    C   .   .   CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000453660|processed_transcript|7/7||||1058-1060|||||||-1||HGNC|79|||||||||||||||||||||||||||||,-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000538324|processed_transcript|5/5||||919-921|||||||-1||HGNC|79|||||||||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|RP11-430N14.4|ENSG00000271875|Transcript|ENST00000606717|3prime_overlapping_ncrna|||||||||||119|-1||Clone_based_vega_gene||||||||||||||||||||||||||||||
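The CSQ value packs one pipe-separated annotation per overlapping transcript, in the order declared after `Format:` in the `##INFO=<ID=CSQ` header. Reading it per transcript makes the earlier point visible: in this GRCh37 output the ABO transcripts are `processed_transcript`, not `protein_coding`. A minimal sketch with hypothetical helper names, run on shortened copies of the header and INFO value above; splitting INFO on `;` also copes with extra keys such as the `REMAP_ALIGN` tag that NCBI appends later in the thread.

```python
def csq_fields(header_line: str) -> list:
    """Return the CSQ field names listed after 'Format: ' in the ##INFO header."""
    start = header_line.index("Format: ") + len("Format: ")
    end = header_line.index('"', start)            # closing quote of Description
    return header_line[start:end].split("|")


def parse_csq(info: str, fields: list) -> list:
    """Split the CSQ entry of an INFO column into one dict per annotation."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            return [dict(zip(fields, ann.split("|"))) for ann in item[4:].split(",")]
    return []


# Shortened versions of the header and INFO value shown above.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|'
          'Feature_type|Feature|BIOTYPE">')
info = ("CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|"
        "Transcript|ENST00000453660|processed_transcript,"
        "-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|"
        "Transcript|ENST00000538324|processed_transcript")

for ann in parse_csq(info, csq_fields(header)):
    print(ann["Feature"], ann["BIOTYPE"], ann["Consequence"])
```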

fin swimmer


I have asked Emily_Ensembl to have a look at this. I think the result should be:

9   136131069   NM_020469.2:c.1046_1048delAGG   GCCT    G
(reply by finswimmer, 2.1 years ago)
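Whether two VCF records describe the same deletion can be checked by applying each one to the reference and comparing the resulting sequences; a REF that disagrees with the reference is an error in itself. A minimal sketch, again assuming the GRCh37 bases C G C C T at 136131068-136131072 inferred from this thread's VCF lines. Because that window is derived from the VEP and expected records, the TGAG mismatch it reports is only as reliable as that assumption. Under it, VEP's record and the expected one leave different sequences, so they are two different deletions, not two notations for one. `apply_vcf` is a hypothetical helper.

```python
def apply_vcf(window: str, window_start: int, pos: int, ref: str, alt: str) -> str:
    """Apply one VCF allele to a reference window starting at 1-based window_start.

    Raises ValueError when REF does not match the reference bases at POS.
    """
    i = pos - window_start
    observed = window[i:i + len(ref)] if i >= 0 else ""
    if observed != ref:
        raise ValueError(f"REF {ref} does not match reference {observed or '?'} at {pos}")
    return window[:i] + alt + window[i + len(ref):]


# Assumed GRCh37 chr9 bases 136131068-136131072 (inferred, not looked up).
WINDOW, WINDOW_START = "CGCCT", 136131068

records = {
    "VEP, GRCh37":     (136131068, "CGCC", "C"),
    "expected":        (136131069, "GCCT", "G"),
    "NCBI annotation": (136131068, "TGAG", "T"),
}
for label, (pos, ref, alt) in records.items():
    try:
        print(f"{label}: {apply_vcf(WINDOW, WINDOW_START, pos, ref, alt)}")
    except ValueError as err:
        print(f"{label}: {err}")
```

The VEP record leaves `CT`, the expected record leaves `CG`, and the NCBI annotation record is rejected because its REF does not match the assumed reference.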

I took a closer look at it, and it's ... difficult.

In the Community Annotations on the LRG site you'll find this comment:

There are known issues with the ABO gene in the reference genome assemblies (GRCh37 and GRCh38) (2).

And yes, the existing transcripts differ a lot from the reference genome: not just SNVs, but also indels. There is at least one position where the transcripts contain one more base in the coding region, so the whole c. description changes.

Maybe I've found a little workaround:

We use Ensembl's VEP for this again, but this time the version for hg38 (GRCh38), and instead of NM_020469 we use ENST00000611156.4. Note that these two transcripts are not 100% identical: NM_020469 has one extra triplet between c.255 and c.256. So for every variant located after this position we have to subtract 3 bases.
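The position shift can be scripted before the descriptions are handed to VEP. A minimal sketch that only handles plain exonic c. positions and ranges (no intronic offsets, no c.-N or c.*N). It encodes the statement above that NM_020469.2 has one extra triplet between c.255 and c.256, plus our own assumption that those extra bases are numbered NM c.256-c.258 and have no ENST00000611156.4 counterpart, so descriptions touching them are rejected. All names are hypothetical.

```python
import re

# From the thread: NM_020469.2 has one extra triplet relative to
# ENST00000611156.4 "between c.255 and c.256". Our assumption: those bases
# are NM c.256-c.258 and have no counterpart in the Ensembl transcript.
NM_ACC, ENST_ACC = "NM_020469.2", "ENST00000611156.4"
EXTRA_FIRST, EXTRA_LEN = 256, 3
EXTRA_LAST = EXTRA_FIRST + EXTRA_LEN - 1

# Plain exonic c. positions or ranges only, e.g. c.100A>G or c.1046_1048delAGG.
HGVS_C = re.compile(
    r"^(?P<acc>[^:]+):c\.(?P<start>\d+)(?:_(?P<end>\d+))?(?P<change>[A-Za-z].*)$")


def shift(pos: int) -> int:
    """Map an NM c. position outside the extra triplet to ENST numbering."""
    return pos - EXTRA_LEN if pos > EXTRA_LAST else pos


def nm_to_enst(hgvs: str) -> str:
    m = HGVS_C.match(hgvs)
    if m is None or m["acc"] != NM_ACC:
        raise ValueError(f"unsupported c. description: {hgvs}")
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    if start <= EXTRA_LAST and end >= EXTRA_FIRST:   # overlaps the extra triplet
        raise ValueError(f"{hgvs} overlaps the extra {NM_ACC} triplet")
    span = f"{shift(start)}_{shift(end)}" if m["end"] else f"{shift(start)}"
    return f"{ENST_ACC}:c.{span}{m['change']}"


print(nm_to_enst("NM_020469.2:c.1046_1048delAGG"))
```

For the example in this thread this prints `ENST00000611156.4:c.1043_1045delAGG`, which is the description submitted to the GRCh38 VEP in the next step.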

Finally, for the given example our input would look like this:

ENST00000611156.4:c.1043_1045delAGG


Run it and export the output as VCF:

9   133255682   ENST00000611156.4:c.1043_1045delAGG GCCT    G   .   .   CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000453660|processed_transcript|7/7||||1075-1077|||||||-1||HGNC|HGNC:79|1||||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000538324|protein_coding|8/9||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|P5|||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000611156|protein_coding|8/8||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|A2|||||||||||||||||||||||||||,-|intron_variant&non_coding_transcript_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000647353|processed_transcript||1/1||||||||||-1||HGNC|HGNC:79|||||||||||||||||||||||||||||

With this result, go to the NCBI Remapping Service, open the Assembly-Assembly tab, choose GRCh38.p11 :: Primary Assembly as the source and GRCh37.p13 :: Primary Assembly as the target, and paste in the VCF obtained from VEP.

The exported Annotation Data file now looks like this:

9   136131069   ENST00000611156.4:c.1043_1045delAGG GCCT    G   .   .   CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000453660|processed_transcript|7/7||||1075-1077|||||||-1||HGNC|HGNC:79|1||||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000538324|protein_coding|8/9||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|P5|||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000611156|protein_coding|8/8||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|A2|||||||||||||||||||||||||||,-|intron_variant&non_coding_transcript_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000647353|processed_transcript||1/1||||||||||-1||HGNC|HGNC:79|||||||||||||||||||||||||||||;REMAP_ALIGN=FP

That seems to be fine. Whether this works for all your other variants, I don't know. Give it a try.

fin swimmer

(reply by finswimmer, 2.1 years ago)

@Sudhir: If this is happening, you should contact NCBI support with this example.

(reply by genomax, 2.1 years ago)

Thank you, @fin and @Emily_Ensembl.

I had tried Ensembl's VEP before as well, but it does not convert all of my RefSeq inputs to hg19; it works for only a few of them.

I will contact NCBI support and see what they reply.

(reply by Sudhir Jadhao, 2.1 years ago)