Question: NCBI Genome Remapping Service- clinical remap
15 months ago, Sudhir Jadhao (India) wrote:

Dear All,

I am converting LRG coding-sequence coordinates to hg19 genomic coordinates.

The "Clinical Remap" tool of the NCBI Genome Remapping Service gives two types of output for the same input (raw data below).

Input:

NM_020469.2:c.1046_1048delAGG

Output 1: "Download full mapping report"

CHROM          START        END
NC_000009.11   136131070    136131072   (base pairs => CCT)

Output 2: "Download Annotation Data" (VCF format)

CHROM          POS          ID   REF    ALT
NC_000009.11   136131068    .    TGAG   T

My question: why do the coordinates and the bases differ between the two outputs for the same input?

Tags: snp co-ordinate remapping
modified 15 months ago by RamRS • written 15 months ago by Sudhir Jadhao

Hi Sudhir,

I have adjusted the image for you using the formatting options. Please elaborate on the question and add appropriate tags. Tags help your question get attention from fellow members quickly.

Thanks Vijay

written 15 months ago by lakhujanivijay

Thank you, Vijay, for the suggestions.

I have updated the question.

written 15 months ago by Sudhir Jadhao

Hi There,

The content in the attached image is unreadable. Could you either update the image or paste the raw data here?

written 15 months ago by Sej Modha

I have added the raw data.

written 15 months ago by Sudhir Jadhao

It looks like the NCBI record for this accession has been updated to NC_000009.12. Could you check whether the discrepancies still exist in the updated version of this record?

modified 15 months ago • written 15 months ago by Sej Modha

But both outputs are from the same assembly version: NC_000009.11

written 15 months ago by Sudhir Jadhao

Hello Sudhir,

Some comments and questions on this:

  • What is the aim of using this "Remapping Service"?
  • You said you want to convert LRG coordinates to hg19 coordinates. But "NM_020469.2" isn't an LRG number; it is a RefSeq accession number. The corresponding LRG would be LRG_792.
  • ABO seems to be under heavy development. In hg19 this gene isn't protein coding! Why do you want to use hg19?
  • Have a look at the LRG site for a comparison between the different assemblies. There are many differences. I don't understand what NCBI is doing behind the scenes with their remapping service, but it seems to me that remapping is very difficult for this gene.

All these comments and questions bring us back to the first point: What is your goal? Maybe we can find a better way.

fin swimmer

written 15 months ago by finswimmer

Thank you, fin, for your reply!

I have population genomics data in VCF files (hg19). I want to extract blood group variants from these VCF files.

I hope it helps.

written 15 months ago by Sudhir Jadhao

So you have an XY problem?!

What does this task have to do with "LRG transcripts", "remapping", variants given in HGVS notation, ...?

I would suggest you go back to the start and explain:

  • what data you have (show us an example!)
  • what the goal of investigating this data is
  • what your desired output should look like (show us an example!)

fin swimmer

written 15 months ago by finswimmer

I have experimentally validated blood group SNPs from the ISBT database. These blood group variants are only available in cDNA form, described relative to RefSeq transcripts. To compare these variants with my hg19 VCF, I want to convert them to hg19 genomic coordinates.

written 15 months ago by Sudhir Jadhao

Ensembl's VEP can do this.

Input:

NM_020469.2:c.1046_1048delAGG

You can export the result as VCF:

##fileformat=VCFv4.1
##VEP="v92" time="2018-06-27 13:16:38" cache="/nfs/public/release/ensweb-data/latest/tools/grch37/e92/vep/cache/homo_sapiens/92_GRCh37" db="homo_sapiens_core_92_37@hh-mysql-ens-grch37-web" 1000genomes="phase3" COSMIC="81" ClinVar="201706" ESP="20141103" HGMD-PUBLIC="20164" assembly="GRCh37.p13" dbSNP="150" gencode="GENCODE 19" genebuild="2011-04" gnomAD="170228" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|TSL|APPRIS|SIFT|PolyPhen|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
9   136131068   NM_020469.2:c.1046_1048delAGG   CGCC    C   .   .   CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000453660|processed_transcript|7/7||||1058-1060|||||||-1||HGNC|79|||||||||||||||||||||||||||||,-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000538324|processed_transcript|5/5||||919-921|||||||-1||HGNC|79|||||||||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|RP11-430N14.4|ENSG00000271875|Transcript|ENST00000606717|3prime_overlapping_ncrna|||||||||||119|-1||Clone_based_vega_gene||||||||||||||||||||||||||||||

fin swimmer

EDIT:

I have contacted Emily_Ensembl to have a look at this. I think the result should be

9   136131069   NM_020469.2:c.1046_1048delAGG   GCCT    G
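
As a minimal sketch of how the expected record above can be derived by hand: ABO lies on the reverse strand (STRAND is -1 in the VEP output), so the deleted AGG from the transcript reads CCT on the forward strand, which is the CCT at 136131070-136131072 in the NCBI mapping report. A VCF deletion is then anchored on the base just before the deleted bases. The helper names are hypothetical, and the anchor base G is taken from the record above rather than looked up in a reference FASTA.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def deletion_to_vcf(chrom, start, deleted, anchor_base):
    """Build (CHROM, POS, REF, ALT) for a deletion.

    start       -- 1-based forward-strand position of the first deleted base
    deleted     -- deleted bases on the forward strand
    anchor_base -- reference base at start - 1 (assumed known here)
    """
    pos = start - 1  # VCF reports the position of the anchor base
    return (chrom, pos, anchor_base + deleted, anchor_base)


# Transcript-level deletion "AGG" on the reverse strand.
deleted_forward = reverse_complement("AGG")
record = deletion_to_vcf("9", 136131070, deleted_forward, "G")
print(deleted_forward, record)
```

This only covers the anchoring convention. It does not left-align the deletion within repeats, which a real normalisation tool would also do.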
modified 15 months ago • written 15 months ago by finswimmer

I took a closer look at it. And it's ... difficult.

In the Community Annotations on the LRG Site you'll find this comment:

There are known issues with the ABO gene in the reference genome assemblies (GRCh37 and GRCh38) (2).

And yes, the existing transcripts have a lot of changes compared to the reference genome: not just SNVs but also insertions and deletions. There is at least one position where the transcripts contain one more base in the coding region. Therefore the whole c. description changes.

Maybe I've found a little workaround:

We use Ensembl's VEP for this again. But this time we use the version for hg38, and instead of NM_020469 we use ENST00000611156.4. Note that these two transcripts are not 100% identical: NM_020469 has one additional triplet between c.255 and c.256. So for every variant downstream of this position we have to subtract 3 bases from the c. coordinates.

Finally, for the given example our input would look like this:

ENST00000611156.4:c.1043_1045delAGG
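
For more than a handful of variants, the subtraction could be scripted. The following is only a sketch under the assumptions stated above (one extra triplet in NM_020469 between c.255 and c.256, so positions above 255 shift by -3). The function name and regular expression are hypothetical, and only simple coding positions are handled: UTR positions (c.-5, c.*3) are rejected, intronic offsets (c.255+1) are not interpreted, and changes that span the c.255/c.256 boundary would need manual review.

```python
import re

# Matches e.g. "NM_020469.2:c.1046_1048delAGG" or "NM_020469.2:c.100A>G".
HGVS_C = re.compile(
    r"^(?P<tx>[^:]+):c\.(?P<start>\d+)(?:_(?P<end>\d+))?(?P<change>.+)$"
)


def shift_hgvs(hgvs, new_tx, boundary=255, offset=-3):
    """Rewrite a simple c. description onto another transcript.

    Positions greater than `boundary` are shifted by `offset`;
    positions at or below it are left unchanged.
    """
    m = HGVS_C.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS description: {hgvs}")

    def shift(pos):
        pos = int(pos)
        return pos + offset if pos > boundary else pos

    coords = str(shift(m["start"]))
    if m["end"] is not None:
        coords += "_" + str(shift(m["end"]))
    return f"{new_tx}:c.{coords}{m['change']}"


converted = shift_hgvs("NM_020469.2:c.1046_1048delAGG", "ENST00000611156.4")
print(converted)
```

Each converted description should still be checked against the VEP output, because the two transcripts may differ at more places than this one triplet.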

Run it and export the output as vcf:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
9   133255682   ENST00000611156.4:c.1043_1045delAGG GCCT    G   .   .   CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000453660|processed_transcript|7/7||||1075-1077|||||||-1||HGNC|HGNC:79|1||||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000538324|protein_coding|8/9||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|P5|||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000611156|protein_coding|8/8||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|A2|||||||||||||||||||||||||||,-|intron_variant&non_coding_transcript_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000647353|processed_transcript||1/1||||||||||-1||HGNC|HGNC:79|||||||||||||||||||||||||||||

With this result one can go to the NCBI Remapping Service, open the Assembly-Assembly tab, and choose GRCh38.p11 :: Primary Assembly as source and GRCh37.p13 :: Primary Assembly as target. Then copy and paste the VCF obtained from VEP.

The exported Annotation Data file now looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
9   136131069   ENST00000611156.4:c.1043_1045delAGG GCCT    G   .   .   CSQ=-|non_coding_transcript_exon_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000453660|processed_transcript|7/7||||1075-1077|||||||-1||HGNC|HGNC:79|1||||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000538324|protein_coding|8/9||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|P5|||||||||||||||||||||||||||,-|inframe_deletion|MODERATE|ABO|ENSG00000175164|Transcript|ENST00000611156|protein_coding|8/8||||1068-1070|1043-1045|348-349|QA/P|cAGGcg/ccg|||-1||HGNC|HGNC:79|5|A2|||||||||||||||||||||||||||,-|intron_variant&non_coding_transcript_variant|MODIFIER|ABO|ENSG00000175164|Transcript|ENST00000647353|processed_transcript||1/1||||||||||-1||HGNC|HGNC:79|||||||||||||||||||||||||||||;REMAP_ALIGN=FP

That seems to be fine. Whether this works for all your other variants, I don't know. Give it a try.
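
Once a variant is available as an hg19 VCF record, looking it up in a population VCF could be sketched as below. This is an illustration only: the in-memory VCF lines are made up for the example (the second record is not real data), the function name is hypothetical, and matching on (CHROM, POS, REF, ALT) only works if both files use the same chromosome naming and the same indel representation.

```python
def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF text lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


# Blood group variants of interest, keyed by their hg19 VCF representation.
targets = {
    ("9", 136131069, "GCCT", "G"): "NM_020469.2:c.1046_1048delAGG",
}

# Made-up stand-in for a population VCF; in practice, iterate over the file.
population_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "9\t136131069\t.\tGCCT\tG\t.\t.\t.",
    "9\t136131500\t.\tA\tC\t.\t.\t.",
]

hits = [targets[key] for key in variant_keys(population_vcf) if key in targets]
print(hits)
```

Records such as POS 136131068 versus POS 136131069 for the same deletion would not match each other, so both files should be normalised the same way before comparing.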

fin swimmer

modified 15 months ago • written 15 months ago by finswimmer

@Sudhir: You should contact NCBI support with this example if this is happening.

written 15 months ago by genomax

Thank you @fin and @Emily_Ensembl,

I have tried Ensembl's VEP before as well, but it does not convert all of my RefSeq inputs to hg19; it works for only a few of them.

I will contact NCBI support and see what they reply.

written 15 months ago by Sudhir Jadhao