Question: VEP annotation doubt with NM (NM_000000.1_dup19)
23 months ago by
Valencia
Cristian.perez50 wrote:

Hi community!

I'm annotating variants with the VEP software and I'm finding some unexpected transcript data of the type:

  • NM_014938.4_dupl16
  • NM_001170637.2_dupl3
    1       206516261       .       C       T       47      PASS CSQ=T|non_coding_transcript_exon_variant|MODIFIER|SRGAP2|23380|Transcript|NM_001170637.2_dupl3|mRNA|1/20||NM_001170637.2_dupl3.1:n.65C>T||65|||||||1||SNV|EntrezGene|||||||||||C|C||||||||||||||||||||||||||||||||||||||||||||||||||||||||||1:206516261-206516261|0.4996565||||,
    T|missense_variant|MODERATE|SRGAP2|23380|Transcript|NM_001170637.3|protein_coding|1/20||NM_001170637.3:c.65C>T|NP_001164108.1:p.Arg289Trp|864|865|289|R/W|Cgg/Tgg|||1||SNV|EntrezGene||||||NP_001164108.1|||||C|C|OK|||||||||||||||||||||||||||||||||||0.63580||||T|T||||||||||2|||||||1:206516261-206516261|0.4996565||||,
    T|missense_variant|MODERATE|SRGAP2|23380|Transcript|NM_001300952.1|protein_coding|1/18||NM_001300952.1:c.65C>T|NP_001287881.1:p.Arg289Trp|864|865|289|R/W|Cgg/Tgg|||1||SNV|EntrezGene||||||NP_001287881.1|||||C|C|OK|||||||||||||||||||||||||||||||||||0.63580||||T|T||||||||||2|||||||1:206516261-206516261|0.4996565||||,
    T|non_coding_transcript_exon_variant|MODIFIER|SRGAP2|23380|Transcript|NM_015326.3_dupl3|mRNA|1/20||NM_015326.3_dupl3.1:n.65C>T||65|||||||1||SNV|EntrezGene||YES|||||||||C|C||||||||||||||||||||||||||||||||||||||||||||||||||||||||||1:206516261-206516261|0.4996565||||,
    T|missense_variant|MODERATE|SRGAP2|23380|Transcript|NM_015326.4|protein_coding|1/20||NM_015326.4:c.65C>T|NP_056141.2:p.Arg289Trp|864|865|289|R/W|Cgg/Tgg|||1||SNV|EntrezGene||YES||||NP_056141.2|||||C|C|OK|||||||||||||||||||||||||||||||||||0.63580||||T|T||||||||||2|||||||1:206516261-206516261|0.4996565||||       GT:DP:VD:AD:AF:RD:ALD   0/1:9:3:6,3:0.3333:6,0:3,0

Searching the VEP website and the wider internet, I can't find any reference to this kind of "dupl" suffix. Has anyone come across this? I don't know whether they are alternative versions of a transcript, or why they are not listed as transcripts in their own right.

Thanks in advance!

Cristian.

Edit: Added an example of a variant with the VEP annotation of a "dupl" transcript (NM_015326.3_dupl3)

Edit2: Using VEP ensembl version 91.1 with cache v91

ADD COMMENTlink modified 23 months ago by Emily_Ensembl20k • written 23 months ago by Cristian.perez50

Could you post the variants (VCF records) that cause this annotation?

ADD REPLYlink written 23 months ago by cpad011212k

Also, which column of your VEP output are you finding this notation in?

ADD REPLYlink written 23 months ago by Emily_Ensembl20k

Hi Emily, it's the parameter that references the transcript, the "Feature" column (I'm actually outputting in a VCF format).
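For anyone reading VCF output rather than the tab-delimited format: the position of "Feature" inside each CSQ entry can be read from the CSQ line in the VCF header, which lists the field names after "Format: ". A minimal sketch in Python, assuming the standard VEP header layout; `csq_field_names` is a hypothetical helper name and the header string below is shortened for illustration:

```python
def csq_field_names(header_line):
    """Return the pipe-separated field names declared in a VEP CSQ header line."""
    marker = "Format: "
    start = header_line.index(marker) + len(marker)
    end = header_line.rindex('"')  # closing quote of the Description string
    return header_line[start:end].split("|")


# Shortened example header; a real one lists many more fields.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|'
    'Feature_type|Feature|BIOTYPE">'
)
names = csq_field_names(header)
feature_index = names.index("Feature")
```

Looking the index up this way avoids hard-coding a position, since the field order depends on the options VEP was run with.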

ADD REPLYlink written 23 months ago by Cristian.perez50

Thanks, will try to trace.

ADD REPLYlink written 23 months ago by Emily_Ensembl20k

Are you using GRCh37?

ADD REPLYlink written 23 months ago by Emily_Ensembl20k

Yes, version 91 of GRCh37

ADD REPLYlink written 23 months ago by Cristian.perez50

I think NM_015326.3_dupl3.1 and the other entries mentioned in the OP are feature (transcript) names in that build. Variation Reporter for NC_000001.10:g.206516261C>T on GRCh37.p13 (AR-105, dbSNP v149) doesn't list a coding variant at position 65; instead it lists one at position 322 (NM_015326.4:c.322C>T), and it has only one annotation rather than the two mentioned above.

ADD REPLYlink modified 23 months ago • written 23 months ago by cpad011212k

I supposed it was something like that. What intrigues us is why they are named "duplXX". We thought they might be duplicates of other transcripts, or reference transcripts with duplicated exons, but seeing "dupl16" was really strange.

ADD REPLYlink written 23 months ago by Cristian.perez50
23 months ago by
Emily_Ensembl20k
EMBL-EBI
Emily_Ensembl20k wrote:

We are investigating these. It looks like some RefSeq transcripts (eg NM_001170637.3) have been duplicated in Ensembl's other_features database with a lower version number and this dupl suffix (eg NM_001170637.2_dupl3). This has been propagated across to the VEP cache, which is why you're seeing them. We don't currently know why, but we believe that you can just ignore them from your analyses for now.

ADD COMMENTlink modified 23 months ago • written 23 months ago by Emily_Ensembl20k

We've uncovered the source. This occurs in our pipeline that imports the RefSeq transcripts. If it finds two with the same ID (eg NM_001170637.3 and NM_001170637.2), the pipeline adds this dupl suffix to the older one instead of taking the sensible option of just deleting it. We're not sure why we wrote the pipeline this way, as it seems a bit silly, but we will fix it. In the meantime, as I said before, just ignore them.

I'm really sorry about this.
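For VCF output, "ignoring them" could be done by dropping any CSQ entry whose Feature ends in a "_duplN" suffix. The following is only a rough sketch, not part of VEP: `drop_dupl_entries` is a hypothetical helper, and `feature_index=6` is an assumption that matches the records in the question, so it should be checked against the CSQ "Format:" header of your own file.

```python
import re

# Matches the suffix seen in this thread, e.g. "NM_001170637.2_dupl3".
DUPL_SUFFIX = re.compile(r"_dupl\d+$")


def drop_dupl_entries(csq_value, feature_index=6):
    """Remove comma-separated CSQ entries whose Feature carries a _duplN suffix."""
    kept = []
    for entry in csq_value.split(","):
        fields = entry.split("|")
        feature = fields[feature_index] if len(fields) > feature_index else ""
        if DUPL_SUFFIX.search(feature):
            continue  # skip the duplicated older transcript
        kept.append(entry)
    return ",".join(kept)


# Two shortened entries modelled on the records in the question.
csq = (
    "T|non_coding_transcript_exon_variant|MODIFIER|SRGAP2|23380|Transcript|NM_001170637.2_dupl3|mRNA,"
    "T|missense_variant|MODERATE|SRGAP2|23380|Transcript|NM_001170637.3|protein_coding"
)
filtered = drop_dupl_entries(csq)
```

Entries without a "_dupl" suffix pass through unchanged, so the current transcript versions are kept as they are.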

ADD REPLYlink written 23 months ago by Emily_Ensembl20k

Thanks Emily. Something is worrying me though... Shouldn't this happen whenever there are two transcripts with the same ID but different versions? Should I always take the latest version of the transcript for each variant? I'm asking because I'm finding two different versions of the same ID for the same variant (NM_001170637.3 and NM_001170637.2 [NOT a real example, but if you need one I can find it]).

I imagine the "dupl" error is specific to some release and not a general issue. Right now I'm checking every variant and removing old versions of a RefSeq ID if a newer one is present.
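A check like that might look roughly as follows. This is a sketch under assumptions, not my exact script: `keep_latest_versions` is an illustrative name, the Feature is assumed to sit at index 6 of each CSQ entry (as in the records above), and IDs are assumed to follow the "accession.version" pattern with an optional "_duplN" suffix.

```python
import re

# RefSeq-style feature ID: accession, version, optional "_duplN" suffix.
FEATURE_RE = re.compile(r"^([A-Z]{2}_\d+)\.(\d+)(_dupl\d+)?$")


def keep_latest_versions(csq_value, feature_index=6):
    """Keep only the highest version of each RefSeq accession in a CSQ string."""
    entries = [entry.split("|") for entry in csq_value.split(",")]

    def parse(fields):
        if len(fields) <= feature_index:
            return None
        return FEATURE_RE.match(fields[feature_index])

    # Pass 1: highest version seen per accession, ignoring "_dupl" entries.
    latest = {}
    for fields in entries:
        m = parse(fields)
        if m and m.group(3) is None:
            acc, ver = m.group(1), int(m.group(2))
            latest[acc] = max(latest.get(acc, ver), ver)

    # Pass 2: keep non-RefSeq features as they are, drop "_dupl" entries
    # and any version older than the latest one for its accession.
    kept = []
    for fields in entries:
        m = parse(fields)
        if m is None:
            kept.append(fields)
        elif m.group(3) is None and int(m.group(2)) == latest[m.group(1)]:
            kept.append(fields)
    return ",".join("|".join(fields) for fields in kept)
```

Note that an accession which appears only as a "_dupl" entry is dropped entirely in this sketch; whether that is what you want depends on the analysis.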

ADD REPLYlink written 23 months ago by Cristian.perez50

There are 36 of them in the up-to-date database, so it's not universal. For various historical and political reasons, we have two different snapshots of the RefSeq database which we merge together, so it will only affect transcripts that were updated between those two snapshots.

ADD REPLYlink written 23 months ago by Emily_Ensembl20k