Hello, I am looking at an example of repeat expansion output from here: https://42basepairs.com/browse/s3/ont-open-data/giab_2023.05/analysis/variant_calling/hg001_sup_all?file=hg001.wf_str.vcf.gz&preview=
and I am confused about the difference between the RU and DisplayRU INFO fields.
The header describes the fields like this:
##INFO=<ID=RU,Number=1,Type=String,Description="Repeat_unit_in_the_reference_orientation">
##INFO=<ID=DisplayRU,Number=1,Type=String,Description="Display_repeat_unit_familiar_to_clinician">
For example, take this record:
chrX 147912050 . G <STR31>,<STR29> . PASS SVTYPE=STR;END=147912110;REF=20;RL=60;RU=GGC;REPID=FMR1;VARID=FMR1;STR_STATUS=normal,normal;STR_NORMAL_MAX=55;STR_PATHOLOGIC_MIN=200;RankScore=1:10;HGNCId=3775;InheritanceMode=XR;DisplayRU=CGG;SourceDisplay=GeneReviews_Internet_2019-11-21;Source=GeneReviews;SourceId=NBK1384;Disease=FragileX GT:SO:CN:CI:AD_SP:AD_FL:AD_IR 1/2:SPANNING/SPANNING:31/29:30-32/28-29:35/9:0/0:0/0
If I check gnomAD STR, I see that gnomAD lists CGG as the repeat unit, which corresponds to the DisplayRU field.
However, in another record:
chr12 50505001 . G <STR17>,<STR8> . PASS SVTYPE=STR;END=50505022;REF=7;RL=21;RU=GGC;REPID=DIP2B;VARID=DIP2B;STR_STATUS=normal,normal;STR_NORMAL_MAX=24;STR_PATHOLOGIC_MIN=270;RankScore=1:10;HGNCId=29284;InheritanceMode=AD;DisplayRU=CGG;SourceDisplay=GeneReviews_Internet_2019-11-07;Source=GeneReviews;SourceId=NBK535148;Disease=FRA12A GT:SO:CN:CI:AD_SP:AD_FL:AD_IR 1/2:SPANNING/SPANNING:17/8:16-17/8-8:20/28:0/0:0/0
Here the RU field corresponds to the repeat unit in gnomAD STR.
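A minimal Python sketch, for illustration only, of how the two fields can be pulled out of a record and compared. The helper names (`parse_info`, `is_rotation`, `reverse_complement`) are hypothetical, and the INFO string is a shortened copy of the FMR1 record above. It only checks how GGC and CGG relate as strings; it makes no claim about which convention gnomAD or the VCF producer follows.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def is_rotation(a: str, b: str) -> bool:
    """True if b is a cyclic shift of a (same motif, different starting base)."""
    return len(a) == len(b) and b in a + a


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Shortened INFO column from the FMR1 record (assumption: other keys omitted).
fmr1_info = "SVTYPE=STR;END=147912110;REF=20;RL=60;RU=GGC;REPID=FMR1;DisplayRU=CGG"

info = parse_info(fmr1_info)
ru, display_ru = info["RU"], info["DisplayRU"]

print(ru, display_ru, is_rotation(ru, display_ru))  # GGC CGG True
print(reverse_complement(ru))                       # GCC
```

So in both records DisplayRU (CGG) is a cyclic shift of RU (GGC), not its reverse complement (GCC).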
So my question is: why do RU and DisplayRU differ in this file, and does gnomAD always report what corresponds to RU in this file? One additional question: are the coordinates in the gnomAD STR dataset 0-based or 1-based? Thanks a lot in advance for any help!