How to Diagnose Fragile X from Whole Genome Sequencing
15 months ago
Rafael ▴ 10

Question from an enthusiast / student:

Imagine that a patient has had whole-genome sequencing (30x depth) and I have access to the BAM and VCF files. I also have some experience browsing IGV or other genome browsers.

How would I start looking for a possible Fragile X diagnosis in this scenario? Should I look for pathogenic variants in the FMR1 gene, or for CGG repeat expansions?

Thanks for your attention and patience.

15 months ago
barslmn ★ 2.1k

Fragile X is mainly caused by CGG repeat expansions in FMR1. You should try ExpansionHunter: https://github.com/Illumina/ExpansionHunter


Thanks for the suggestion. It seems like a good starting point. Any other suggestion/guidance is welcome.

11 months ago
Sasha ▴ 830

Hello! You are on the right track. Fragile X syndrome is mainly caused by expansion of the CGG repeat in the 5' untranslated region of FMR1, and ExpansionHunter, as suggested, is a good tool for estimating the repeat length from the BAM file.

Simply counting CGG motifs in an extracted sequence is not a reliable approach: it does not account for sequencing errors, and reads from expanded alleles may not align to the reference at all. ExpansionHunter combines evidence from spanning, flanking and in-repeat reads, so it gives a more reliable estimate of the repeat count. Here's a basic example of how to run ExpansionHunter on a BAM file:

  1. First, download and install ExpansionHunter following the instructions on their GitHub page: https://github.com/Illumina/ExpansionHunter
  2. Create a variant catalog file (e.g., repeat_spec.json) describing the FMR1 CGG repeat. The catalog is a JSON array of loci, and ReferenceRegion should cover the repeat itself (hg38 coordinates), not the whole gene. You can also copy the full FMR1 entry from the variant catalog distributed with ExpansionHunter:
    [
      {
        "LocusId": "FMR1",
        "LocusStructure": "(CGG)*",
        "ReferenceRegion": "chrX:147912050-147912110",
        "VariantType": "Repeat"
      }
    ]
    
  3. Run ExpansionHunter with the sample BAM file and the variant catalog. Because FMR1 is on chrX, set --sex to match the sample (the default is female):
    ExpansionHunter --reads sample.bam --reference hg38.fasta --variant-catalog repeat_spec.json --sex male --output-prefix expansionhunter_output
    
  4. Review the output files (expansionhunter_output.vcf and expansionhunter_output.json) for the estimated FMR1 CGG repeat count. Commonly used categories are: normal, up to 44 repeats; intermediate, 45-54; premutation, 55-200; full mutation, more than 200. With short reads, the length of a large expansion is only roughly estimated, so the category is more informative than the exact number.
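For step 4, the category can be assigned directly from the estimated count. This is a minimal sketch using the commonly cited FMR1 thresholds; `classify_fmr1_cgg` is a hypothetical helper, not part of ExpansionHunter, and the result is a screening aid rather than a diagnosis.

```python
def classify_fmr1_cgg(repeat_count: int) -> str:
    """Map an FMR1 CGG repeat count to a commonly used allele category."""
    if repeat_count < 0:
        raise ValueError("repeat count must be non-negative")
    if repeat_count <= 44:
        return "normal"
    if repeat_count <= 54:
        return "intermediate"
    if repeat_count <= 200:
        return "premutation"
    return "full mutation"  # more than 200 repeats


for n in (30, 50, 120, 350):
    print(n, classify_fmr1_cgg(n))
```

Boundaries are inclusive on the upper end of each range (44, 54, 200), matching the ranges listed in step 4.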

You can also check the example outputs in the repository to see what kind of results to expect: https://github.com/Illumina/ExpansionHunter/tree/master/example/output

I'm using my chatbot here (https://tinybio.cloud) to help generate this answer. You can download it from the website.


Thanks for the feedback, Sasha. I've been having some difficulties with ExpansionHunter, but I'll keep trying.


What difficulties have you been facing? What have you tried, and what are the errors you are getting? If you describe your problem in detail and give more context you will get better answers.


What I've tried so far: I have the CRAM file and its CRAI index.

My reference file is this:

https://igv-genepattern-org.s3.amazonaws.com/genomes/seq/hg38/hg38.fa

My variant catalog file (variant_catalog.json) looks like this:

[
   {
     "LocusId": "FMR1",
     "LocusStructure": "(CGG)*",
     "ReferenceRegion": "chrX:147912050-147912110",
     "VariantType": "Repeat"
   }
]
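The same catalog can be generated programmatically, which makes it easier to paste in a long OfftargetRegions list later. This is a minimal sketch with Python's `json` module; `make_fmr1_catalog` is a hypothetical helper, and the off-target regions (and the VariantType that goes with them) should be copied from the catalog distributed with ExpansionHunter rather than typed by hand.

```python
import json


def make_fmr1_catalog(variant_type="Repeat", offtarget_regions=None):
    """Build a one-locus ExpansionHunter variant catalog (a JSON array of loci)."""
    locus = {
        "LocusId": "FMR1",
        "LocusStructure": "(CGG)*",
        "ReferenceRegion": "chrX:147912050-147912110",  # hg38 CGG repeat
        "VariantType": variant_type,
    }
    if offtarget_regions:
        # Assumption: copy these and the matching VariantType from the
        # official catalog; no regions are invented here.
        locus["OfftargetRegions"] = list(offtarget_regions)
    return [locus]


# Redirect this output to variant_catalog.json
print(json.dumps(make_fmr1_catalog(), indent=2))
```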

When running ExpansionHunter, it looks like everything went fine:

$ ExpansionHunter --reads NG1PSZ7BE9.mm2.sortdup.bqsr.cram --reference hg38.fa --variant-catalog variant_catalog.json --sex male --output-prefix saida
2023-05-18T09:04:26,[Starting ExpansionHunter v5.0.0]
2023-05-18T09:04:26,[Analyzing sample NG1PSZ7BE9.mm2.sortdup.bqsr]
2023-05-18T09:04:26,[Initializing reference hg38.fa]
2023-05-18T09:04:26,[Loading variant catalog from disk variant_catalog.json]
2023-05-18T09:04:26,[Running sample analysis in seeking mode]
2023-05-18T09:04:26,[Analyzing FMR1]
2023-05-18T09:04:26,[Writing output to disk]
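When running several samples, building the command programmatically helps ensure `--sex` is never omitted for chrX loci. This is a small sketch; `expansionhunter_cmd` is a hypothetical helper that uses only the flags from the command shown in this post. Note that a CRAM file is decoded against the reference it was created with, so the `--reference` FASTA should be the same one the CRAM was aligned to.

```python
import shlex


def expansionhunter_cmd(reads, reference, catalog, sex, prefix):
    """Build an ExpansionHunter argument list with the flags used in this thread."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    return ["ExpansionHunter",
            "--reads", reads,            # indexed BAM or CRAM
            "--reference", reference,    # must match the CRAM's reference
            "--variant-catalog", catalog,
            "--sex", sex,                # matters for chrX loci such as FMR1
            "--output-prefix", prefix]   # output files are named after this prefix


cmd = expansionhunter_cmd("sample.cram", "hg38.fa", "variant_catalog.json", "male", "saida")
print(shlex.join(cmd))
# To run it (requires ExpansionHunter on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```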

The result in the vcf file looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NG1PSZ7BE9.mm2.sortdup.bqsr
chrX    147912050   .   G   <STR30> .   PASS    END=147912110;REF=20;RL=60;RU=CGG;VARID=FMR1;REPID=FMR1 GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC 1:SPANNING:30:30-30:7:9:0:13.864865
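To pull the repeat call out of a record like this one, the INFO and FORMAT/sample columns can be split into key-value pairs. This is a minimal sketch for a single-sample VCF; `parse_eh_vcf_record` is a hypothetical helper. Per the ExpansionHunter output documentation, REPCN is the repeat count, REPCI its confidence interval, and ADSP, ADFL and ADIR count spanning, flanking and in-repeat reads.

```python
def parse_eh_vcf_record(line: str) -> dict:
    """Parse one single-sample ExpansionHunter VCF data line into a dict."""
    # VCF columns are tab-separated; split() also tolerates space-pasted lines
    # because none of these fields contain whitespace.
    chrom, pos, _id, ref, alt, _qual, filt, info, fmt, sample = line.split()[:10]
    info_fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": filt, "info": info_fields, "sample": sample_fields}


record = parse_eh_vcf_record(
    "chrX\t147912050\t.\tG\t<STR30>\t.\tPASS\t"
    "END=147912110;REF=20;RL=60;RU=CGG;VARID=FMR1;REPID=FMR1\t"
    "GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC\t1:SPANNING:30:30-30:7:9:0:13.864865"
)
print(record["sample"]["REPCN"], record["sample"]["SO"], record["info"]["REF"])
```

For larger or multi-sample files, a dedicated VCF library is a better choice than hand parsing.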

Am I on the right track or did I do it all wrong?

Thank you for the patience.


The first thing I noticed is that the variant catalog distributed with ExpansionHunter includes a long list of off-target regions (OfftargetRegions) for FMR1. Make sure to add those to your variant catalog file and rerun the analysis.

We can infer the output using the documentation: https://github.com/Illumina/ExpansionHunter/blob/master/docs/06_OutputVcfFiles.md

The example in the documentation has two alleles, with the repeat counts shown in the ALT column as <STR2>,<STR349>. Your sample, however, is male, and chromosome X is hemizygous in males, which is why we see a single allele, <STR30>. The Fragile X phenotype manifests when the CGG repeat count exceeds 200 (full mutation). Also, although it is less likely to be the cause, you can check for SNVs in FMR1 since you have the sequencing data.
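The ploidy point can be made explicit when reading the REPCN field. This is a minimal sketch that assumes REPCN separates alleles with "/" (e.g. "2/349" for two alleles), as in the documentation example; check this against your own output. `repeat_counts` is a hypothetical helper.

```python
def repeat_counts(repcn: str, expected_ploidy: int) -> list:
    """Split a REPCN value such as '30' or '2/349' into integer allele counts."""
    counts = [int(c) for c in repcn.split("/")]
    if len(counts) != expected_ploidy:
        raise ValueError(f"expected {expected_ploidy} allele(s), got {repcn!r}")
    return counts


# chrX in a male sample (analysed with --sex male) is hemizygous: one allele.
print(repeat_counts("30", expected_ploidy=1))           # [30]
# Two X alleles in a female sample; the longer one is the one to assess.
print(max(repeat_counts("2/349", expected_ploidy=2)))   # 349
```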

Results of repeat expansion analysis should be confirmed with an orthogonal method, such as repeat-primed PCR or Southern blot.


Thanks for the suggestion. I ran it again, including the list of OfftargetRegions.

The result was 30 repetitions again:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NG1PSZ7BE9.mm2.sortdup.bqsr
chrX 147912050 . G <STR30> . PASS END=147912110;REF=20;RL=60;RU=CGG;VARID=FMR1;REPID=FMR1 GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC 1:SPANNING:30:30-30:7:9:0:13.864865

Can we infer from this that this male sample is within the healthy range of CGG repeats for Fragile X?


Yes, from this result I would infer that this male sample carries a normal FMR1 allele (30 repeats is below the intermediate range of 45-54), assuming there were no problems with any of the analysis steps. It would be best to confirm with an orthogonal method.
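The "assuming no problems" caveat can be partly checked from the record itself: a call backed by spanning reads, with a confidence interval entirely below the intermediate range, is more trustworthy than the point estimate alone. This is a heuristic sketch for a haploid call (male chrX), not a clinical rule; `looks_confidently_normal` is a hypothetical helper and the threshold of 45 is the lower bound of the intermediate range.

```python
def looks_confidently_normal(so: str, repci: str, adsp: int,
                             intermediate_min: int = 45) -> bool:
    """Heuristic check on a haploid ExpansionHunter call (e.g. male chrX).

    Requires spanning-read support and a confidence interval whose upper
    bound stays below the intermediate range.
    """
    _low, high = (int(x) for x in repci.split("-"))
    return so == "SPANNING" and adsp > 0 and high < intermediate_min


# Values from the VCF record in this thread: SO=SPANNING, REPCI=30-30, ADSP=7.
print(looks_confidently_normal("SPANNING", "30-30", 7))  # True
```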

11 months ago
Rafael ▴ 10

Following up on this post: could you tell me whether I'm on the right track? I'm a complete amateur.

The FMR1 gene is located at chrX:147,911,919-147,951,125 (hg38).

Its size is 39,207 bases.

So I exactly "snipped" the above sequence, which is exactly 39,207 characters from a 30x Whole-Genome Sequencing (WGS).

Next, I counted all the CGG occurrences.

Uppercase only: 59 occurrences
Lowercase only: 57 occurrences
Uppercase and lowercase combined: 119 occurrences

So, in the worst-case scenario, I understand that the patient has at most the FMR1 premutation, since there are fewer than 200 repeats.

Does it make sense to more experienced friends?

Thanks.


To count CGG repeats directly from WGS, the sequencing reads must span the entire repetitive region and extend into the unique flanking sequence on both sides. The maximum read length on Illumina platforms is 300 nucleotides (MiSeq), which caps direct counting at 100 trinucleotide repeats (300 / 3), and in practice fewer, because part of the read must cover the flanks. Larger repeats require long-read platforms (e.g., PacBio or Oxford Nanopore sequencing) for direct counting. Alternatively, as others have mentioned, you can use a repeat-length estimator such as ExpansionHunter with Illumina data.
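The read-length limit is simple arithmetic: subtract the unique sequence needed on each side, then divide by the 3 bp repeat unit. This sketch makes that explicit; the flank length is a hypothetical parameter, since the anchor needed for confident alignment depends on the aligner. Reads of 150 bp, common in 30x WGS, fall well short of the 200-repeat full-mutation threshold.

```python
def max_spanning_repeats(read_length: int, flank_per_side: int = 0,
                         unit: int = 3) -> int:
    """Largest repeat count one read can span while keeping
    `flank_per_side` bases of unique sequence on each side."""
    usable = read_length - 2 * flank_per_side
    return max(usable, 0) // unit


print(max_spanning_repeats(300))      # 100: the upper bound for 300 bp reads
print(max_spanning_repeats(150, 10))  # 43, assuming 10 bp anchors per side
```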


So I exactly "snipped" the above sequence, which is exactly 39,207 characters from a 30x Whole-Genome Sequencing (WGS).

I am not sure what this means. Overall, this doesn't sound like a suitable way to detect FMR1 expansions. The expansion is in a specific part of the gene, so it does not make sense to count all CGG motifs in the entire gene. In addition, reads from expanded alleles will not align (or will align poorly), so you would have to count CGGs in soft-clipped bases or in unaligned mates.
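The difference between "all CGG motifs in the gene" and "the repeat tract" can be shown on a toy sequence: the total count mixes scattered motifs, while the longest uninterrupted run is what a repeat length refers to. This sketch uppercases the sequence first because lowercase letters in UCSC-style FASTA mark soft-masked bases, not different sequence. It only illustrates the counting problem; it cannot reveal an expansion whose reads never aligned. `cgg_stats` is a hypothetical helper.

```python
import re


def cgg_stats(seq: str) -> tuple:
    """Return (total CGG occurrences, longest uninterrupted CGG run in repeats)."""
    s = seq.upper()  # soft-masked (lowercase) bases count the same
    total = s.count("CGG")  # every non-overlapping CGG, wherever it occurs
    runs = re.findall(r"(?:CGG)+", s)
    longest = max((len(run) // 3 for run in runs), default=0)
    return total, longest


# Toy sequence: scattered CGGs plus one tract of 30 uninterrupted repeats.
toy = "CGG" + "AT" + "CGG" * 30 + "A" + "cgg" * 5
print(cgg_stats(toy))  # (36, 30)
```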


Thanks for the feedback, WouterDeCoster. What strategy would you suggest? Have you used ExpansionHunter successfully for this purpose, or would you suggest another method? While I'm at it: in which exact FMR1 interval should I count CGG repeats? Thanks


The expansion underlying Fragile X syndrome is in FMR1 at chrX:147912050-147912110 (hg38). I don't work much with short-read sequencing data, but ExpansionHunter would be the best-suited tool for this.
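The interval is small, which is easy to confirm by parsing it. Computing the length as end minus start gives 60 bp, or 20 CGG units, which matches the RL=60 and REF=20 values in the ExpansionHunter VCF posted earlier in the thread; the gene length quoted earlier (39,207 bases) was instead computed inclusively as end minus start plus one. The coordinate convention here is an assumption inferred from those numbers, and `parse_region` is a hypothetical helper.

```python
import re


def parse_region(region: str) -> tuple:
    """Parse 'chrX:147912050-147912110' (commas allowed) into (chrom, start, end)."""
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region)
    if m is None:
        raise ValueError(f"bad region: {region!r}")
    chrom, start, end = m.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


chrom, start, end = parse_region("chrX:147912050-147912110")
print(end - start, (end - start) // 3)  # 60 20
_, gene_start, gene_end = parse_region("chrX:147,911,919-147,951,125")
print(gene_end - gene_start + 1)        # 39207
```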
