Need help using ExpansionHunter
15 days ago

I would like to use ExpansionHunter with the data from a 30X genome (Nebula Genomics) to look for repeat expansions.

I'm not sure what files to use and how to make this command line work.

ExpansionHunter --reads sample.bam --reference hg38.fasta --variant-catalog repeat_spec.json --output expansionhunter_output.json

I thought I would just unzip the CRAM file and use that as the BAM file, but the Warp terminal I used had to correct something, and I'm not sure it ended up with the correct file or name.

I have the CRAM, CRAI, VCF, and TBI files.

I also previously got the CRAI, CRAM, FASTQ (FastQp1, FastQp2), VCF, and TBI files.

Which file is used for the BAM file and how is it named?

Is the reference file needed just a general file or specific to an individual genome?

What files should I use and how should they be named?

Where do I put the files in ExpansionHunter so it finds them?

The ExpansionHunter folder has the folders Bin, Example, and variant_catalog, plus their subfolders.

Do I put my files in the main ExpansionHunter folder or in the smaller folders?

I have a specific gene in mind, but I would like to use the variant catalog with all 30 genes.

How can I make this run and get an output folder from the genome data?

Thanks. I appreciate any help you can give me.

ExpansionHunter • 937 views
ADD COMMENT
15 days ago

Which file is used for the BAM file and how is it named?

The .cram file. From https://github.com/Illumina/ExpansionHunter/blob/master/docs/03_Usage.md: "Expansion Hunter requires the following inputs: The BAM or CRAM file". ExpansionHunter reads CRAM directly, so there is nothing to unzip or rename.

Where do I put the files in ExpansionHunter so it finds them? Do I put my files in the main ExpansionHunter folder or the smaller folders?

Wherever you want. Just provide an absolute or a relative path to each file on the command line: https://www.linuxfoundation.org/blog/blog/classic-sysadmin-absolute-path-vs-relative-path-in-linux-unix
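As a rough illustration of the point about paths, the sketch below keeps the data in its own folder and passes a full path for every file. The folder names (`$HOME/nebula`, `$HOME/ExpansionHunter`), the file names, and the catalog location inside `variant_catalog` are assumptions, so adjust them to what you actually have. It only prints the command so you can review it first.

```shell
# Hypothetical locations; replace with wherever your files actually live.
DATA_DIR="${DATA_DIR:-$HOME/nebula}"
EH_DIR="${EH_DIR:-$HOME/ExpansionHunter}"

# Print an ExpansionHunter command that uses a full path for every file.
# The bin/ and variant_catalog/hg38/ subfolders are assumed, not verified.
eh_command() {
  printf '%s --reads %s --reference %s --variant-catalog %s --output-prefix %s\n' \
    "$EH_DIR/bin/ExpansionHunter" \
    "$DATA_DIR/sample.cram" \
    "$DATA_DIR/reference.fasta" \
    "$EH_DIR/variant_catalog/hg38/variant_catalog.json" \
    "$DATA_DIR/expansionhunter_output"
}

eh_command
```

Because every path is absolute, the printed command works from any current directory. Keep the .crai index in the same folder as the .cram so the index can be found.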

ADD COMMENT

Thanks for responding.

I am going to try it again soon.

I am using a Mac, but I suppose I can still use those path descriptions in the command line.

I am using the Warp terminal. Is that right for this, or should I use the regular terminal?

--reference hg38.fasta

I don't see the hg38.fasta file in the ExpansionHunter folder.

Do I have to provide that from my data?

ADD REPLY

I am using the Warp terminal. Is that right for this, or should I use the regular terminal?

No idea. It shouldn't be a problem as long as it behaves like a standard Linux terminal.

Do I have to provide that from my data?

Yes. Furthermore, since you're using CRAM, it should be the FASTA that the CRAM file was created with.
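One way to see which reference a CRAM expects is to look at the @SQ lines in its header, which normally carry the sequence name (SN), an MD5 checksum of the sequence (M5), and sometimes the original FASTA location (UR). A minimal sketch, assuming samtools is installed; `sq_summary` is a hypothetical helper name and `sample.cram` is a placeholder:

```shell
# Read SAM/CRAM header text on stdin and print, for each @SQ line,
# the sequence name, its M5 checksum and its UR location ("-" if absent).
sq_summary() {
  awk -F'\t' '/^@SQ/ {
    sn = "-"; m5 = "-"; ur = "-"
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) sn = substr($i, 4)
      if ($i ~ /^M5:/) m5 = substr($i, 4)
      if ($i ~ /^UR:/) ur = substr($i, 4)
    }
    print sn "\t" m5 "\t" ur
  }'
}

# Usage with real data (requires samtools):
#   samtools view -H sample.cram | sq_summary
```

If the UR field names a file or URL, that is a strong hint about which FASTA was used. The M5 values can be compared against any candidate reference.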

ADD REPLY

Thanks for helping me.

I did get the ExpansionHunter command line to run in the Warp terminal and produce an output file.

But I believe the reference file I used was wrong, which may invalidate the results.

I just used a general file I found online that I thought was the reference file for the Nebula testing.

When I saw this posted on a similar thread, I thought it was a general file that everyone uses, but I guess not.

https://igv-genepattern-org.s3.amazonaws.com/genomes/seq/hg38/hg38.fa

I have my original CRAI, CRAM, FASTQ (FastQp1, FastQp2), VCF, and TBI files.

Should I use my CRAI or FASTQ files as a reference file? Can I produce a reference file from those?

Thanks

ADD REPLY

Should I use my CRAI or FASTQ files as a reference file? Can I produce a reference file from those?

No, you can't do that. You will have to find the reference file that was used.

You mentioned Nebula. If your file is from there, see the answers in this thread for potential places to get the reference: "Noob question. Samtools says my CRAM from Nebula is wonky, what can I do?"

ADD REPLY

Thanks,

I may have found the reference file. There seems to be some question on that thread about whether it's the exact one. It might be the one I used.

So the reference file would have to be specific to that run of Nebula tests but not especially specific to my data?

Would that make the ExpansionHunter results accurate?

ADD REPLY

So the reference file would have to be specific to that run of Nebula tests but not especially specific to my data?

It has to match your data: the exact reference FASTA your CRAM was aligned and compressed against. Consider the following characteristic (from https://www.htslib.org/workflow/cram.html):

CRAM is primarily a reference-based compressed format, meaning that only differences between the stored sequences and the reference are stored.
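Since only the differences are stored, decoding against a different FASTA would reconstruct the wrong bases, so it helps to check a candidate reference before trusting it. As far as I understand the SAM specification, the M5 value in the CRAM header is the MD5 of the sequence with whitespace removed and letters uppercased. A minimal sketch of that check for a single sequence; `md5_hex` and `seq_m5` are hypothetical helper names and the file names are placeholders:

```shell
# MD5 of stdin as hex; Linux ships md5sum, macOS ships md5.
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | cut -d' ' -f1
  else
    md5 -q
  fi
}

# M5-style checksum of one named sequence in a FASTA file:
# drop the header line, strip whitespace, uppercase, then hash.
seq_m5() {
  fasta=$1; name=$2
  awk -v want="$name" '
    /^>/ { keep = (substr($1, 2) == want); next }
    keep { printf "%s", $0 }
  ' "$fasta" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]' | md5_hex
}

# Usage: compare with the M5 shown by "samtools view -H sample.cram"
#   seq_m5 candidate_reference.fasta chr1
```

If samtools is available, `samtools dict candidate_reference.fasta` reports the same kind of checksum for every sequence at once, which is easier for a whole genome.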

ADD REPLY

Thanks for the information.

I will read through this and try to understand it better.

Going back to ExpansionHunter, this is what I thought I was trying to do.

ExpansionHunter --reads <aligned reads BAM/CRAM file/URL> \
                --reference <reference genome FASTA file> \
                --variant-catalog <JSON file specifying variants to genotype> \
                --output-prefix <Prefix for the output files>

Of course I'm using my CRAM file from Nebula.

I thought I could use a file like this for the reference file because it was said to go with Nebula testing.

https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/

I suppose that is not specific enough for an accurate run?

Would I have to contact Nebula for the reference FASTA file for my specific data?

Or can those tools extract it?

ADD REPLY

Try to convert the CRAM to BAM using samtools. It should download the correct reference, if it can find it, from EBI. Otherwise you will need to ask Nebula.
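A minimal sketch of that conversion, assuming samtools is installed. `run` and `cram_to_bam` are hypothetical helper names, the file names are placeholders, and by default the script only prints the commands (set DRY_RUN=0 to execute them). Whether the automatic lookup reaches EBI may depend on your samtools version and on the REF_PATH and REF_CACHE environment variables, so treat that part as an assumption.

```shell
# Print the command (default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Convert sample.cram to sample.bam and index the result.
# $1 = CRAM file, $2 = optional reference FASTA.
cram_to_bam() {
  cram=$1; ref=${2:-}; bam=${cram%.cram}.bam
  if [ -n "$ref" ]; then
    # Known reference: pass it explicitly with -T.
    run samtools view -b -T "$ref" -o "$bam" "$cram" || return 1
  else
    # No reference given: samtools relies on the M5/UR tags in the header.
    run samtools view -b -o "$bam" "$cram" || return 1
  fi
  run samtools index "$bam"
}

cram_to_bam sample.cram
```

If the conversion finishes without reference errors, the resulting BAM no longer depends on the reference for decoding. ExpansionHunter still needs a FASTA for --reference, and it should be the same assembly the reads were aligned to.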

ADD REPLY

Thanks,

I will work on that more soon.

Someone said that you could simply count CAG repeats at certain locations in the Nebula genome browser.

Does that method work? Are there limitations with short-read data?

ADD REPLY
