5.8 years ago by
Washington University, St Louis, USA
Expanding on Pierre's answer: I already had the genome loaded in IGV directly from a FASTA file. What I couldn't get was the corresponding gene annotations from the GenBank record. The IGV link Pierre provides explains that when you create a .genome file you can optionally supply a GFF file of gene annotations, and that was the part I was missing. Here is what I did:
1) Download the FASTA file for the custom genome of interest. In this case it was a specific genome assembly for HBV (accession: HE974372).
2) Download the corresponding GenBank record for the same genome.
3) Convert the GenBank file to GFF3 format, following the instructions here: http://bcb.io/2009/02/22/exploring-bioperl-genbank-to-gff-mapping/
If needed, install BioPerl (on Debian/Ubuntu):
sudo apt install bioperl
Then run the converter on the .gb file:
bp_genbank2gff3 -out stdout HBV_D4_HE974372.gb > HBV_D4_HE974372.gff
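Before building the .genome file it can help to check which sequence name the converter wrote into the GFF3 (that is the name step 4 has to map to) and that gene/CDS features actually came through. A minimal sketch, assuming only the standard GFF3 layout (nine tab-separated columns, with an optional ##FASTA sequence section at the end); `gff_summary` is a hypothetical helper name, not part of BioPerl or IGV.

```shell
# gff_summary (hypothetical helper): read GFF3 on stdin and print each distinct
# seqid (column 1) plus a count per feature type (column 3), sorted.
gff_summary() {
  awk -F'\t' '
    /^##FASTA/ { exit }        # the rest of the file is sequence, not features
    /^#/ || NF < 9 { next }    # skip directives, comments, and non-feature lines
    { ids[$1] = 1; types[$3]++ }
    END {
      for (id in ids) print "seqid\t" id
      for (t in types) print "type\t" t "\t" types[t]
    }' | sort
}
```

For example, `gff_summary < HBV_D4_HE974372.gff` lists the seqid(s) and feature counts in the converted file.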
4) Create an alias file so that the sequence name in the FASTA file (gi|399923469|emb|HE974372.1|) is correctly mapped to the sequence name in the GFF file (HE974372). I wasn't sure which order IGV expects, so I put both mappings in a file called 'HBV_D4_alias.tab', which looked like:
You could also probably just edit the FASTA header to use the shortened sequence name, although the reference names in your BAM file would then have to match it as well.
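Both options in step 4 can be scripted. This is a sketch under two assumptions: that IGV reads the alias file as tab-delimited lines of equivalent names (it writes both orderings, as described above, rather than relying on a particular order), and that the FASTA holds a single record whose name is the first word of its header. `make_alias` and `rename_fasta` are hypothetical helper names.

```shell
# make_alias A B (hypothetical helper): print both orderings of a tab-separated alias pair.
# printf reuses its format string, so four arguments produce two lines.
make_alias() {
  printf '%s\t%s\n' "$1" "$2" "$2" "$1"
}

# rename_fasta NEW (hypothetical helper): read FASTA on stdin and replace the first
# word of every header with NEW, keeping any description that follows it.
rename_fasta() {
  awk -v new="$1" '/^>/ { sub(/^>[^ \t]*/, ">" new) } { print }'
}
```

For example: `make_alias 'gi|399923469|emb|HE974372.1|' HE974372 > HBV_D4_alias.tab`, or `rename_fasta HE974372 < HBV_D4_HE974372.fasta > HBV_D4_renamed.fasta` (the output file name is only illustrative).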
5) Create the .genome file in IGV. From the menu, 'Genomes' -> 'Create .genome file'.
- Unique identifier = 'HBV_D4'
- Descriptive name = 'HBV genotype D4 complete genome, isolate Mart-B36'
- Fasta file (browse to HBV_D4_HE974372.fasta)
- Gene file (browse to HBV_D4_HE974372.gff)
- Alias file (browse to HBV_D4_alias.tab)
Then hit 'OK' and save as HBV_D4.genome. With my RNA-seq BAM file loaded, I now see reads in the context of annotated genes for this custom reference genome. Nice!
NOTE: With the passage of time this procedure seems to have become much simpler, at least in some cases. I just repeated the exercise with Bovine papillomavirus 1 (NC_001522.1). I downloaded the FASTA file as in step 1, but in step 2 I was able to download a GFF3 file directly (instead of a GenBank file), which let me skip step 3. The sequence names in my BAM, FASTA, and GFF3 files also happened to be consistent, so no alias file was needed. I therefore went straight to step 5, created a .genome file with just the unique identifier, descriptive name, FASTA file, and gene file, and saw the intended result. This should be the case if you used the same FASTA record when building the indexed reference for alignment. I suspect NCBI's move to drop GIs in favor of plain accessions may have helped here.
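To confirm the names really are consistent before skipping the alias file, the three sources can be compared. This sketch assumes IGV names each FASTA sequence by the first word of its header, and takes the BAM reference names from the @SQ SN: fields printed by `samtools view -H` (samtools is required only for `check_names`). All function names are hypothetical.

```shell
# fasta_names: first word of each FASTA header on stdin, sorted and de-duplicated.
fasta_names() { awk '/^>/ { sub(/^>/, ""); if (NF) print $1 }' | sort -u; }

# gff_names: column 1 of GFF3 feature lines on stdin, stopping at a ##FASTA section.
gff_names() { awk -F'\t' '/^##FASTA/ { exit } !/^#/ && NF >= 9 { print $1 }' | sort -u; }

# sq_names: SN: values from @SQ lines of SAM header text on stdin.
sq_names() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' | sort -u
}

# check_names GENOME.fasta GENES.gff ALN.bam: list names that the GFF3 or BAM use
# but the FASTA does not define (comm -13 keeps lines found only in the second file).
check_names() {
  tmp=$(mktemp -d) || return 1
  fasta_names < "$1" > "$tmp/fasta"
  gff_names < "$2" > "$tmp/gff"
  samtools view -H "$3" | sq_names > "$tmp/bam"
  echo "in GFF3 but not in FASTA:"; comm -13 "$tmp/fasta" "$tmp/gff"
  echo "in BAM but not in FASTA:";  comm -13 "$tmp/fasta" "$tmp/bam"
  rm -r "$tmp"
}
```

If both lists come back empty, the names agree. For the original HBV files, the FASTA name gi|399923469|emb|HE974372.1| versus the GFF3 seqid HE974372 would show up under the first heading, which is the mismatch the alias file in step 4 works around.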