IGV custom genome/Biostarhandbook Chapter 16
4.4 years ago
Ricky ▴ 50

Dear all,

Being new to NGS data analysis, I am currently working my way through the Biostar Handbook. Although it appears to be a rather simple task, I am struggling to generate a custom genome for Ebola in IGV. I followed the instructions (code) given in chapter 16 of the handbook to generate the annotation (GFF) file from the Ebola GenBank file. However, although IGV displays the genome sequence, it does not show the annotation, even though the file is provided during the "create .genome" process. Manually loading the GFF file into the genome does not change anything either. My FASTA genome file looks like this:

cat 1976.fa | head
>AF086833
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
TTTTCCTCTCATTGAAATTTATATCGGAATTTAAATTGAAATTGTTACTGTAATCACACCTGGTTTGTTT
CAGAGCCACATCACAAAGATAGAGAACAACCTAGGTCTCCGAAGGGAGCAAGGGCATCAGTGTGCTCAGT
TGAAAATCCCTTGTCAACACCTAGGTCTTATCACATCACAAGTTCCACCTCAGACTCTGCAGGGTGATCC
AACAACCTTAATAGAAACATTATTGTTAAAGGACAGCATTAGTTCACAGTCAAACAAGCAAGATTGAGAA
TTAACCTTGGTTTTGAACTTGAACACTTAGGGGATTGAAGATTCAACAACCCTAAAGCTTGGGGTAAAAC
ATTGGAAATAGTTAAAAGACAAATTGCTCGGAATCACAAAATTCCGAGTATGGATTCTCGTCCTCAGAAA
ATCTGGATGGCGCCGAGTCTCACTGAATCTGACATGGATTACCACAAGATCTTGACAGCAGGTCTGTCCG
TTCAACAGGGGATTGTTCGGCAAAGAGTCATCCCAGTGTATCAAGTAAACAATCTTGAAGAAATTTGCC

and my gff file like so:

##gff-version 3
AF086833    EMBL    gene    56  3026    .   +   .   ID=AF086833.3;gene=NP
AF086833    EMBL    gene    3032    4407    .   +   .   ID=AF086833.9;gene=VP35
AF086833    EMBL    gene    4390    5894    .   +   .   ID=AF086833.13;gene=VP40
AF086833    EMBL    gene    5900    8305    .   +   .   ID=AF086833.20;gene=GP
AF086833    EMBL    gene    8288    9740    .   +   .   ID=AF086833.34;gene=VP30
AF086833    EMBL    gene    9885    11518   .   +   .   ID=AF086833.41;gene=VP24;note=putative
AF086833    EMBL    gene    11501   18282   .   +   .   ID=AF086833.47;gene=L

Could anybody point me towards a solution?

Cheers

Ricky

IGV RNA-Seq genome • 2.7k views

Did you try "Open File" or "Import Regions"?

According to the instructions I added the gff file in Create Genome/Optional/Gene file when I also set the fasta file for the genome. I also tried to add the gff file manually after opening the genome in IGV using "Open File" and according to your suggestion now used "Import Regions", no success though.

There is not much to see in the genome unless you load an alignment file in. Have you done that after creating the "custom" genome? Remember to zoom in significantly before you will start seeing features on the GTF track/read in the alignment window.

The genome is rather small (ca. 19 kb) and relatively gene-rich, so the annotation should be easily visible; zooming doesn't help at all. I can load a BAM file and the read alignments are displayed nicely, so that works, but still no genes.

It is possible that IGV has started checking fasta sequence identifiers strictly since these directions were written. There is a mismatch between the chromosome name in the 1976.fa file and the 1976-genes.gff file. Try the following:

1. In IGV select a different genome. Go into Genomes --> Manage Genome List. Delete the custom Ebola genome you made.
2. Open the 1976.fa file in an editor and remove the version number from the accession number. Change >AF086833.2 to >AF086833. Save the file. If you have a previous 1976.fa.fai file in the directory where you saved the genome, delete it so that IGV is forced to recreate the index.
3. Follow the directions to make a new genome using this edited sequence file and the GFF file.

I just confirmed that this works.

Thanks a lot for the suggestion, I changed the file to

cat 1976-genes.gff
##gff-version 3
AF086833    EMBL    gene    56  3026    .   +   .   ID=AF086833;gene=NP
AF086833    EMBL    gene    3032    4407    .   +   .   ID=AF086833;gene=VP35
AF086833    EMBL    gene    4390    5894    .   +   .   ID=AF086833;gene=VP40
AF086833    EMBL    gene    5900    8305    .   +   .   ID=AF086833;gene=GP
AF086833    EMBL    gene    8288    9740    .   +   .   ID=AF086833;gene=VP30
AF086833    EMBL    gene    9885    11518   .   +   .   ID=AF086833;gene=VP24;note=putative
AF086833    EMBL    gene    11501   18282   .   +   .   ID=AF086833;gene=L


This solved the problem only partially: IGV now displays at least the last entry in the list (gene L) in its correct position. Strangely enough, the other genes are not shown, although I cannot see any difference in formatting between the lines.
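
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: after the edit all seven rows carry the same ID=AF086833, and GFF3 expects ID values to be unique per feature, so a viewer may treat such rows as one feature. The sketch below lists ID values that occur more than once. `duplicate_gff_ids` is a made-up helper name, and the code assumes a tab-delimited GFF3 file (the listing above shows spaces only because of how the page renders it).

```shell
# Hypothetical helper: print every ID= value that appears on more than one
# feature line of a tab-delimited GFF3 file.
duplicate_gff_ids() {
    # $1 = GFF3 file
    awk -F'\t' '
        # skip comments/directives and rows without an attribute column
        /^#/ || NF < 9 { next }
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) print substr(attrs[i], 4)
        }
    ' "$1" | sort | uniq -d
}
```

Running `duplicate_gff_ids 1976-genes.gff` on the edited file should report AF086833, while the original file with IDs such as AF086833.3 and AF086833.9 should produce no output.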

I asked you to edit the sequence file header (fasta) not the GFF file. Use the original GFF file as is. Can you try it again?
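
For reference, the header edit can also be done on the command line instead of in an editor. This is only a sketch: `strip_fasta_version` is a made-up helper name, and it assumes the version is a trailing `.N` on the first word of the header line, as in >AF086833.2.

```shell
# Hypothetical helper (not from the handbook): remove a trailing ".N" version
# from the first word of every FASTA header, e.g. ">AF086833.2" -> ">AF086833".
strip_fasta_version() {
    # $1 = input FASTA, $2 = output FASTA; sequence lines are copied unchanged
    sed -E '/^>/ s/^(>[^ .]+)\.[0-9]+/\1/' "$1" > "$2"
    # Remove any stale index next to the output so IGV has to rebuild it
    rm -f "$2.fai"
}
```

It could be called as `strip_fasta_version 1976.fa 1976.noversion.fa`, where the output file name is just an example. The GFF file is left untouched.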

Sorry for that, was a bit confused. The header of my genome sequence fasta file is:

$ cat 1976.fa | head
>AF086833
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA


So there is no version number in AF086833, and I changed the GFF file back to:

##gff-version 3
AF086833    EMBL    gene    56  3026    .   +   .   ID=AF086833.3;gene=NP
AF086833    EMBL    gene    3032    4407    .   +   .   ID=AF086833.9;gene=VP35
AF086833    EMBL    gene    4390    5894    .   +   .   ID=AF086833.13;gene=VP40
AF086833    EMBL    gene    5900    8305    .   +   .   ID=AF086833.20;gene=GP
AF086833    EMBL    gene    8288    9740    .   +   .   ID=AF086833.34;gene=VP30
AF086833    EMBL    gene    9885    11518   .   +   .   ID=AF086833.41;gene=VP24;note=putative
AF086833    EMBL    gene    11501   18282   .   +   .   ID=AF086833.47;gene=L
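
The name mismatch discussed in this thread can be checked without opening IGV. A minimal sketch, assuming a tab-delimited GFF3 file and taking the FASTA sequence name to be the first word after ">"; `gff_names_missing_from_fasta` is a made-up helper name.

```shell
# Hypothetical helper: print each sequence name used in column 1 of the GFF
# that has no matching header in the FASTA file.
gff_names_missing_from_fasta() {
    # $1 = FASTA file, $2 = tab-delimited GFF3 file
    awk -F'\t' '
        # First file (FASTA): remember the first word of every header line
        FNR == NR {
            if (/^>/) { split(substr($0, 2), w, /[ \t]/); seen[w[1]] = 1 }
            next
        }
        # Second file (GFF): report each unknown sequence name once
        !/^#/ && NF >= 9 && !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2"
}
```

With a header of >AF086833.2 in the FASTA file, `gff_names_missing_from_fasta 1976.fa 1976-genes.gff` should print AF086833; once the header reads >AF086833 it should print nothing.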


I recreated the genome in IGV and indeed, it now displays the genes. However, overlapping genes are represented as a single bar. Is there any way to show them as overlapping bars stacked on top of each other?

In the meantime I found the answer: change the track's display mode from collapsed to expanded.
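
The same display change can probably be scripted through IGV's batch commands. This is a sketch under assumptions, not something tested in this thread: `write_igv_batch` is a made-up helper, the file names are the ones used above, the paths will likely need to be made absolute, and the track name passed to `expand` is assumed to default to the GFF file name.

```shell
# Hypothetical helper: write a small IGV batch script to the path given as $1.
# The batch commands used (new, genome, load, expand) are IGV batch commands;
# the file names and the track name are assumptions and may need adjusting.
write_igv_batch() {
    cat > "$1" <<'EOF'
new
genome 1976.fa
load 1976-genes.gff
expand 1976-genes.gff
EOF
}
```

After `write_igv_batch ebola.batch` (an example file name), the script can be run from within IGV via Tools > Run Batch Script.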

I think that is the way IGV handles annotation files. You could split the genes into multiple GFF files and then load them to see if that sort of does what you want.
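
If you want to try the splitting approach, something like the following could do it. It is a rough sketch: `split_gff_by_gene` is a made-up helper, it assumes a tab-delimited GFF3 file whose rows carry a `gene=` attribute as in the listing above, and it assumes the gene names are safe to use as file names.

```shell
# Hypothetical helper: write one GFF3 file per gene= value into a directory.
split_gff_by_gene() {
    # $1 = tab-delimited GFF3 file, $2 = output directory
    mkdir -p "$2"
    awk -F'\t' -v dir="$2" '
        # skip comments/directives and rows without an attribute column
        /^#/ || NF < 9 { next }
        {
            name = "feature" NR            # fallback when no gene= is present
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^gene=/) name = substr(attrs[i], 6)
            out = dir "/" name ".gff"
            # start each new file with the GFF3 version directive
            if (!(out in started)) { print "##gff-version 3" > out; started[out] = 1 }
            print $0 > out
        }
    ' "$1"
}
```

For example, `split_gff_by_gene 1976-genes.gff split` should produce files such as split/NP.gff and split/VP35.gff, each of which can then be loaded as its own track.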