Loading Custom Genome In IGV Is Not Displaying Genes, Names Match
10.8 years ago
Nickengland ▴ 130

I have some large (~200kb) contigs which were produced by Illumina sequencing. I want to map these against a human reference genome to identify the genes present on the contigs so their exon sequences can be analysed.

I have exported the region of interest from Ensembl as a .fasta and a .gff, produced an alignment using BWA, and tried to view the results in IGV.

The alignment of the reference to the contigs is behaving as expected, but I cannot get the gene information in the .gff file to load in IGV. I have tried renaming all the gff entries' sequence IDs to "test" and setting the .fasta header to ">test", as well as other combinations, but nothing seems to work.

Does anyone know what the issue could be? From what I have seen the .fasta header must match the gff first column exactly, but I have made sure this is the case!

Alternatively, suggestions for other ways to visualise this information would be useful!

Any help is much appreciated :)

##gff-version   3
##sequence-region   14  1   107349540
test    Ensembl gene    106303099   106312010   .   -   .   ID=ENSG00000211898;Name=ENSG00000211898;biotype=IG_C_gene
test    Ensembl gene    106320349   106322323   .   -   .   ID=ENSG00000211899;Name=ENSG00000211899;biotype=IG_C_gene
test    Ensembl gene    106329408   106329468   .   -   .   ID=ENSG00000211900;Name=ENSG00000211900;biotype=IG_J_gene
test    Ensembl gene    106329626   106329675   .   -   .   ID=ENSG00000237111;Name=ENSG00000237111;biotype=IG_J_pseudogene
test    Ensembl gene    106330024   106330072   .   -   .   ID=ENSG00000242472;Name=ENSG00000242472;biotype=IG_J_gene
test    Ensembl gene    106330425   106330470   .   -   .   ID=ENSG00000240041;Name=ENSG00000240041;biotype=IG_J_gene


[EDIT] I think I have found the answer: when you export from Ensembl, the feature coordinates in the gff file use whole-chromosome numbering, but IGV seems to count from 1 at the start of the fasta file, which means the coordinates are all offset.
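
One way to confirm this diagnosis is to compare each feature's end coordinate with the length of the sequence it refers to. The following is a minimal sketch, not a tested tool: the function names are made up for illustration, and it assumes a tab-separated GFF3 and that the sequence name is the first whitespace-delimited token of the FASTA header.

```python
def fasta_lengths(fasta_lines):
    """Map each FASTA sequence name to its total length in bases."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first token after ">".
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_features(fasta_lines, gff_lines):
    """Return (seqid, start, end) for features ending past the sequence end."""
    lengths = fasta_lengths(fasta_lines)
    bad = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip pragmas, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        if seqid in lengths and end > lengths[seqid]:
            bad.append((seqid, start, end))
    return bad
```

Calling `out_of_range_features(open("region.fasta"), open("region.gff"))` (both file names hypothetical) would list every gene whose coordinates fall outside the exported sequence. For a ~200 kb region with chromosome-scale coordinates, that would be all of them.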

Is there a way to get IGV to respect the numbering in the fasta header? I can't find any information on that in their manual. Otherwise I'll have to write something quickly to subtract the starting number from each entry in the gff.
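
The subtraction described above could be sketched roughly as follows. This is an illustration under stated assumptions, not an IGV feature: `shift_gff_line` is a made-up helper, the GFF is assumed to be tab-separated, and `region_start` is the 1-based chromosome coordinate of the first base in the exported fasta, which is not given in the post and has to be looked up from the export.

```python
def shift_gff_line(line, region_start):
    """Return a GFF line with start/end moved to region-local coordinates.

    region_start is the 1-based chromosome position of the first base of
    the exported region, so that base becomes position 1 after shifting.
    Pragma, comment and blank lines are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line  # not a standard 9-column feature line; leave as-is
    offset = region_start - 1
    fields[3] = str(int(fields[3]) - offset)  # start column
    fields[4] = str(int(fields[4]) - offset)  # end column
    return "\t".join(fields) + "\n"


def shift_gff(lines, region_start):
    """Apply shift_gff_line to every line of a GFF file."""
    return [shift_gff_line(line, region_start) for line in lines]
```

For example, with a hypothetical `region_start` of 106300000, the first gene above (106303099 to 106312010) would become 3100 to 12011. Note that the `##sequence-region` pragma is left untouched by this sketch, so it would need to be edited or removed by hand to agree with the new coordinates.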

igv gff reference contigs • 9.6k views

Can you post a few lines of your gff file?

I've added the first few lines; the .fasta file starts with ">test" followed by a newline.

I don't see anything wrong with the lines you posted. Try loading just one line of your file and make the start/end span a large distance just so you can obviously visualize it. See if that works.

Very strange, if I just load the top line, with a huge start/end span, it loads perfectly.

I suppose a binary search of the file is in order.

I don't understand what you mean. Are you saying the reference sequence fasta file you loaded in is in pieces? So each chromosome is broken up into multiple fasta entries?

I am not familiar with IGV, but the GFF3 doesn't make much sense. Is there a reference sequence named "test" defined somewhere? The only reference sequence in your snippet is (presumably) for chromosome 14, so I would expect "14" to be in the first column.

There is no binary search for gff files; the file is loaded as a whole. What often happens is that the seqid columns do not match, so the GFF features cannot be shown. From what I see, you have renamed the seqid column to "test", which does not seem right.
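
A quick way to rule a seqid mismatch in or out is to compare the names on both sides. The following is a rough sketch with made-up function names, assuming a tab-separated GFF and that the sequence name is the first whitespace-delimited token of each FASTA header.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])  # assumption: name is the first token
    return names


def gff_seqids(gff_lines):
    """Collect the distinct values of the first (seqid) GFF column."""
    seqids = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip pragmas, comments and blank lines
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(fasta_lines, gff_lines):
    """Return GFF seqids that have no FASTA sequence of the same name."""
    return gff_seqids(gff_lines) - fasta_names(fasta_lines)
```

An empty result means every seqid has a matching sequence name, in which case the problem lies elsewhere (for instance in the coordinates).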

As the previous poster points out, check the pragmas (the lines starting with ##). Remove the second pragma and test that way. Then add it back, but make sure it matches. In fact, I am not quite sure what the purpose of this second pragma is.