Question: Loading Custom Genome In IGV Is Not Displaying Genes, Names Match
8.4 years ago by
Nickengland130 wrote:

I have some large (~200kb) contigs which were produced by Illumina sequencing. I want to map these against a human reference genome to identify the genes present on the contigs so their exon sequences can be analysed.

I have exported the region of interest from Ensembl as a .fasta and a .gff, produced an alignment using BWA, and tried to view the results in IGV.

The alignment of the reference to the contigs is behaving as expected, but I cannot get the gene information in the .gff file to load in IGV. I have tried renaming all the GFF entries' IDs to "test" and calling the .fasta header ">test", as well as other combinations, but nothing seems to work.

Does anyone know what the issue could be? From what I have seen the .fasta header must match the gff first column exactly, but I have made sure this is the case!

Alternatively, suggestions for other ways to visualise this information would be useful!

Any help is much appreciated :)

##gff-version   3
##sequence-region   14  1   107349540
test    Ensembl gene    106303099   106312010   .   -   .   ID=ENSG00000211898;Name=ENSG00000211898;biotype=IG_C_gene
test    Ensembl gene    106320349   106322323   .   -   .   ID=ENSG00000211899;Name=ENSG00000211899;biotype=IG_C_gene
test    Ensembl gene    106329408   106329468   .   -   .   ID=ENSG00000211900;Name=ENSG00000211900;biotype=IG_J_gene
test    Ensembl gene    106329626   106329675   .   -   .   ID=ENSG00000237111;Name=ENSG00000237111;biotype=IG_J_pseudogene
test    Ensembl gene    106330024   106330072   .   -   .   ID=ENSG00000242472;Name=ENSG00000242472;biotype=IG_J_gene
test    Ensembl gene    106330425   106330470   .   -   .   ID=ENSG00000240041;Name=ENSG00000240041;biotype=IG_J_gene

[EDIT] I think I have found the answer: when you export from Ensembl, the feature coordinates in the GFF file use whole-chromosome numbering, but IGV seems to count from 1 at the start of the FASTA file, which means the coordinates are all offset.

Is there a way to get IGV to respect numbering in the FASTA header? I can't find information on that in their manual. Otherwise I'll have to write something quickly to subtract the starting number from each entry in the GFF.
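
The subtraction described above can be sketched in Python as follows. This is a minimal sketch, not a tested IGV workflow: it assumes the GFF is tab-delimited with nine columns, and the function names and the `offset` parameter are hypothetical. If the exported region starts at chromosome position S, then `offset` would be S - 1, so that the first base of the FASTA becomes position 1.

```python
def shift_gff_line(line, offset):
    """Subtract `offset` from the start and end columns of one GFF3 line.

    Comment, pragma, and blank lines are returned unchanged.
    Assumption: feature lines are tab-delimited with nine columns.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    start = int(fields[3]) - offset
    end = int(fields[4]) - offset
    if start < 1:
        raise ValueError(f"feature starts before the exported region: {line!r}")
    fields[3], fields[4] = str(start), str(end)
    return "\t".join(fields) + newline


def shift_gff(lines, offset):
    """Apply shift_gff_line to every line of an iterable of GFF3 lines."""
    return [shift_gff_line(line, offset) for line in lines]
```

To use it on files, read the exported GFF with `open(...)`, pass the lines to `shift_gff`, and write the result to a new file. Note that the `##sequence-region` pragma is passed through untouched, so it would still describe the whole chromosome.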

reference gff contigs igv • 7.2k views
ADD COMMENTlink modified 8.4 years ago • written 8.4 years ago by Nickengland130

can you post a few lines of your gff file?

ADD REPLYlink written 8.4 years ago by Damian Kao15k

I've added the first few lines, the .fasta file starts ">test"\n

ADD REPLYlink written 8.4 years ago by Nickengland130

I don't see anything wrong with the lines you posted. Try loading just one line of your file and make the start/end span a large distance just so you can obviously visualize it. See if that works.

ADD REPLYlink written 8.4 years ago by Damian Kao15k

Very strange, if I just load the top line, with a huge start/end span, it loads perfectly.

I suppose a binary search of the file is in order.

ADD REPLYlink written 8.4 years ago by Nickengland130

I don't understand what you mean. Are you saying the reference sequence fasta file you loaded in is in pieces? So each chromosome is broken up into multiple fasta entries?

ADD REPLYlink written 8.4 years ago by Damian Kao15k

I am not familiar with IGV, but the GFF3 doesn't make much sense. Is there a reference sequence named "test" defined somewhere? The only reference sequence in your snippet is (presumably) for chromosome 14, so I would expect "14" to be in the first column.

ADD REPLYlink written 8.4 years ago by Scott Cain750

There is no binary search for GFF files; the file is loaded as a whole. What often happens is that the seqid columns do not match, and therefore the GFF features cannot be shown. From what I see, you have renamed the seqid column to "test", and that does not seem right.

ADD REPLYlink written 8.4 years ago by Istvan Albert ♦♦ 84k
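
One quick way to rule out the seqid mismatch mentioned in the reply above is to compare the names in the FASTA headers with the first column of the GFF. The sketch below is illustrative only: the function names are hypothetical, and it assumes the sequence name is the first whitespace-delimited token after ">" and that the GFF is tab-delimited.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def gff_seqids(gff_lines):
    """Collect the first column (seqid) of every GFF feature line."""
    seqids = set()
    for line in gff_lines:
        if line.strip() and not line.startswith("#"):
            seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(fasta_lines, gff_lines):
    """Return the GFF seqids that have no FASTA sequence of the same name."""
    return sorted(gff_seqids(gff_lines) - fasta_names(fasta_lines))
```

An empty result means every seqid in the GFF has a matching FASTA sequence, in which case the names are not the problem and the coordinates are the next thing to check.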

As the previous poster points out, check the pragmas (the lines starting with ##). Remove the second pragma and try loading the file that way. Then add it back, but make sure it matches. In fact, I am not quite sure what the purpose of this second pragma is.

ADD REPLYlink written 8.4 years ago by Istvan Albert ♦♦ 84k
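
To try the file without the second pragma, as suggested above, the `##sequence-region` line can be filtered out before loading. This is a minimal sketch with a hypothetical function name; whether the pragma actually affects IGV is not established in this thread.

```python
def drop_sequence_region(lines):
    """Return the GFF lines with any ##sequence-region pragma removed."""
    return [line for line in lines if not line.startswith("##sequence-region")]
```

All other lines, including `##gff-version`, are kept in their original order.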