I have some large (~200kb) contigs which were produced by Illumina sequencing. I want to map these against a human reference genome to identify the genes present on the contigs so their exon sequences can be analysed.
I have exported the region of interest from Ensembl as a .fasta and a .gff, produced an alignment using BWA and tried to view the results in IGV.
The alignment of the reference to the contigs is behaving as expected, but I cannot get the gene information in the .gff file to load in IGV. I have tried renaming all the gff entries' IDs to "test" and calling the .fasta header ">test", as well as other combinations, but nothing seems to work.
Does anyone know what the issue could be? From what I have seen the .fasta header must match the gff first column exactly, but I have made sure this is the case!
Alternatively, suggestions for other ways to visualise this information would be useful!
Any help is much appreciated :)
##gff-version 3
##sequence-region 14 1 107349540
test	Ensembl	gene	106303099	106312010	.	-	.	ID=ENSG00000211898;Name=ENSG00000211898;biotype=IG_C_gene
test	Ensembl	gene	106320349	106322323	.	-	.	ID=ENSG00000211899;Name=ENSG00000211899;biotype=IG_C_gene
test	Ensembl	gene	106329408	106329468	.	-	.	ID=ENSG00000211900;Name=ENSG00000211900;biotype=IG_J_gene
test	Ensembl	gene	106329626	106329675	.	-	.	ID=ENSG00000237111;Name=ENSG00000237111;biotype=IG_J_pseudogene
test	Ensembl	gene	106330024	106330072	.	-	.	ID=ENSG00000242472;Name=ENSG00000242472;biotype=IG_J_gene
test	Ensembl	gene	106330425	106330470	.	-	.	ID=ENSG00000240041;Name=ENSG00000240041;biotype=IG_J_gene
[EDIT] I think I have found the answer: when you export from Ensembl, the feature coordinates in the .gff file are relative to the whole chromosome, but IGV counts positions from 1 at the start of the .fasta file, so every feature lands at the wrong position.
Is there a way to get IGV to respect the numbering in the .fasta header? I can't find any information on that in their manual. Otherwise I'll have to write something quickly to subtract the starting position from each entry in the .gff.
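For reference, a minimal sketch of what such a script could look like in Python, assuming the .gff is tab-separated and that the chromosome coordinate of the first base in the exported .fasta is known. The names shift_gff_line, shift_gff and region_start are made up for illustration, and the value 106300000 in the usage note is a placeholder, not the real start of my export.

```python
import sys


def shift_gff_line(line, offset):
    """Return one GFF line with start/end (columns 4 and 5) reduced by offset."""
    # Directives (##...), comments and blank lines are passed through unchanged.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line  # not a standard 9-column feature line; leave it alone
    start = int(fields[3]) - offset
    end = int(fields[4]) - offset
    if start < 1:
        raise ValueError("feature starts before the exported region: " + line.strip())
    fields[3] = str(start)
    fields[4] = str(end)
    return "\t".join(fields) + "\n"


def shift_gff(in_path, out_path, region_start):
    """Rewrite a GFF so coordinates count from 1 at the start of the exported region.

    region_start is the chromosome position of the first base in the .fasta
    (an assumption: it has to be taken from the export settings).
    """
    offset = region_start - 1
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(shift_gff_line(line, offset))


# Hypothetical usage: python shift_gff.py in.gff out.gff REGION_START
if __name__ == "__main__" and len(sys.argv) == 4:
    shift_gff(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```

With a placeholder region start of 106300000, the first gene above (106303099 to 106312010) would become 3100 to 12011. The ##sequence-region line is left untouched by this sketch, so it would still describe the whole chromosome and may need editing or removing by hand.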