Visualize multiple GFF files
3.8 years ago
Anand Rao ▴ 470

For each of several genomes, in addition to the already available FASTA sequence and its associated GFF3 annotation file, I have generated 5 additional GFF files holding the start-stop coordinates of 5 additional types of genomic features.

My goal is four-fold in the context of these 6 GFF files and 1 genomic DNA sequence.

1. Explore where two or more of these genomic features overlap / intersect / co-localize. I am currently doing this via text manipulation, using bedtools overlap or bedtools intersect, but a purely text-based calculation of intersections is not satisfying during the exploration phase. I want to visualize 2 or more tracks together.

2. Generate high-detail images (with flexibility of color / shapes like that in IGB or gff2ps) for small intervals, looking at specific and most interesting cases.

3. Generate overview images for entire chromosomes or even a whole genome, showing overlap across these 6 types of genomic features without confusing or overwhelming the reader.

4. Finally, request advice on which tool and/or statistical test to use for verifying whether the observed physical co-localization of any 2 of the 6 types of genomic features is random or non-random. If it is non-random, are there more sophisticated tests to examine the physical distribution of genomic locus types relative to one another? And are there tests that can examine more than 2 types of genomic features at a time?

The rather old thread "What Tools/Libraries Do You Use To Visualize Genomic Feature Data?" discusses answers to questions 1 - 3 above, but I am curious to know if there are better / updated tools for my goals than the ones I mentioned above or those in that thread (bedtools or bedOps, IGB, GBrowse, GFF2PS). Thanks!

GFF overlap statistics visualization • 2.5k views
3.4 years ago
bernatgel ★ 3.3k

Hi,

If you can use R, you should be able to create these plots with karyoploteR. You would need to load the data into R (probably using rtracklayer's 'import' function) and then plot it using kpPlotGenes for genes and kpPlotRegions for everything else. You can find more information and various examples of how to use it on the karyoploteR tutorial page.

As for point 4, you can use the Bioconductor package regioneR. Once the data is loaded into R, you can use the function overlapPermTest to run a permutation test of whether two sets of genomic regions overlap more (or less) than expected by chance.
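
To make the idea behind such a permutation test concrete, here is a minimal plain-Python sketch. It is not regioneR's implementation: the function names (count_overlaps, randomize, overlap_perm_test), the single-chromosome model, the uniform random placement, and the toy coordinates are all assumptions made for illustration. It counts how many regions in set A overlap set B, then repeatedly re-places A at random and checks how often the randomized count reaches the observed one.

```python
import random

def count_overlaps(a, b):
    """Count intervals in `a` that overlap at least one interval in `b`.
    Intervals are (start, end) tuples, 1-based and inclusive, as in GFF."""
    return sum(any(s <= b_end and b_start <= e for b_start, b_end in b)
               for s, e in a)

def randomize(intervals, chrom_len, rng):
    """Re-place each interval uniformly at random, keeping its length.
    Assumption: one chromosome, no masked or excluded regions."""
    out = []
    for s, e in intervals:
        length = e - s + 1
        new_start = rng.randint(1, chrom_len - length + 1)
        out.append((new_start, new_start + length - 1))
    return out

def overlap_perm_test(a, b, chrom_len, n_perm=1000, seed=0):
    """One-sided test: do `a` and `b` overlap more than expected by chance?
    Returns the observed overlap count and an empirical p-value."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    hits = sum(count_overlaps(randomize(a, chrom_len, rng), b) >= observed
               for _ in range(n_perm))
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 20 features of type A, and 20 features of
# type B shifted by 50 bp, so every A feature overlaps a B feature.
CHROM_LEN = 1_000_000
set_a = [(i * 50_000 + 1, i * 50_000 + 100) for i in range(20)]
set_b = [(s + 50, e + 50) for s, e in set_a]

observed, p_value = overlap_perm_test(set_a, set_b, CHROM_LEN)
print(f"observed overlaps: {observed}, empirical p-value: {p_value:.4f}")
```

A small p-value here suggests the co-localization is unlikely under uniform random placement. Real genomes are not uniform, so the choice of randomization (for example, restricting placement to mappable or non-masked regions) matters a great deal for the interpretation.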

Hope this helps

Bernat

3.4 years ago
Joe 20k

You could do this (I think) with Artemis and/or the Artemis Comparison Tool.

Load up the sequence and read in the different annotation files, and you should be able to view them all together in the same window, I believe.

It probably doesn't solve all of your requests, but it's worth a look I think.