Question: Finding Common Annotations In Gtf And Gff Files Of Different Origin
7.5 years ago
white.mccannon wrote:

I have lots of RNA-Seq reads from numerous tissues of a bird species I'm working on. Unfortunately, there is no well-annotated genome for this bird. To get around this, we assembled the RNA-Seq reads and used BLAT to align the assemblies to the FASTA-formatted genome (again, not well annotated). I then converted the resulting PSL files to GFF files using blat2gff.

I've also mapped the RNA-Seq reads with TopHat without supplying a GTF file, and generated a number of GTF files with TopHat/Cufflinks (one for each sample).

So, to summarize: I have GTF files from the TopHat/Cufflinks runs and GFF files from the BLAT alignment of the assembled RNA-Seq reads.
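GTF and GFF share the same nine tab-separated columns (seqid, source, feature, start, end, score, strand, frame, attributes) and both use 1-based, inclusive coordinates. The main practical difference is column 9: GTF writes `key "value";` pairs, while GFF3 writes `key=value;` pairs. Below is a minimal Python sketch that normalizes lines from either file into one record type. It assumes the blat2gff output uses one of these two attribute styles (not verified here), and `Feature`, `parse_attributes` and `parse_feature_line` are hypothetical names for illustration.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    feature: str
    start: int    # 1-based, inclusive (GTF/GFF convention)
    end: int      # 1-based, inclusive
    strand: str
    attrs: dict


def parse_attributes(column9: str) -> dict:
    """Parse column 9 in either GTF style (key "value";) or GFF3 style (key=value;)."""
    attrs = {}
    for item in column9.split(";"):
        item = item.strip()
        if not item:
            continue
        key, eq, value = item.partition("=")
        if eq and " " not in key:
            # GFF3 style: key=value
            attrs[key] = value
        else:
            # GTF style: key "value" (quotes stripped)
            key, _, value = item.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def parse_feature_line(line: str) -> Optional[Feature]:
    """Return a Feature for a data line, or None for comments/blank/malformed lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    return Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6],
                   parse_attributes(cols[8]))
```

With both files reduced to `Feature` records, a Cufflinks transcript can be named by its `transcript_id` and a BLAT hit by whatever identifier the GFF carries (for example `ID`), which makes the two sources directly comparable by coordinates.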

My question is: how do I figure out which transcript generated by TopHat/Cufflinks in a GTF file corresponds to which transcript described in the GFF file?

What would be a good first step?

Thanks in advance!


gtf gff rna-seq
written 7.5 years ago by white.mccannon

Thanks for the help, Istvan! I tried IRanges previously and, while I did click the button and get an answer, I'm not sure what happened.

Bedtools seems easy to use, so I'll look into that and report back with what I find!



written 7.5 years ago by white.mccannon
7.5 years ago
Istvan Albert
University Park, USA
Istvan Albert wrote:

The problem you describe appears to be one of finding overlapping intervals contained in different files.

There are several approaches, and tools such as bedtools, BEDOPS, and IRanges (Bioconductor) have been developed for exactly this purpose.
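To make the idea concrete, here is a minimal pure-Python sketch of interval-overlap matching. It is not how bedtools, BEDOPS or IRanges are implemented; it assumes both files have already been reduced to `(seqid, start, end, strand, name)` tuples with 1-based inclusive coordinates, and the function name `find_overlaps` and the same-strand rule are illustrative assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def find_overlaps(query, target, same_strand=True):
    """Yield (query_name, target_name, overlap_bp) for every overlapping pair.

    Intervals are (seqid, start, end, strand, name) tuples using 1-based,
    inclusive coordinates, as in GTF/GFF.
    """
    # Group target intervals by sequence and sort each group by start.
    by_seq = defaultdict(list)
    for iv in target:
        by_seq[iv[0]].append(iv)
    index = {}
    for seqid, ivs in by_seq.items():
        ivs.sort(key=lambda iv: iv[1])
        index[seqid] = ([iv[1] for iv in ivs], ivs)

    for q_seq, q_start, q_end, q_strand, q_name in query:
        if q_seq not in index:
            continue
        starts, ivs = index[q_seq]
        # Only targets starting at or before the query end can overlap it.
        stop = bisect_right(starts, q_end)
        for _, t_start, t_end, t_strand, t_name in ivs[:stop]:
            if t_end < q_start:
                continue  # target ends before the query begins
            if same_strand and q_strand != t_strand:
                continue
            overlap = min(q_end, t_end) - max(q_start, t_start) + 1
            yield q_name, t_name, overlap
```

For example, with Cufflinks transcripts as the query and BLAT-aligned contigs as the target:

```python
cufflinks = [("scaffold1", 100, 500, "+", "CUFF.1.1"),
             ("scaffold1", 2000, 2500, "-", "CUFF.2.1")]
blat = [("scaffold1", 300, 800, "+", "contig7"),
        ("scaffold1", 2400, 3000, "-", "contig9")]
print(list(find_overlaps(cufflinks, blat)))
# [('CUFF.1.1', 'contig7', 201), ('CUFF.2.1', 'contig9', 101)]
```

This scans every earlier-starting target per query, which is fine for a sketch; the dedicated tools use indexing and streaming that scale to genome-wide files.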

written 7.5 years ago by Istvan Albert
