Question: Finding Common Annotations In Gtf And Gff Files Of Different Origin
7.5 years ago, white.mccannon wrote:

I have a lot of RNA-Seq reads from numerous tissues of a bird species I'm working on. Unfortunately, there is no well-annotated genome for this bird. To get around this, we assembled the RNA-Seq reads and used BLAT to align the assembled sequences to the FASTA-formatted genome (again, not well annotated). I then converted the resulting PSL files to GFF files using blat2gff.

I've also mapped the RNA-Seq reads with TopHat, without supplying a GTF file, and generated a number of GTF files from those alignments (one for each sample I have).

So - to summarize - I have GTF files from a TopHat run and GFF files from a BLAT alignment of assembled RNA-Seq reads.

My question is: how do I figure out which transcript generated by TopHat/Cufflinks in the GTF file corresponds to which transcript in the GFF file?

What would be a good first step?

Thanks in advance!


Tags: gtf, gff, rna-seq

Thanks for the help, Istvan! I've tried IRanges previously and - while I did click the button and get an answer - I'm not sure what happened.

Bedtools seems to be easy to use, so I'll look into that and report back what I find!



written 7.5 years ago by white.mccannon
7.5 years ago, Istvan Albert (University Park, USA) wrote:

The problem that you describe appears to be one of finding overlapping intervals contained in different files.

There are several ways to do this, and tools such as bedtools, bedops, and IRanges (Bioconductor) have been developed for exactly this purpose.
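To make the idea of "overlapping intervals" concrete, here is a minimal Python sketch of what such tools compute. It is not how bedtools, bedops, or IRanges are implemented, and it will be far slower on real data; it only illustrates the logic. It assumes both files use the standard nine tab-separated GTF/GFF columns with 1-based inclusive coordinates, that the GTF attributes contain a `transcript_id "..."` entry, and that the ninth GFF column can be used as-is as a label. The sample records, the contig names, and the helper names (`parse_features`, `find_overlaps`) are all hypothetical. Strand is ignored, which may or may not be what you want.

```python
import re
from collections import defaultdict

def parse_features(lines):
    """Yield (chrom, start, end, attributes) from GTF/GFF text lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column record
        yield cols[0], int(cols[3]), int(cols[4]), cols[8]

def gtf_transcript_id(attributes):
    """Extract transcript_id from a GTF attribute string (fall back to raw text)."""
    match = re.search(r'transcript_id "([^"]+)"', attributes)
    return match.group(1) if match else attributes

def find_overlaps(gtf_lines, gff_lines):
    """Map each GTF transcript_id to the GFF labels it overlaps on the same sequence."""
    gff_by_chrom = defaultdict(list)
    for chrom, start, end, attrs in parse_features(gff_lines):
        gff_by_chrom[chrom].append((start, end, attrs))

    hits = defaultdict(set)
    for chrom, start, end, attrs in parse_features(gtf_lines):
        tid = gtf_transcript_id(attrs)
        for g_start, g_end, label in gff_by_chrom.get(chrom, []):
            # Closed intervals overlap when each one starts before the other ends.
            if start <= g_end and g_start <= end:
                hits[tid].add(label)
    return {tid: sorted(labels) for tid, labels in hits.items()}

# Hypothetical sample records, for illustration only.
gtf_example = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t500\t600\t.\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]
gff_example = [
    "chr1\tblat\texon\t150\t250\t.\t+\t.\tcontig_A",
    "chr2\tblat\texon\t150\t250\t.\t+\t.\tcontig_B",
    "chr1\tblat\texon\t601\t700\t.\t+\t.\tcontig_C",
]

print(find_overlaps(gtf_example, gff_example))
```

With the sample records above, only `CUFF.1.1` is reported, paired with `contig_A`: `contig_B` lies on a different sequence and `contig_C` starts one base after `CUFF.2.1` ends. For real files, `bedtools intersect` with `-a`, `-b`, `-wa` and `-wb` reports the same kind of pairing much more efficiently.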
