Question

Finding Common Annotations In GTF And GFF Files Of Different Origin

1


11.2 years ago

white.mccannon ▴ 10

I have lots of RNA-Seq reads from numerous tissues from a bird species I'm working on. Unfortunately there are no well-annotated genomes for this bird. To get around this, we have assembled the RNA-Seq reads and used BLAT to align the assembled sequences to the FASTA-formatted genome (again, not well-annotated). I then converted the resulting PSL files to GFF files using blat2gff.

I've also taken the RNASeq reads and mapped them using Tophat with no gtf file, and generated a number of gtf files (one for each sample I have).

So - to summarize - I have gtf files from a Tophat run and GFF files from a blat alignment of assembled RNASeq reads.

My question is: How do I figure out which transcript generated by Tophat/Cufflinks in the GTF file corresponds to which transcript in the GFF file?

What would be a good first step?

Thanks in advance!

Wyatt

gtf gff rna-seq • 3.0k views


0


Thanks for the help, Istvan! I've tried IRanges previously and - while I did click the button and get an answer - I'm not sure what happened.

Bedtools seems to be easy to use, so I'll look into that and report back what I find!

Thanks,

Wyatt

Reply 11.2 years ago by white.mccannon ▴ 10

score 2 · Answer 1 · 2013-02-02

2


11.2 years ago

Istvan Albert 100k

The problem that you describe appears to be one of finding overlapping intervals contained in different files.

There are several ways to do this, and tools such as bedtools, BEDOPS, and IRanges (Bioconductor) have been developed for exactly this purpose.
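To make the idea concrete, here is a minimal pure-Python sketch of the overlap test that these tools carry out far more efficiently. It is not how bedtools or IRanges are implemented. The function names `parse_features` and `find_overlaps` and the example records are made up for illustration, and the sketch assumes both files use the usual nine-column, tab-delimited GTF/GFF layout with 1-based inclusive coordinates.

```python
from collections import defaultdict


def parse_features(lines, feature_types=None):
    """Yield (chrom, start, end, strand, attributes) from GTF/GFF text lines.

    feature_types is an optional set of values for column 3 (e.g. {"exon"});
    None keeps every feature. Hypothetical helper, not part of any library.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column record
        chrom, ftype, start, end, strand, attrs = (
            cols[0], cols[2], cols[3], cols[4], cols[6], cols[8])
        if feature_types is not None and ftype not in feature_types:
            continue
        yield chrom, int(start), int(end), strand, attrs


def find_overlaps(a_features, b_features, same_strand=True):
    """Return (a, b) pairs on the same chromosome whose intervals overlap.

    Simple nested comparison per chromosome; fine for a sketch, slow for
    genome-scale files.
    """
    by_chrom = defaultdict(list)
    for feat in b_features:
        by_chrom[feat[0]].append(feat)

    pairs = []
    for a in a_features:
        for b in by_chrom.get(a[0], []):
            if same_strand and a[3] != b[3]:
                continue
            # Two closed intervals overlap when each starts before the other ends.
            if a[1] <= b[2] and b[1] <= a[2]:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up records standing in for a Cufflinks GTF and a blat2gff GFF.
    gtf = ['chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
           'gene_id "CUFF.1"; transcript_id "CUFF.1.1";']
    gff = ['chr1\tblat\texon\t150\t300\t.\t+\t.\tID=contig_42',
           'chr1\tblat\texon\t500\t600\t.\t+\t.\tID=contig_43']
    for a, b in find_overlaps(parse_features(gtf), parse_features(gff)):
        print(a[4], "<->", b[4])
```

With bedtools itself, a reasonable starting point would be something like `bedtools intersect -a cufflinks.gtf -b blat.gff -wa -wb` (the file names are placeholders), which reports each pair of overlapping records side by side.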
