Finding Common Annotations in GTF and GFF Files of Different Origin
11.2 years ago

I have lots of RNA-Seq reads from numerous tissues of a bird species I'm working on. Unfortunately, there are no well-annotated genomes for this bird. To get around this, we assembled the RNA-Seq reads and used BLAT to align the assembled transcripts to the FASTA-formatted genome (again, not well annotated). I then converted the resulting PSL files to GFF files using blat2gff.

I've also taken the RNA-Seq reads and mapped them using TopHat with no GTF file, and generated a number of GTF files (one for each sample I have).

So - to summarize - I have GTF files from a TopHat run and GFF files from a BLAT alignment of assembled RNA-Seq reads.

My question is: how do I figure out which transcript generated by TopHat/Cufflinks in the GTF files corresponds to which transcript described in the GFF files?

What would be a good first step?

Thanks in advance!

Wyatt

gtf gff rna-seq • 3.0k views

Thanks for the help, Istvan! I've tried IRanges previously and - while I did click the button and get an answer - I'm not sure what happened.

Bedtools seems to be easy to use, so I'll look into that and report back what I find!

Thanks,

Wyatt

11.2 years ago

The problem you describe appears to be one of finding overlapping intervals contained in different files.

There are several ways to do this, and tools such as bedtools, bedops, and IRanges (Bioconductor) have been developed for this exact purpose.
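To make the idea concrete, here is a minimal Python sketch of what "finding overlapping intervals" means for two annotation files. It is only an illustration, not how bedtools or IRanges work internally. It assumes both files are tab-separated with the usual nine GTF/GFF columns and 1-based inclusive coordinates. The function names `read_intervals` and `find_overlaps` are made up for this example, and the attribute column is passed through as-is, because the exact attribute format written by blat2gff is not known here.

```python
from collections import defaultdict


def read_intervals(lines):
    """Parse GTF/GFF lines into (chrom, start, end, strand, attributes) tuples.

    Assumes nine tab-separated columns and 1-based inclusive coordinates.
    Comment lines, blank lines, and short lines are skipped.
    """
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        chrom, strand, attrs = fields[0], fields[6], fields[8]
        start, end = int(fields[3]), int(fields[4])
        intervals.append((chrom, start, end, strand, attrs))
    return intervals


def find_overlaps(a, b, same_strand=True):
    """Return (a_attrs, b_attrs, overlap_in_bases) for every overlapping pair.

    This is a simple all-against-all comparison per chromosome, so it is
    fine for a small test but slow for whole-genome files.
    """
    # Group the second set by chromosome so unrelated intervals are never compared.
    b_by_chrom = defaultdict(list)
    for interval in b:
        b_by_chrom[interval[0]].append(interval)

    hits = []
    for chrom, start, end, strand, attrs in a:
        for _, b_start, b_end, b_strand, b_attrs in b_by_chrom.get(chrom, []):
            if same_strand and strand != b_strand:
                continue
            # With inclusive coordinates, the shared length is end - start + 1.
            overlap = min(end, b_end) - max(start, b_start) + 1
            if overlap > 0:
                hits.append((attrs, b_attrs, overlap))
    return hits
```

You would call `read_intervals(open(path))` on one GTF and one GFF file and pass both results to `find_overlaps`. For real data, something like `bedtools intersect -a cufflinks.gtf -b blat.gff -wa -wb` (file names here are placeholders) reports the same kind of pairs much faster, and adding `-s` restricts hits to the same strand.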
