11.1 years ago
Bioinfosm ▴ 620
I am looking to use multiple human gene models with TopHat, such as Ensembl, RefSeq and custom GTF files. Can one simply concatenate all of these GTF files (from the same genome build) and use the resulting file to guide TopHat for RNA-seq data?
These would obviously contain exact duplicates and overlaps. One can remove the exact duplicates if that helps TopHat's performance and efficiency, but overlaps are trickier...
Thanks in advance.
Thanks, Wen! But I think cuffmerge belongs at the later end of the analysis, where it merges the Cufflinks results and annotates them. What I am looking for is front-end merging of different annotation sources such as RefSeq, Ensembl, etc., so that I have all possible models to guide the TopHat analysis. If merging is not necessary and there is no loss of efficiency, I can simply use a concatenation of all those annotations.
cuffmerge DOES give you a union of all possible models. It takes ANY GTF files and, better still, it filters out redundant ones. But if you don't care about duplicates, then yes, concatenation is the simplest way.
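For the concatenation route, here is a minimal sketch of joining several GTF files while dropping lines that are exact duplicates. The function name `concat_gtf` and the file names in the usage note are made up for illustration, and the script only catches byte-identical lines. Records that describe the same exon but differ in the source column or in their IDs (the usual case between Ensembl and RefSeq) are kept, as are overlapping models.

```python
def concat_gtf(paths, out_path):
    """Concatenate GTF files, keeping only the first copy of each identical line.

    Hypothetical helper for illustration; returns the number of records written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    record = line.rstrip("\n")
                    # Skip blank lines and '#' header/comment lines.
                    if not record or record.startswith("#"):
                        continue
                    # Drop a record only if the whole line was already seen.
                    if record in seen:
                        continue
                    seen.add(record)
                    out.write(record + "\n")
                    kept += 1
    return kept
```

Something like `concat_gtf(["ensembl.gtf", "refseq.gtf", "custom.gtf"], "combined.gtf")` would then give a single file to pass to TopHat with `-G`. It is worth checking first that all the sources use the same chromosome naming as the Bowtie index (e.g. `chr1` vs `1`), since the script does not reconcile that.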