I have 20 RNA-Seq samples from one tissue, taken from different animals of the same species. I mapped them to the genome one by one using tophat, then ran cufflinks and cuffmerge.
My feeling is that some transcripts expressed at very low levels are discarded by cufflinks, and subsequent merging cannot rescue them. Hence my questions:
- can I do any better than this?
- apart from merging 20 individual BAM files into one giant one, what are the options?
- any experience with cufflinks alternatives which may be better from a genome annotation point of view?
Thanks a lot for your help
re: multiple fastq solution: this depends on how tophat does the mapping. If mapping reads in spliced mode depends heavily on the positions of already mapped reads, then sure, mapping all FASTQ files in one go is better than merging BAMs. On the other hand, if tophat mapping of 2x 96bp RNA-Seq reads is not reliable without prior coverage of exons by unspliced reads or entries in the GTF file, then one should check other mappers.
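To make the "all FASTQ in one go" option concrete, here is a minimal sketch that only builds the command line, it does not run anything. It relies on tophat accepting comma-separated lists of read files for each mate. The index name, output directory and FASTQ file names are placeholders I made up for illustration, not my real paths.

```python
from typing import List, Tuple


def pooled_tophat_command(index: str,
                          pairs: List[Tuple[str, str]],
                          out_dir: str) -> List[str]:
    """Build a single tophat call that maps all samples together.

    tophat takes comma-separated lists of read files, one list per mate.
    """
    left = ",".join(r1 for r1, _ in pairs)
    right = ",".join(r2 for _, r2 in pairs)
    return ["tophat", "-o", out_dir, index, left, right]


# Hypothetical file names for the 20 samples (assumption, for illustration).
pairs = [(f"sample{i:02d}_1.fastq", f"sample{i:02d}_2.fastq")
         for i in range(1, 21)]

# "genome_index" and "tophat_pooled" are placeholder names.
cmd = pooled_tophat_command("genome_index", pairs, "tophat_pooled")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced. Note that pooling this way loses the per-sample identity of the reads in the output BAM.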
re .gtf: Yes, I have the ENSEMBL annotation, which I used for mapping. Here is the relevant part of the command:
--min-intron-length 21 --max-intron-length 200000 --segment-mismatches 1 --butterfly-search --GTF my_ensembl.gtf
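For context, the per-sample workflow can be sketched roughly as below. This only assembles the command lines around the options quoted above, it does not execute them. The sample names, FASTQ names, index name and output directories are placeholders (assumptions), and any cufflinks or cuffmerge options beyond `-o` are left out because they are not shown here.

```python
import shlex
from typing import List

# tophat options exactly as quoted in the post.
TOPHAT_OPTS = ("--min-intron-length 21 --max-intron-length 200000 "
               "--segment-mismatches 1 --butterfly-search --GTF my_ensembl.gtf")


def sample_commands(sample: str, index: str, r1: str, r2: str) -> List[List[str]]:
    """Return the tophat and cufflinks commands for one sample."""
    tophat_dir = f"{sample}_tophat"      # placeholder directory name
    cuff_dir = f"{sample}_cufflinks"     # placeholder directory name
    tophat = ["tophat", "-o", tophat_dir,
              *shlex.split(TOPHAT_OPTS), index, r1, r2]
    # tophat writes its alignments to accepted_hits.bam in the output dir.
    cufflinks = ["cufflinks", "-o", cuff_dir,
                 f"{tophat_dir}/accepted_hits.bam"]
    return [tophat, cufflinks]


def merge_manifest(samples: List[str]) -> str:
    """Text of the assembly list for cuffmerge: one transcripts.gtf per line."""
    return "\n".join(f"{s}_cufflinks/transcripts.gtf" for s in samples) + "\n"


# Hypothetical sample and file names (assumption, for illustration).
samples = [f"sample{i:02d}" for i in range(1, 21)]
commands = [cmd
            for s in samples
            for cmd in sample_commands(s, "genome_index",
                                       f"{s}_1.fastq", f"{s}_2.fastq")]

# The manifest text would be saved as assemblies.txt before running cuffmerge.
manifest = merge_manifest(samples)
cuffmerge = ["cuffmerge", "-o", "merged_asm", "assemblies.txt"]

for cmd in commands[:2]:
    print(" ".join(cmd))
print(" ".join(cuffmerge))
```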
re other approaches:
I also used our in-house pipeline based on GEM for mapping. I will try Trinity, and possibly also Trans-ABySS.