Is there a tool for identifying eukaryotic transcription start sites (TSSs) from regular RNA-seq data? I know people use cap-seq to select the 5' cap of mRNAs and sequence them. However, there is far more regular RNA-seq data available, so it would be interesting to try using it to identify transcription start sites. Thanks!
I've spent a lot of time working on this problem, but only for the subset of TSSs that fall in transposable elements. You can do the trivial analysis, i.e. assemble the transcripts and take the 5' end of exon 1, but I was interested in the many regions where assemblies don't work so well (TSSs in repeats). In general, RNA-seq plus assembly doesn't do a great job of finding true 5' ends.
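For reference, the trivial analysis can be sketched in a few lines of Python. This is only an illustration under some assumptions: the assembly is a GTF with `exon` records carrying a `transcript_id` attribute (as written by assemblers such as StringTie or Cufflinks), and the function name `tss_from_gtf` and the example records are made up for this post. It reports the 5' end of the first exon per transcript, taking strand into account, and it inherits all the weaknesses of the assembly mentioned above.

```python
import re
from collections import defaultdict


def tss_from_gtf(lines):
    """Return {transcript_id: (chrom, strand, tss)} using 1-based GTF coordinates.

    Hypothetical helper: the TSS is taken as the 5' end of the first exon,
    i.e. the smallest exon start on '+' and the largest exon end on '-'.
    """
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are needed
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append(
            (fields[0], fields[6], int(fields[3]), int(fields[4]))
        )

    tss = {}
    for transcript_id, records in exons.items():
        chrom, strand = records[0][0], records[0][1]
        if strand == "+":
            tss[transcript_id] = (chrom, strand, min(r[2] for r in records))
        elif strand == "-":
            tss[transcript_id] = (chrom, strand, max(r[3] for r in records))
        # Unstranded transcripts ('.') are skipped: no 5' end can be assigned.
    return tss


# Made-up records, for illustration only.
example = [
    'chr1\tasm\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tasm\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr2\tasm\texon\t500\t600\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
    'chr2\tasm\texon\t800\t900\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]

for transcript_id, (chrom, strand, pos) in sorted(tss_from_gtf(example).items()):
    print(transcript_id, chrom, strand, pos)
```

On a real assembly you would pass an open file handle instead of the `example` list. Note that unstranded library preps often produce single-exon transcripts with strand `.`, which this sketch simply drops.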
I put together a bit of software for this called LIONS. It does what I want pretty well, but to be honest, I think a similar application of artificial neural networks (ANNs) to all transcriptome data could vastly improve TSS calling; I just never had the need to generalize the solution to all TSSs. It might be a starting place for you if you're looking to go after this problem seriously.