
Closed: Gene-level counts from transcript isoform counts, no reference genome (HiSeq + Iso-Seq)



3.7 years ago

carmel25 • 0

Hi,

I am aiming to get gene expression data but have no reference genome. As a substitute reference, I am using representative Iso-Seq sequencing from my own samples: I put the Iso-Seq data through IsoSeq3 -> Cogent -> ANGEL to get full-length, unique isoforms with open reading frames. I then ran kallisto to map the HiSeq reads onto those isoforms, imported the results into R using catchSalmon, and have been analysing isoform counts in edgeR.

How do I collapse these into actual gene counts?

Can I use the naming convention output by Cogent/ANGEL, PB.[locus index].[isoform index]|, to sum the isoform counts to loci in edgeR?
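The ID convention is regular enough to parse: everything up to the second "." is the Cogent locus. Below is a minimal Python sketch, not an edgeR feature. It assumes every target_id starts with PB.<locus>.<isoform>, and it assumes that treating a Cogent locus as a "gene" is acceptable for your purposes. The function names `locus_of` and `sum_by_locus` are hypothetical. The sketch sums kallisto est_counts per locus, using the example rows from the abundance.tsv excerpt in this post.

```python
import re
from collections import defaultdict

# Assumption: every target_id starts with "PB.<locus>.<isoform>", as in the
# Cogent/ANGEL IDs quoted in this post (e.g. "PB.3.2|004815|...").
PB_ID = re.compile(r"(PB\.\d+)\.\d+")


def locus_of(target_id: str) -> str:
    """Return the 'PB.<locus>' prefix, e.g. 'PB.3' for 'PB.3.2|004815|...'."""
    m = PB_ID.match(target_id)
    if m is None:
        raise ValueError(f"unexpected target_id format: {target_id!r}")
    return m.group(1)


def sum_by_locus(isoform_counts: dict[str, float]) -> dict[str, float]:
    """Sum per-isoform counts (e.g. kallisto est_counts) into per-locus totals."""
    totals = defaultdict(float)
    for target_id, count in isoform_counts.items():
        totals[locus_of(target_id)] += count
    return dict(totals)


if __name__ == "__main__":
    # est_counts copied from the abundance.tsv excerpt in this post
    example = {
        "PB.2.1|002537|path0:1-1624(+)|transcript/18304|m.1": 198,
        "PB.3.1|004815|path2:1-3039(+)|transcript/3426|m.4": 1920.76,
        "PB.3.2|004815|path2:1047-3035(+)|transcript/13401|m.5": 60.0475,
    }
    print(sum_by_locus(example))
```

With those three rows, PB.3.1 and PB.3.2 collapse into a single PB.3 total of about 1980.81 estimated counts, while PB.2 keeps its 198. On the R side, base R's `rowsum()` performs the same grouped sum on a count matrix, given a vector of locus IDs derived from the row names.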

(I have read "How to convert transcript level TPM to gene level TPM?" and other answers.) Is tximport (presumably with my kallisto abundance.tsv files) still the way to go? If so, how can I get it to truncate the IDs after the second ".", e.g. at PB.3 for the rows below? My target IDs aren't Ensembl identifiers, and tximport's ignoreAfterBar=TRUE option would still keep the isoform index (PB.3.1, PB.3.2), so each isoform would remain a separate ID.
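tximport's `tx2gene` argument takes a two-column table that maps transcript IDs (first column) to gene IDs (second column). Because of this, tximport does not need to truncate anything itself: the table can be built from the target_ids directly. Below is a minimal Python sketch. It assumes the PB.<locus>.<isoform> prefix and a tab-separated abundance.tsv with a `target_id` column, as shown in this post. The TXNAME/GENEID headers, the function names, and the command-line arguments are illustrative choices.

```python
import csv
import re
import sys

# Assumption: target_ids start with "PB.<locus>.<isoform>" (Cogent/ANGEL style).
PB_ID = re.compile(r"(PB\.\d+)\.\d+")


def tx2gene_from_lines(lines):
    """Map each target_id in kallisto abundance.tsv lines to its PB.<locus> ID."""
    reader = csv.DictReader(lines, delimiter="\t")
    pairs = []
    for row in reader:
        tid = row["target_id"]
        m = PB_ID.match(tid)
        if m is None:
            raise ValueError(f"unexpected target_id format: {tid!r}")
        # Keep the full target_id so it matches kallisto's IDs exactly.
        pairs.append((tid, m.group(1)))
    return pairs


def write_tx2gene(abundance_path, out_path):
    """Read one abundance.tsv and write a two-column tx2gene CSV."""
    with open(abundance_path, newline="") as fh:
        pairs = tx2gene_from_lines(fh)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["TXNAME", "GENEID"])
        writer.writerows(pairs)


if __name__ == "__main__" and len(sys.argv) == 3:
    write_tx2gene(sys.argv[1], sys.argv[2])
```

The script can be run as, e.g., `python make_tx2gene.py abundance.tsv tx2gene.csv` (the script name is hypothetical). The first column keeps the full target_id, so the table should match kallisto's IDs directly with ignoreAfterBar left at its default (FALSE). In R, this would be roughly `tximport(files, type = "kallisto", tx2gene = read.csv("tx2gene.csv"))`, an untested sketch.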


One of my kallisto abundance.tsv files looks like this:

target_id length eff_length est_counts tpm
PB.2.1|002537|path0:1-1624(+)|transcript/18304|m.1 405 260.579 198 10.9076
PB.3.1|004815|path2:1-3039(+)|transcript/3426|m.4 2187 2042.14 1920.76 13.5017
PB.3.2|004815|path2:1047-3035(+)|transcript/13401|m.5 933 788.137 60.0475 1.09369
etc.

RNA-Seq kallisto expression
