Closed: Gene-level counts from transcript isoform counts with no reference genome (HiSeq + Iso-Seq)
3.7 years ago
carmel25 • 0

Hi,

I am trying to get gene expression data but have no reference genome. Instead, I have mapped HiSeq reads onto representative Iso-Seq sequences from my samples, using them as a sort of reference. I put the Iso-Seq data through Iso-seq3 -> Cogent -> ANGEL to get full-length, unique isoforms with open reading frames. I then ran kallisto to map the HiSeq reads onto those isoforms, imported the results into R using catchSalmon, and have been analysing isoform counts in edgeR.

How do I collapse these into actual gene counts?

Can I use the naming convention output by Cogent/ANGEL, which is PB.[locus index].[isoform index]|..., to sum the isoform counts per locus in edgeR?

(I have read "How to convert transcript level TPM to gene level TPM ?" and other answers.) Is using tximport (I assume with my kallisto abundance.tsv files) still the way to go? If so, can I get it to cut each ID at the second "." (e.g. at PB.3 for the IDs below), and how? My target IDs aren't Ensembl identifiers, and the tximport ignoreAfterBar=TRUE option would still leave my isoform indexes in the IDs, so each isoform would remain a separate ID.
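
To make the question concrete, here is a minimal Python sketch (not tximport itself) of the transcript-to-locus table I have in mind. As far as I understand, tximport can take a two-column table like this through its tx2gene argument. `locus_id` and `tx2gene_rows` are hypothetical helper names, and the regular expression assumes every target ID starts with PB.<locus index>.<isoform index>.

```python
import re

# Assumption: every ID starts with PB.<locus index>.<isoform index>,
# followed by "|" or the end of the string (Cogent/ANGEL naming).
LOCUS_RE = re.compile(r"^(PB\.\d+)\.\d+(?:\||$)")


def locus_id(target_id):
    """Return the PB.<locus index> prefix of a target ID (hypothetical helper)."""
    match = LOCUS_RE.match(target_id)
    if match is None:
        raise ValueError(f"unexpected target ID: {target_id!r}")
    return match.group(1)


def tx2gene_rows(target_ids):
    """Pair each full transcript ID with its locus ID (hypothetical helper)."""
    return [(tid, locus_id(tid)) for tid in target_ids]


# Target IDs copied from my abundance.tsv
ids = [
    "PB.2.1|002537|path0:1-1624(+)|transcript/18304|m.1",
    "PB.3.1|004815|path2:1-3039(+)|transcript/3426|m.4",
    "PB.3.2|004815|path2:1047-3035(+)|transcript/13401|m.5",
]

# Print a tab-separated transcript/locus table
for tid, locus in tx2gene_rows(ids):
    print(f"{tid}\t{locus}")
```

The first column keeps the full target ID so it matches the IDs in abundance.tsv exactly; the second column is the locus that the isoforms would be summed to.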

One of my Kallisto abundance.tsv files looks like this:

target_id                                               length  eff_length  est_counts  tpm
PB.2.1|002537|path0:1-1624(+)|transcript/18304|m.1      405     260.579     198         10.9076
PB.3.1|004815|path2:1-3039(+)|transcript/3426|m.4       2187    2042.14     1920.76     13.5017
PB.3.2|004815|path2:1047-3035(+)|transcript/13401|m.5   933     788.137     60.0475     1.09369

etc.
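
For illustration, this is roughly the collapsing I am after, as a minimal Python sketch rather than an edgeR or tximport workflow. It assumes the file is tab-separated with the header shown above, and `sum_counts_by_locus` is a hypothetical helper name. It is a naive sum of est_counts per locus and applies no correction for isoform length.

```python
import csv
import io
from collections import defaultdict


def sum_counts_by_locus(handle):
    """Sum est_counts over all isoforms sharing a PB.<locus index> prefix."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        # "PB.3.1|004815|..." -> "PB.3.1" -> "PB.3"
        transcript = row["target_id"].split("|", 1)[0]
        locus = transcript.rsplit(".", 1)[0]
        totals[locus] += float(row["est_counts"])
    return dict(totals)


# The three rows from my abundance.tsv, rebuilt as tab-separated text
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "PB.2.1|002537|path0:1-1624(+)|transcript/18304|m.1\t405\t260.579\t198\t10.9076\n"
    "PB.3.1|004815|path2:1-3039(+)|transcript/3426|m.4\t2187\t2042.14\t1920.76\t13.5017\n"
    "PB.3.2|004815|path2:1047-3035(+)|transcript/13401|m.5\t933\t788.137\t60.0475\t1.09369\n"
)

gene_counts = sum_counts_by_locus(io.StringIO(example))
for locus, count in gene_counts.items():
    print(locus, round(count, 4))
```

With a real file, the same function could be called on `open("abundance.tsv", newline="")` instead of the in-memory example.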

RNA-Seq kallisto expression • 281 views