Question: Novel transcripts from monoculture RNAseq applied to tissue-level RNAseq
13 months ago, matt.a.bennett25890 wrote:

Hi all,

Bit of an essay, apologies...

I've applied a novel transcript discovery pipeline to RNAseq data derived from a cell type grown as a monoculture with/without treatment in vitro, particularly focusing on lncRNAs:

Basic pipeline (open to comments/criticism!): STAR mapping of reads to GENCODEv26 indexed hg38 -> remove non-expressed transcripts from GENCODEv26 -> StringTie to merge abundance-filtered GENCODEv26 transcripts with new transcripts -> remove known ORFs -> CPC/HMMER/RNAcode -> annotated + new lncRNAs
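A minimal sketch of the "remove non-expressed transcripts" step, assuming per-sample TPM values are already available for each transcript. The transcript IDs, the TPM numbers, and the `min_tpm` / `min_samples` thresholds are hypothetical placeholders, not values from the pipeline above.

```python
# Hypothetical per-sample TPM values keyed by transcript ID (illustrative only).
tpm = {
    "ENST_A": [12.0, 9.5, 11.2],
    "ENST_B": [0.0, 0.1, 0.0],
    "ENST_C": [0.4, 2.3, 1.8],
}


def expressed_transcripts(tpm_by_transcript, min_tpm=1.0, min_samples=2):
    """Return IDs with TPM >= min_tpm in at least min_samples samples."""
    kept = []
    for tx_id, values in tpm_by_transcript.items():
        n_pass = sum(1 for v in values if v >= min_tpm)
        if n_pass >= min_samples:
            kept.append(tx_id)
    return sorted(kept)


# Transcripts passing the (assumed) abundance filter.
kept = expressed_transcripts(tpm)
print(kept)
```

The thresholds would need tuning to the sequencing depth and number of replicates; lncRNAs are often lowly expressed, so a strict cutoff may discard real transcripts.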

This has yielded some nice data, with ~35% newly assembled lncRNAs among my differentially expressed genes. I also now have what is basically a customised annotation specific to this cell type.

I would now like to see whether this is relevant in vivo. I have found tissue-level data which will contain a variable amount of the cell type I started with, as well as other cell types. My approach so far would be to quantify the reads in this dataset against my new customised annotation with RSEM. It's a bit messy, but I think it is enough to show my lncRNAs are active in a real-world situation. I'm also having doubts, though, so any comments on the questions below would help!

1) Are there any approaches to estimate cellular makeup in tissue-level data based on cell-specific markers?

2) Is this just too naive an approach to be useful?

3) I could run the pipeline again on the in vivo dataset, but it isn't stranded... would this mess up transcript discovery too much?

Would appreciate any input, thanks for reading :)

13 months ago, benformatics (ETH Zurich) wrote:

2) Something that wasn't clear, but that you should absolutely do (if you haven't already), is overlap your "novel" elements with elements in the current GENCODE annotation... there is a release v33 available. You could also check the RefSeq and Ensembl annotations. This will answer the question "Are these transcripts actually novel?" Also, why are you removing only known ORFs? Shouldn't you be removing the whole transcript/cDNA, including the 3' and 5' UTRs?
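As a rough illustration of the overlap check in point 2, here is a minimal sketch that computes what fraction of a candidate transcript's exonic bases are covered by annotated exons on the same strand. The coordinates are made up, the function names are hypothetical, and the annotated exons are assumed to have been merged beforehand so that no base is counted twice. In practice the exons would be parsed from the GENCODE, RefSeq, or Ensembl GTF files.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def exonic_overlap_fraction(novel_exons, annotated_exons):
    """Fraction of novel exonic bases covered by annotated exons.

    Exons are (chrom, start, end, strand) tuples. Only exons on the same
    chromosome and strand are compared. Annotated exons are assumed to be
    non-overlapping (merged), otherwise bases would be double counted.
    """
    total = sum(end - start for _, start, end, _ in novel_exons)
    covered = 0
    for chrom, start, end, strand in novel_exons:
        for a_chrom, a_start, a_end, a_strand in annotated_exons:
            if chrom == a_chrom and strand == a_strand:
                covered += overlap_bp(start, end, a_start, a_end)
    return covered / total if total else 0.0


# Made-up example: one candidate lncRNA with two exons.
novel = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")]
annotated = [("chr1", 150, 250, "+"), ("chr1", 300, 350, "-")]

fraction = exonic_overlap_fraction(novel, annotated)
print(fraction)
```

A candidate with a high fraction is probably already annotated; where to draw the line for calling something "novel" is a judgment call.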

1) Estimating cellular makeup from bulk RNA-seq is difficult (though there are methods available that use knowledge gained from scRNA-seq), and this difficulty is one of the major drivers behind the rise of single-cell sequencing.
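To give a feel for the general idea behind marker-based deconvolution, here is a toy sketch that models bulk expression as a non-negative mixture of per-cell-type signature profiles and solves for the mixing fractions by projected gradient descent. This is not any particular published method. The signature matrix, the "true" fractions, and the function name are all invented for illustration, and real data would add noise, normalisation issues, and cell types missing from the reference.

```python
def estimate_fractions(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions f >= 0 minimising ||S f - b||^2.

    signature: list of rows (genes), each a list of expression per cell type.
    bulk: list of bulk expression values, one per gene.
    Uses projected gradient descent; the result is rescaled to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # 1 / ||S||_F^2 is a conservative step size that guarantees convergence.
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * f[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    total = sum(f)
    return [x / total for x in f] if total else f


# Invented signature: 4 genes (rows) x 3 cell types (columns).
signature = [
    [10.0, 1.0, 0.5],
    [0.5, 8.0, 1.0],
    [1.0, 0.5, 9.0],
    [4.0, 4.0, 4.0],
]
true_fractions = [0.6, 0.3, 0.1]
# Simulated noise-free bulk sample built from the invented fractions.
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]

estimated = estimate_fractions(signature, bulk)
print([round(x, 3) for x in estimated])
```

With noise-free simulated input the fractions are recovered almost exactly; on real tissue data the estimates are only as good as the signature profiles.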

3) Yes and no. If your novel transcripts are outside of known genes then it would work. However, at all points you would need to treat your datasets as if all reads and discovered transcripts could have come from either strand (i.e. + or -; essentially unknown, or *).
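A minimal sketch of the strand-agnostic handling described in point 3: transcripts assembled from unstranded data carry no strand, so a candidate is only kept as clearly "outside known genes" if it does not overlap a known gene on either strand. The coordinates, names, and the `overlaps_known_gene` helper are hypothetical.

```python
def overlaps_known_gene(transcript, known_genes):
    """True if the transcript overlaps any known gene, ignoring strand.

    transcript: (chrom, start, end) with half-open coordinates and no strand.
    known_genes: iterable of (chrom, start, end, strand) tuples.
    """
    chrom, start, end = transcript
    return any(
        chrom == g_chrom and start < g_end and g_start < end
        for g_chrom, g_start, g_end, _strand in known_genes
    )


# Made-up annotation and unstranded candidate transcripts.
known_genes = [("chr1", 1000, 5000, "+"), ("chr1", 8000, 9000, "-")]
candidates = {
    "novel_1": ("chr1", 6000, 7000),  # between the two genes
    "novel_2": ("chr1", 4500, 5500),  # overlaps the + strand gene
    "novel_3": ("chr2", 1000, 2000),  # different chromosome
}

# Candidates that are unambiguous despite the missing strand information.
intergenic = sorted(
    name
    for name, tx in candidates.items()
    if not overlaps_known_gene(tx, known_genes)
)
print(intergenic)
```

Candidates that do overlap a known gene are not necessarily wrong, but without strand information an antisense lncRNA cannot be told apart from the gene itself.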


Thanks, some good food for thought in here.

I have overlapped my new assemblies with FANTOM CAT (seeing some degree of exonic overlap for the vast majority), but it would be good to do the latest GENCODE too.

Reply written 13 months ago by matt.a.bennett25890