Searching for mRNAs ending with a specific 3' pattern in non-poly(A) RNA-seq data
11 days ago

Hi all,

I'm looking for human non-poly(A) mRNAs that end with a specific pattern (say, CCGCAT).

Is it possible to find these in RNA-seq data (e.g. https://www.ncbi.nlm.nih.gov//sra?term=SRR059132)? Or elsewhere?

My idea would be to map the RNA-seq data, assemble transcripts with StringTie, and then convert the GTF to FASTA.
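The last step of that plan (scanning the transcript FASTA for the 3' motif) is only a few lines. Below is a minimal sketch, assuming the assembled GTF has already been turned into spliced transcript sequences, for example with `gffread -w transcripts.fa -g genome.fa stringtie.gtf`. The same function works on a downloaded transcriptome FASTA (plain or `.gz`). `open_fasta`, `read_fasta` and `ends_with_motif` are hypothetical helper names, not part of any existing tool.

```python
import gzip
import io
from typing import Iterable, Iterator, List, TextIO, Tuple


def open_fasta(path: str) -> TextIO:
    """Open a FASTA file as text, transparently handling .gz compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ends_with_motif(records: Iterable[Tuple[str, str]], motif: str = "CCGCAT") -> List[str]:
    """Return headers of sequences whose 3' end is exactly `motif` (U is read as T)."""
    motif = motif.upper()
    return [h for h, s in records if s.upper().replace("U", "T").endswith(motif)]


# Tiny in-memory demo; for a real file use open_fasta("transcripts.fa").
demo = io.StringIO(">tx1\nACGTCCG\nCAT\n>tx2\nACGTAAAA\n")
print(ends_with_motif(read_fasta(demo)))  # ['tx1']
```

Keep in mind that an assembled transcript's 3' end is only as precise as the read coverage at that end, so hits still need checking.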

Is there a better / faster way ?

UPDATE: is it possible to tell whether a given transcript is poly(A)+ or poly(A)- from RNA-seq data?

Why not look for that pattern in the Ensembl transcriptome? Or do you need to find it from sequence data?

That's a good suggestion; we already looked at ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh37.73.ncrna.fa.gz in 2013. There were many mRNAs with the pattern, but in the end my colleague checked them and they were all errors :-( (I think he checked manually/in the wet lab.)

The GENCODE set is the most up to date. You also have the option of looking in the MANE set, which has one representative transcript for each gene.

Thanks, but it looks like those are poly(A)+ data, aren't they?

Oh, there is gencode.v38.lncRNA_transcripts.fa.gz in the directory.

In addition, RNAcentral has a whole bunch of other non-coding RNAs. Look in the by-database directories, or parse the human ones out of the big file.
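Parsing the human records out of a big RNAcentral FASTA can be done as a streaming filter, so the whole file never has to sit in memory. This is a minimal sketch, assuming the species-specific identifiers have the form `URS…_<taxid>` (9606 is the NCBI taxonomy ID for Homo sapiens); check the header format of the file you actually download. `filter_human` is a hypothetical helper name.

```python
from typing import Iterable, Iterator

HUMAN_TAXID = "9606"  # NCBI taxonomy ID for Homo sapiens


def filter_human(lines: Iterable[str], taxid: str = HUMAN_TAXID) -> Iterator[str]:
    """Yield only the FASTA lines (header + sequence) of records whose ID ends with _<taxid>.

    Assumes IDs like 'URS0000000001_9606' as the first word of each header.
    """
    suffix = "_" + taxid
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record, based on the ID (first whitespace-separated word).
            keep = line[1:].split()[0].endswith(suffix)
        if keep:
            yield line


demo = [
    ">URS0000000001_9606 Homo sapiens example\n",
    "ACGU\n",
    ">URS0000000002_10090 Mus musculus example\n",
    "GGGG\n",
]
print("".join(filter_human(demo)), end="")
```

For a real file you could pass an open handle, e.g. `gzip.open(path, "rt")`, and write the kept lines with `sys.stdout.writelines(...)`.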

You can't assume that lncRNAs are non-polyA. Plenty of non-coding RNAs have polyA tails.

For poly(A)+ transcripts you'd find an A-rich hexamer (the polyadenylation signal) ~10 bp upstream of the poly(A) tail (see here).

But I don't know whether poly(A)- mRNAs lack this hexamer.
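Checking a transcript for that hexamer is straightforward once you have its sequence up to the cleavage site. Below is a minimal sketch, assuming the annotated transcript ends at the cleavage site (i.e. the poly(A) tail itself is not in the sequence) and that AATAAA plus its common variant ATTAAA are the hexamers of interest. The 40 nt search window is an arbitrary choice for illustration, deliberately a bit wider than the ~10 bp mentioned above. `has_pas` is a hypothetical helper name.

```python
from typing import Sequence

# Canonical polyadenylation signal and its most common variant.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def has_pas(seq: str, window: int = 40, hexamers: Sequence[str] = PAS_HEXAMERS) -> bool:
    """True if any PAS hexamer lies entirely within the last `window` nt of `seq`.

    Assumes `seq` ends at the cleavage site; U is treated as T.
    """
    tail = seq.upper().replace("U", "T")[-window:]
    return any(h in tail for h in hexamers)


print(has_pas("GGGGAATAAAGCGCGCGCGCGC"))  # True
print(has_pas("AATAAA" + "G" * 50))       # False: the hexamer is outside the window
```

A missing hexamer is weak evidence on its own, since non-canonical signals exist, so this is best used as one filter among several.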

9 days ago

I think the short answer is no: it's not possible to distinguish poly(A)+ and poly(A)- transcripts from normal total RNA-seq. Nor can you rely on things like lncRNAs being non-poly(A).

You could combine several different lines of evidence to converge on a set of transcripts you think are probably non-poly(A).

You could start with the matched poly(A)+ and poly(A)- datasets from ENCODE: do transcript-specific quantification over something like GENCODE or RNAcentral, and look for transcripts with differential abundance between the + and - datasets.
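As a first pass, the comparison can be as simple as a log2 fold change of poly(A)- over poly(A)+ abundance per transcript. This is a minimal sketch, assuming you already have transcript-level TPM tables (from any quantifier, e.g. salmon or kallisto) loaded as dicts; the pseudocount, fold-change and minimum-TPM thresholds are illustrative assumptions. With replicates, a proper count-based differential test (e.g. DESeq2) would be more appropriate. `polya_minus_enriched` is a hypothetical helper name.

```python
import math
from typing import Dict, List


def polya_minus_enriched(
    plus: Dict[str, float],
    minus: Dict[str, float],
    pseudocount: float = 1.0,
    min_log2fc: float = 1.0,
    min_tpm: float = 1.0,
) -> List[str]:
    """Transcripts at least 2**min_log2fc times more abundant in poly(A)- than poly(A)+.

    `plus`/`minus` map transcript ID -> TPM; missing IDs count as 0.
    Transcripts below `min_tpm` in the poly(A)- sample are skipped as too noisy.
    """
    hits = []
    for tx in sorted(set(plus) | set(minus)):
        p = plus.get(tx, 0.0)
        m = minus.get(tx, 0.0)
        if m < min_tpm:
            continue
        log2fc = math.log2((m + pseudocount) / (p + pseudocount))
        if log2fc >= min_log2fc:
            hits.append(tx)
    return hits


plus = {"tx1": 50.0, "tx2": 1.0, "tx3": 0.0}
minus = {"tx1": 40.0, "tx2": 20.0, "tx3": 0.0}
print(polya_minus_enriched(plus, minus))  # ['tx2']
```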

You could then cross-reference with PolyA-seq data, which specifically identifies poly(A) cleavage sites genome-wide, and look for transcripts that don't have any signal yet are highly expressed in the cell type in question.
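The cross-referencing step amounts to asking, for each well-expressed transcript, whether any cleavage site falls near its 3' end. This is a minimal sketch, assuming the PolyA-seq sites and the transcript 3' ends are on the same genome build and given as (chromosome, strand, position) tuples, with the 3' end taken strand-aware (the lowest coordinate for minus-strand transcripts). The 50 nt window and the 5 TPM cut-off are illustrative assumptions; `expressed_without_site` and its helpers are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, str, int]  # (chrom, strand, position)


def index_sites(sites: Iterable[Site]) -> Dict[Tuple[str, str], List[int]]:
    """Group cleavage-site positions by (chrom, strand) and sort them for binary search."""
    idx: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for chrom, strand, pos in sites:
        idx[(chrom, strand)].append(pos)
    for positions in idx.values():
        positions.sort()
    return idx


def has_nearby_site(idx, chrom: str, strand: str, pos: int, window: int) -> bool:
    """True if any site on the same chrom/strand lies within +/- window of pos."""
    positions = idx.get((chrom, strand), [])
    i = bisect.bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


def expressed_without_site(
    tx_ends: Dict[str, Site],
    tpm: Dict[str, float],
    sites: Iterable[Site],
    min_tpm: float = 5.0,
    window: int = 50,
) -> List[str]:
    """Well-expressed transcripts with no PolyA-seq cleavage site near their 3' end."""
    idx = index_sites(sites)
    return sorted(
        tx
        for tx, (chrom, strand, end) in tx_ends.items()
        if tpm.get(tx, 0.0) >= min_tpm
        and not has_nearby_site(idx, chrom, strand, end, window)
    )


tx_ends = {"txA": ("chr1", "+", 1000), "txB": ("chr1", "-", 5000), "txC": ("chr2", "+", 300)}
tpm = {"txA": 20.0, "txB": 15.0, "txC": 1.0}
sites = [("chr1", "+", 1020), ("chr1", "+", 9000)]
print(expressed_without_site(tx_ends, tpm, sites))  # ['txB']
```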

Finally, you could keep only the transcripts that don't have a poly(A) signal.

6 days ago

OK, in the end I wrote a tool to find poly(A) tails in RNA-seq data. It gives me a hint about the poly(A)-minus/plus state of a transcript.
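The tool itself isn't shown here, so the following is only an illustration of the general idea, not the poster's implementation: flag reads that carry a terminal A-run (or, for reads from the opposite strand, a leading T-run) long enough to be a poly(A) tail. The 15 nt minimum and the one-mismatch tolerance are assumptions. In practice you would also require that the A-run is not encoded in the genome (e.g. that it is soft-clipped in the alignment), since genomic A-rich stretches give false positives. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trailing_a_len(seq: str, max_mismatch: int = 1) -> int:
    """Length of the 3'-terminal A-rich stretch, tolerating up to max_mismatch non-A bases.

    The stretch always ends (on its 5' side) at an A.
    """
    mismatches = 0
    length = 0
    best = 0
    for base in reversed(seq.upper()):
        if base == "A":
            length += 1
            best = length
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
            length += 1
    return best


def looks_polyadenylated(read: str, min_len: int = 15, max_mismatch: int = 1) -> bool:
    """True if the read ends in an A-run or starts with a T-run of at least min_len."""
    return max(
        trailing_a_len(read, max_mismatch),
        trailing_a_len(revcomp(read), max_mismatch),
    ) >= min_len


print(trailing_a_len("GGGAAAATAAAA"))                 # 9
print(looks_polyadenylated("ACGTACGT" + "A" * 20))   # True
print(looks_polyadenylated("T" * 20 + "ACGTACGT"))   # True
```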