Searching for mRNA ending with a specific 3' pattern in NON-poly-A RNASeq data.
2.6 years ago

Hi all,

Asking for a colleague:

I'm looking for human non-poly-A mRNAs that end with a specific pattern (say, CCGCAT).

Is it possible to find these in RNA-seq data (e.g. https://www.ncbi.nlm.nih.gov//sra?term=SRR059132), or elsewhere?

My idea would be to map the RNA-seq data, assemble transcripts with StringTie, and then convert the GTF to FASTA.

Is there a better or faster way?

UPDATE: Is it possible to tell from RNA-seq data whether a given transcript is poly-A+ or poly-A-?

rnaseq poly-a • 2.1k views

Why not look for that pattern in Ensembl transcriptome? Or do you need to find it from sequence data?
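
A minimal sketch of that kind of scan, assuming a transcript FASTA has already been downloaded. The function names and the file name in the usage note are hypothetical, and the pattern CCGCAT is just the example from the question.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transcripts_ending_with(lines: Iterable[str], pattern: str) -> List[str]:
    """Return headers of records whose sequence ends with `pattern` (case-insensitive, U treated as T)."""
    pattern = pattern.upper().replace("U", "T")
    return [
        header
        for header, seq in read_fasta(lines)
        if seq.upper().replace("U", "T").endswith(pattern)
    ]
```

For a gzipped file this could be called as `transcripts_ending_with(gzip.open("transcripts.fa.gz", "rt"), "CCGCAT")` after `import gzip`. Note that annotated 3' ends are not always precise, so hits would still need checking.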


That's a good suggestion; we already looked at ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh37.73.ncrna.fa.gz in 2013. There were many mRNAs with the pattern, but in the end my colleague checked them and they were all errors :-( (I think he checked manually / in the wet lab).


The GENCODE set is the most up to date. You could also look in the MANE set, which has one representative transcript for each gene.


Thanks, but it looks like those are poly-A+ data, aren't they?


oh, there is gencode.v38.lncRNA_transcripts.fa.gz in the directory.


In addition, RNAcentral has a whole bunch of other non-coding RNAs. Look in the by-database directories, or parse the human ones out of the big file.


You can't assume that lncRNAs are non-polyA. Plenty of non-coding RNAs have polyA tails.


For poly(A)+ transcripts you'd expect to find an A-rich hexamer (the polyadenylation signal, canonically AAUAAA) roughly 10-30 nt upstream of the poly(A) tail; see here.

But I don't know whether poly(A)- mRNAs lack this hexamer.
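
As a rough illustration of checking for that hexamer, here is a sketch that looks for the canonical signal AATAAA and the common variant ATTAAA near the 3' end of a transcript sequence. The function name, the 40 nt window, and the choice of only two hexamers are assumptions made for the example; other variant signals exist and are not covered.

```python
from typing import Optional, Sequence, Tuple


def find_polya_signal(
    seq: str,
    window: int = 40,
    hexamers: Sequence[str] = ("AATAAA", "ATTAAA"),
) -> Optional[Tuple[str, int]]:
    """Return (hexamer, distance from hexamer end to the 3' end) for the
    signal closest to the 3' end within the last `window` nt, or None."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper().replace("U", "T")
    tail = seq[-window:]  # only the 3'-most `window` nucleotides
    best = None
    for hexamer in hexamers:
        pos = tail.rfind(hexamer)  # right-most occurrence in the window
        if pos == -1:
            continue
        distance = len(tail) - (pos + len(hexamer))
        if best is None or distance < best[1]:
            best = (hexamer, distance)
    return best
```

The input is assumed to be the transcript as annotated, i.e. ending at the cleavage site without the tail itself. A result of `None` only means that neither of the two hexamers was found in the window; it is not proof that the transcript is poly(A)-.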

2.6 years ago

I think the short answer is no: it's not possible to distinguish poly-A(+) and poly-A(-) transcripts from normal total RNA-seq. Nor can you rely on things like lncRNAs being non-polyA.

You could try a range of different lines of evidence to converge on a set of things you think are probably non-polyA.

You could start with the matched poly-A(+) and poly-A(-) data sets from ENCODE. Do transcript-specific quantification over something like GENCODE or RNAcentral, and look for things with differential abundance between the + and - data sets.
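
A very rough sketch of that comparison, assuming per-transcript TPM values from the two fractions are already available as dictionaries. The function name, the pseudocount, and both thresholds are arbitrary assumptions for illustration; with replicates, a proper differential expression method would be preferable to a simple ratio.

```python
import math
from typing import Dict


def polya_minus_enriched(
    tpm_plus: Dict[str, float],
    tpm_minus: Dict[str, float],
    pseudocount: float = 1.0,
    min_tpm: float = 5.0,
    min_log2_ratio: float = 2.0,
) -> Dict[str, float]:
    """Return {transcript: log2 ratio (minus / plus)} for transcripts that
    are expressed in the poly-A(-) fraction and enriched there."""
    enriched = {}
    for transcript, minus in tpm_minus.items():
        if minus < min_tpm:
            continue  # too lowly expressed to judge
        plus = tpm_plus.get(transcript, 0.0)
        ratio = math.log2((minus + pseudocount) / (plus + pseudocount))
        if ratio >= min_log2_ratio:
            enriched[transcript] = ratio
    return enriched
```

Transcripts returned here are only candidates: poly-A(-) libraries can still carry some polyadenylated RNA, so this is one line of evidence among several.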

You could then cross-reference with polyA-seq data, which specifically identifies polyA cleavage sites genome-wide, and look for transcripts that don't have any signal yet are highly expressed in the cell type in question.

Finally, you could filter to only those transcripts that lack a poly-A signal.

2.6 years ago

OK, in the end I wrote a tool to find poly-A tails in RNA-seq data. It gives me an indication of the poly-A-minus/plus state of a transcript.
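
For readers wondering what such evidence might look like, here is a sketch of one simple idea, which is not necessarily how the tool mentioned above works: count reads that end in a long run of A (or start with a long run of T, for reads from the opposite strand). The function names and the `min_run` threshold are assumptions for illustration.

```python
from typing import Iterable


def polya_run_length(read: str) -> int:
    """Length of the longer of: the trailing A run, or the leading T run."""
    read = read.upper()
    trailing_a = len(read) - len(read.rstrip("A"))
    leading_t = len(read) - len(read.lstrip("T"))
    return max(trailing_a, leading_t)


def fraction_polya_reads(reads: Iterable[str], min_run: int = 10) -> float:
    """Fraction of reads carrying an A/T run of at least `min_run` at the relevant end."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if polya_run_length(read) >= min_run)
    return hits / len(reads)
```

In practice the reads would be those mapped near the 3' end of a given transcript, and genomically encoded A-rich stretches would need to be excluded before treating such a run as a real tail.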
