Are polyA tails filtered at some step of TopHat/Cufflinks processing, or is this information retained?
9.7 years ago
trakhtenberg ▴ 160

I need to check which of the predicted de novo transcripts have a polyA tail at the 3' end. Are polyA tails filtered at some step of TopHat/Cufflinks processing, or is this information retained? If it is retained, where do I find it? Thank you.

RNA-Seq • 3.7k views
9.7 years ago

It's unclear what TopHat/Cufflinks has to do with the question as you are talking about de novo assembled transcripts. I would just start by using the grep command to get a feeling for whether there are any poly-A stretches. It is easy enough to write a script (Perl/Python/whatever) to look for them as well.
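The grep-or-script check suggested above could look something like the following. This is only a minimal sketch: the function names and the `MIN_RUN` threshold of 10 are illustrative assumptions, not part of any tool mentioned in this thread. It also assumes the sequences being scanned can carry a tail at all (for example de novo assembled contigs or raw reads); sequences extracted from the reference genome would only show genomically encoded A-runs.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: a "tail" is a run of at least MIN_RUN A's at the very 3' end,
# or T's at the very 5' end for a sequence reported on the opposite strand.
MIN_RUN = 10


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def polya_tail_length(seq: str) -> int:
    """Length of the run of A's at the 3' end (0 if none)."""
    m = re.search(r"A+$", seq.upper())
    return len(m.group(0)) if m else 0


def polyt_head_length(seq: str) -> int:
    """Length of the run of T's at the 5' end (0 if none)."""
    m = re.match(r"T+", seq.upper())
    return len(m.group(0)) if m else 0


def find_tailed(lines: Iterable[str], min_run: int = MIN_RUN) -> Dict[str, Tuple[str, int]]:
    """Map sequence name -> (tail type, run length) for sequences with a tail."""
    hits = {}
    for name, seq in read_fasta(lines):
        a_len = polya_tail_length(seq)
        t_len = polyt_head_length(seq)
        if a_len >= min_run:
            hits[name] = ("polyA_3prime", a_len)
        elif t_len >= min_run:
            hits[name] = ("polyT_5prime", t_len)
    return hits
```

Passing an open FASTA file handle to `find_tailed` returns a dictionary of the sequences that end in a long A-run (or start with a long T-run). Anchoring the pattern to the sequence ends is a deliberate choice here: internal A-rich stretches are common and say little about polyadenylation.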

Often there are very few reads with poly-A in HiSeq RNA-seq data, so you might not find much. For some reason there seem to be more poly-A tails left in MiSeq-produced RNA-seq data.


Thank you for the feedback. Your point and the point made by Jeremy shortly after are complementary, so I have added a comment addressing both under the latter post. Thank you.

9.7 years ago

Most reads from bridge amplification wouldn't contain polyA stretches, nor would they ever make it past alignment. I think you are confusing de novo transcriptome assembly with ab initio isoform discovery.


Yes, I did see a lot of polyA stretches using grep, and TransDecoder also predicted ORFs within some of the predicted de novo transcripts.

Yes, software like Trinity that does ab initio transcript assembly is known to retain polyA tails. But Cufflinks also predicts new transcripts with de novo exons in intergenic regions, those with the 'u' class code. So I was trying to understand what happens when a polyA tail is encountered. If the read is simply discarded as unmappable, would all the reads covering the 3' ends of novel and known transcripts with a polyA tail end up in the discarded bin? If my assumptions are correct, what would be the best way to use these discarded reads to determine whether or not the novel transcripts predicted by Cufflinks have polyA tails?

I considered an ab initio tool like Trinity, but that would raise the issue of how to combine both approaches in a single paper, as I am sure there would be quite a lot of differences between their outputs, including the properties and number of predicted novel transcripts. I would appreciate advice. Thank you.


TopHat uses Bowtie to align as many reads as it can to the reference genome in an unspliced manner. It then rummages through any reads that didn't align, checks whether they span two contigs (exons) formed by the Bowtie alignment, and adds these to the SAM file in a spliced format. For reads with a bunch of foreign A's at the end that add no spanning information, it's unclear whether TopHat would really rescue them from the bin.

In this paper the authors manually identified and rescued polyA reads: http://www.biomedcentral.com/1471-2164/11/711
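One way to approach a similar rescue yourself is sketched below. To be clear, this is not the procedure from that paper, just an illustration under stated assumptions: the unaligned reads have already been exported to a well-formed four-line FASTQ, the thresholds `MIN_TAIL` and `MIN_REMAINING` are arbitrary, and all function names are hypothetical. The idea is to find reads ending in a run of A's, clip the run, and write out the remainder so it can be re-aligned to see where the tail was attached.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumed thresholds, chosen only for illustration.
MIN_TAIL = 8         # minimum number of trailing A's to call a tail
MIN_REMAINING = 20   # minimum bases left after clipping to be worth re-aligning

_TAIL = re.compile(r"A+$")


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def trim_polya(seq: str, qual: str, min_tail: int = MIN_TAIL) -> Optional[Tuple[str, str, int]]:
    """Return (clipped sequence, clipped quality, tail length), or None if no tail."""
    m = _TAIL.search(seq.upper())
    if m is None or len(m.group(0)) < min_tail:
        return None
    keep = m.start()
    return seq[:keep], qual[:keep], len(m.group(0))


def rescue_candidates(lines: Iterable[str],
                      min_tail: int = MIN_TAIL,
                      min_remaining: int = MIN_REMAINING) -> Iterator[str]:
    """Yield clipped FASTQ records for reads that end in a polyA run."""
    for header, seq, qual in read_fastq(lines):
        trimmed = trim_polya(seq, qual, min_tail)
        if trimmed is None:
            continue
        clipped_seq, clipped_qual, tail_len = trimmed
        if len(clipped_seq) < min_remaining:
            continue
        # Record the clipped tail length in the header comment for later use.
        yield f"{header} polyA={tail_len}\n{clipped_seq}\n+\n{clipped_qual}\n"
```

The clipped records can be written to a new FASTQ and aligned again; reads that now map near the 3' end of a predicted transcript are candidate evidence for a tail. Reads sequenced from the opposite strand would start with a run of T's instead, which this sketch does not handle.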

Another alignment tool I would suggest you look at is STAR. STAR replaces Bowtie/TopHat with a fast, sensitive spliced aligner. It explicitly mentions polyA tails:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/


I opened a separate post on how to use STAR/Cufflinks for identifying polyA tails for predicted novel transcripts: Identifying polyA tail sequences for predicted novel transcripts using STAR/Cufflinks

thank you
