Best way to trim PolyA in RNA Seq reads?
1
2
Entering edit mode
3.7 years ago
DVA ▴ 600

Hello,

I am trying to learn what my options are if I would like to trim polyA from my reads. (FastqC repo see previous discussion: https://www.biostars.org/p/302411/#302507) We are quite certain we need to trim adapter, which we would use bbduk, but we also see polyA showing in many reads, and would like to trim that too.

Since polyA shows in different reads with different length, it does not make sense to trim by length. I hope to trim all the way till the nucleotide is not A. Any suggestions?

(Or maybe I should just go ahead with alignment (Tophat) and let the aligner ignore polyA in the reads?)

Thank you very much!

RNA-Seq • 3.3k views
1
Entering edit mode

(Or maybe I should just go ahead with alignment (Tophat) and let the aligner ignore polyA in the reads?)

You should know that the old 'Tuxedo' pipeline of Tophat(2) and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated/ in low maintenance and should be replaced by HISAT2, StringTie and ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using salmon.

0
Entering edit mode

Thank you so much! I was wondering about that recently lol and this confirms it. Appreciate it!

0
Entering edit mode

By the way, in your experience, do you think these aligners can handle 10-20bps polyA in the reads?

0
Entering edit mode

I usually trim my reads, but I don't know if it's necessary. Haven't evaluated that.

1
Entering edit mode

You can use bbduk.sh to remove poly-A tails as well. with literal=AAAAA and adjusting value of k= as needed to something small.

0
Entering edit mode

Thank you very much for following up with my questions. Can this be done in one command, together with trimming the adaptor? I assume not - because k for adaptor will be around 20, while this one should be something like 2-3 right?

1
Entering edit mode

You can use unix pipes for multiple bbduk.sh passes. Something like (example for SE reads, adjust for PE reads with in1= in2= etc):

bbduk.sh in=seq.fq out=stdout.fq ref=adapters.fa k=20 ktrim=r | bbduk.sh in=stdin.fq out=final.fq literal=AAAAA k=2 ktrim=r

0
Entering edit mode

Thank you. One more question - For adaptor trimming, is there a particular reason not to just do force-trimming? (Simply trim # number of bps from one end)

1
Entering edit mode

You could (forcetrimright= or forcetrimleft=) but the the problem is your adapter contamination may not always be in the same spot (unless you are doing some special library prep and expect it to be so).

0
Entering edit mode

I thought the adaptor is always showing up at the left of the reads... why they will not be at the same spot please? Sorry for all these questions, but I really want to understand it better. Thank you.

2
Entering edit mode

Adapter should always be towards the right end of the read. See this document for clarification on how the libraries are constructed. Since your insert size varies all fragments will not have the adapter in the same location (assuming you have short inserts and have run of real DNA to sequence).

2
Entering edit mode
3.7 years ago

You could use prinseq or fastp for trimming polyA tails

0
Entering edit mode

Thank you very much!