Question

Best way to trim PolyA in RNA Seq reads?

2

6.1 years ago

DVA ▴ 630

Hello,

I am trying to learn what my options are for trimming polyA from my reads. (FastQC report: see the previous discussion at https://www.biostars.org/p/302411/#302507) We are quite certain we need to trim adapters, for which we would use bbduk, but we also see polyA in many reads and would like to trim that too.

Since the polyA stretch has a different length in different reads, it does not make sense to trim a fixed length. I hope to trim all the way until the nucleotide is no longer A. Any suggestions?

(Or maybe I should just go ahead with alignment (Tophat) and let the aligner ignore polyA in the reads?)

Thank you very much!

RNA-Seq • 5.3k views

ADD COMMENT • link updated 6.1 years ago by WouterDeCoster 47k • written 6.1 years ago by DVA ▴ 630

1

(Or maybe I should just go ahead with alignment (Tophat) and let the aligner ignore polyA in the reads?)

You should know that the old 'Tuxedo' pipeline of Tophat(2) and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using salmon.

Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

0

Thank you so much! I was wondering about that recently lol and this confirms it. Appreciate it!

ADD REPLY • link 6.1 years ago by DVA ▴ 630

0

By the way, in your experience, do you think these aligners can handle 10-20 bp of polyA in the reads?

ADD REPLY • link 6.1 years ago by DVA ▴ 630

0

I usually trim my reads, but I don't know if it's necessary. Haven't evaluated that.

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

1

You can use bbduk.sh to remove poly-A tails as well, with literal=AAAAA and the value of k= adjusted to something small as needed.
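
To see why the choice of k matters, here is a toy Python model of right k-mer trimming (the ktrim=r idea): the first exact match of the k-mer and everything to its right is removed. The function name ktrim_right and the example reads are made up for illustration. This is a sketch of the concept, not BBDuk's implementation, and it ignores options such as mismatches and reverse-complement matching.

```python
def ktrim_right(seq: str, kmer: str) -> str:
    """Toy model of right k-mer trimming: cut at the leftmost exact match
    of `kmer` and drop everything from there to the 3' end."""
    pos = seq.find(kmer)
    return seq if pos == -1 else seq[:pos]


# Hypothetical read: an internal A-run, more sequence, then a polyA tail.
read = "ACGTAAAAAGGCTCAGTAAAAAAAAAAAA"

# k=5 (k-mer AAAAA): the internal run at position 4 is hit first.
print(ktrim_right(read, "AAAAA"))                 # ACGT

# k=2 (k-mer AA): any AA dinucleotide in the read triggers trimming.
print(ktrim_right("ACAAGTCCGTAAAAAAAA", "AA"))    # AC
```

Under this model, a very small k removes far more than the tail, so it is worth inspecting a sample of trimmed reads before settling on a value.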

ADD REPLY • link 6.1 years ago by GenoMax 141k

0

Thank you very much for following up on my questions. Can this be done in one command, together with trimming the adaptor? I assume not, because k for the adaptor will be around 20, while this one should be something like 2-3, right?

ADD REPLY • link 6.1 years ago by DVA ▴ 630

1

You can use unix pipes for multiple bbduk.sh passes. Something like (example for SE reads, adjust for PE reads with in1= in2= etc):

bbduk.sh in=seq.fq out=stdout.fq ref=adapters.fa k=20 ktrim=r | bbduk.sh in=stdin.fq out=final.fq literal=AAAAA k=2 ktrim=r

ADD REPLY • link 6.1 years ago by GenoMax 141k

0

Thank you. One more question: for adaptor trimming, is there a particular reason not to just do force-trimming (simply trimming a fixed number of bp from one end)?

ADD REPLY • link 6.1 years ago by DVA ▴ 630

1

You could (forcetrimright= or forcetrimleft=), but the problem is that your adapter contamination may not always be in the same spot (unless you are doing some special library prep and expect it to be so).

ADD REPLY • link 6.1 years ago by GenoMax 141k

0

I thought the adaptor always shows up at the left of the reads... why would it not be at the same spot? Sorry for all these questions, but I really want to understand it better. Thank you.

ADD REPLY • link 6.1 years ago by DVA ▴ 630

2

The adapter should always be towards the right end of the read. See this document for clarification on how the libraries are constructed. Since your insert size varies, not all fragments will have the adapter in the same location (assuming you have short inserts and have run out of real DNA to sequence).
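
A small Python simulation can make this concrete. It is only a sketch under stated assumptions: the adapter prefix AGATCGGAAGAGC is used as an example, the read length of 50 and the synthetic inserts are arbitrary, and the helper names (simulate_read, adapter_start, force_trim_right) are hypothetical. Each read is the insert followed by the adapter, cut off at the read length.

```python
ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix (assumption for this sketch)
READ_LEN = 50              # arbitrary read length for the simulation


def simulate_read(insert_len: int) -> str:
    """Sequence READ_LEN cycles of a fragment: insert first, then adapter.
    Cycles past the end of the adapter are padded with 'N' for simplicity."""
    insert = ("ACGTTGCA" * 20)[:insert_len]  # synthetic insert sequence
    template = insert + ADAPTER
    return template[:READ_LEN].ljust(READ_LEN, "N")


def adapter_start(read: str, seed: int = 8):
    """Position where the first `seed` adapter bases occur, or None."""
    pos = read.find(ADAPTER[:seed])
    return None if pos == -1 else pos


def force_trim_right(read: str, n: int) -> str:
    """Remove a fixed number of bases from the 3' end, ignoring content."""
    return read[:len(read) - n]


for insert_len in (20, 30, 40, 60):
    print(insert_len, adapter_start(simulate_read(insert_len)))
# 20 20
# 30 30
# 40 40
# 60 None
```

The adapter starts wherever the insert ends, and it is absent when the insert is longer than the read. A fixed trim of 20 bases in this simulation would leave adapter bases in the read with the 20 bp insert and discard 20 real bases from the read with the 60 bp insert.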

ADD REPLY • link 6.1 years ago by GenoMax 141k

score 2 · Answer 1 · 2018-03-12

2

6.1 years ago

WouterDeCoster 47k

You could use prinseq or fastp for trimming polyA tails.
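
For reference, the basic idea of tail trimming ("trim until the nucleotide is no longer A") can be sketched in a few lines of Python. This is not how prinseq or fastp are implemented; real tools may also tolerate sequencing errors within the tail. The function name trim_polya_tail and the min_len default of 5 are assumptions made for the example.

```python
def trim_polya_tail(seq: str, qual: str, min_len: int = 5):
    """Strip a run of trailing A's if it is at least `min_len` long.
    The quality string is cut to the same length so the record stays valid."""
    kept = len(seq.rstrip("A"))
    if len(seq) - kept < min_len:
        return seq, qual  # tail too short to be treated as polyA
    return seq[:kept], qual[:kept]


# Hypothetical FASTQ sequence/quality pair with an internal A-run and a tail.
seq = "ACGTAAAAAGGCTAAAAAAAAAA"
qual = "I" * len(seq)

print(trim_polya_tail(seq, qual))
# ('ACGTAAAAAGGCT', 'IIIIIIIIIIIII')
```

Only the 3' tail is removed, so the internal run of five A's is left alone. A read made up entirely of A's comes back empty and would need to be filtered out afterwards.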

ADD COMMENT • link 6.1 years ago by WouterDeCoster 47k

0

Thank you very much!

ADD REPLY • link 6.1 years ago by DVA ▴ 630