Question: Poly A trimming from RNA-seq data by bbduk (bbmap package)
Hi all friends,

For trimming poly-A tails from RNA-seq data using bbduk, I found two options: "trimpolya=10", which trims leading or trailing runs of at least 10 A or T, and "literal=AAAAA" together with a suitable value of k=. I tried "trimpolya=10" but got an error; it seems this flag is not recognized by the software. As for the second option, "literal=AAAAA", I am unsure whether it should be "literal=TTTT" instead. Could you please clarify? Could you also suggest a k value for this trimming?

Thank you

Depending on what strand was sequenced it may need to be literal=TTTT. What do you see in your sequences? BTW: trimpolya=N is a valid command option.
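One rough way to check what is in the reads is to count how many start or end with a long run of A or T. The snippet below is only a sketch in plain Python, not part of BBMap; the function name count_homopolymer_ends, the min_len threshold and the example reads are made up for illustration.

```python
import re

def count_homopolymer_ends(reads, min_len=10):
    """Count reads that start or end with a run of A or T of at least min_len bases."""
    counts = {"A_start": 0, "A_end": 0, "T_start": 0, "T_end": 0}
    for seq in reads:
        seq = seq.upper()
        for base in "AT":
            # e.g. "A{10,}" matches ten or more consecutive A's
            run = f"{base}{{{min_len},}}"
            if re.match(run, seq):          # run anchored at the read start
                counts[f"{base}_start"] += 1
            if re.search(run + "$", seq):   # run anchored at the read end
                counts[f"{base}_end"] += 1
    return counts

# Hypothetical example reads, not real data
reads = [
    "GATTACAGGCT" + "A" * 12,   # trailing poly-A
    "T" * 12 + "GGCATCGA",      # leading poly-T
    "ACGTACGTACGTACGT",         # no long homopolymer run
]
print(count_homopolymer_ends(reads))
```

If mostly A_end is non-zero, the tails show up as A's at the 3' end; if mostly T_start is non-zero, they show up as T's at the 5' end.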

Thanks. Sorry, how can I find out which strand was sequenced? The data were obtained with the Illumina TruSeq™ RNA Sample Preparation Kit. I have only looked at part of the sequencing reads and don't see AAAA or TTT runs. However, mRNA was purified from total RNA using poly-T oligo-linked magnetic beads, so AAA/TTT contamination is likely. Would you please tell me what the difference is between the two options, trimpolya=N and literal=TTTT? Also, after trying trimpolya=N, the error below appeared:

BBDuk version 37.17
Exception in thread "main" java.lang.RuntimeException: Unknown parameter trimpolya=N
    at jgi.BBDukF.<init>(
    at jgi.BBDukF.main(

Could you please help me out with this issue?


Any suggestions, please!

I must have missed your last post. You are using a fairly old version of BBMap. So I suggest that you upgrade to the latest first.

With trimpolya=N you need to replace N with the number you want. With literal=TTTT, the shortest stretch of T's that bbduk will identify is 4. Depending on what you are doing (kmask= or ktrim=), sequences will be masked or trimmed.
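To make the difference between the two approaches concrete, here is a small Python sketch. It is an illustration of the idea only, not BBDuk's actual implementation: trim_poly_at and ktrim_right are hypothetical helper names, and the k-mer part ignores BBDuk options such as reverse-complement matching.

```python
import re

def trim_poly_at(seq, min_len=10):
    """Remove a leading or trailing run of A or T that is at least min_len long
    (the idea behind a minimum-length poly-A/T trim)."""
    run = f"(A{{{min_len},}}|T{{{min_len},}})"
    # Only runs anchored at the very start or very end of the read are removed
    return re.sub(f"^{run}|{run}$", "", seq)

def ktrim_right(seq, kmer):
    """Trim the first occurrence of kmer and everything to its right
    (a simplified picture of k-mer trimming to the right)."""
    i = seq.find(kmer)
    return seq if i < 0 else seq[:i]

# Hypothetical read: an internal TTTT plus a 12-base poly-A tail
read = "GGCATTTTCGATCGGC" + "A" * 12

print(trim_poly_at(read))         # GGCATTTTCGATCGGC (only the tail is removed)
print(ktrim_right(read, "TTTT"))  # GGCA (cut at the internal TTTT)
```

The point is that a short literal such as TTTT can also match short internal runs, while a length threshold applied at the read ends only removes long terminal runs.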

