How to remove poly-A tails from QuantSeq 3' FWD data using Trimmomatic
4.6 years ago
jerrywu1987 ▴ 10

My data is 75 bp single-end. Libraries were prepared using the QuantSeq 3' FWD kit and sequenced on an Illumina NextSeq 500.

I am trying to remove poly-A tails using Trimmomatic v0.32. Does anyone know how I can do this? Thanks very much.

Trimmomatic Poly-A tails QuantSeq • 4.1k views
2
4.6 years ago
GenoMax 119k

You could easily do this using bbduk.sh from the BBMap suite with the literal=AAAAAA option.

For Trimmomatic you could add a poly-A sequence line to your adapters file.
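
As a rough sketch of the adapters-file suggestion, the snippet below writes a one-entry FASTA containing a poly-A stretch and prints a Trimmomatic single-end command that would use it through ILLUMINACLIP. The file names, the entry name, the 20-base length, and the 2:30:10 thresholds are illustrative assumptions, not values from this thread. The simple clip threshold in particular may need lowering before short tails are clipped, so check the result on a few reads.

```shell
# Build a 20-base poly-A string (the length is an assumption; adjust to your data).
polyA=$(printf 'A%.0s' $(seq 1 20))

# Write a one-entry adapter FASTA (hypothetical file and entry names).
printf '>polyA\n%s\n' "$polyA" > polyA_adapters.fa
cat polyA_adapters.fa

# Trimmomatic single-end call that would use the file.
# It is only printed here, not executed; jar path and read files are placeholders.
echo "java -jar trimmomatic-0.32.jar SE -phred33 reads.fastq.gz trimmed.fastq.gz ILLUMINACLIP:polyA_adapters.fa:2:30:10 MINLEN:20"
```

The ILLUMINACLIP fields after the file name are seed mismatches, palindrome clip threshold, and simple clip threshold; only the last one matters for single-end reads.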

0

Thanks for your answer. I've never used bbduk.sh before. Since different reads have different numbers of A's, is the "literal=AAAAAA" option specific to exactly 6 A's, or does it also cover longer runs of A's?

Same concern with Trimmomatic: does a poly-A sequence line define a specific number of A's, or does it cover runs of any length?

Thanks.

0

Once bbduk finds a stretch of A's that matches a k-mer, that match and everything to its right are removed, provided you use the ktrim=r option (trim to the right of the k-mer match).

0
bbduk.sh  -Xmx512m  in=Test/WT12_4_AGTACT_R1.fastq.gz out=Test/WT12_4_C.fastq.gz literal=AAAAAA ktrim=r k=23 mink=11 hdist=1

Initial:
Memory: max=514m, free=488m, used=26m

Added 0 kmers; time:    0.012 seconds.
Memory: max=514m, free=469m, used=45m

******  WARNING! A KMER OPERATION WAS CHOSEN BUT NO KMERS WERE LOADED.  ******
******  PLEASE ENSURE K IS LESS THAN OR EQUAL TO REF SEQUENCE LENGTHS.  ******

Input is being processed as unpaired
Started output streams: 0.077 seconds.
Processing time:                8.404 seconds.

KTrimmed:                       0 reads (0.00%)         0 bases (0.00%)
Result:                         1948334 reads (100.00%)         167556724 bases (100.00%)

Time:                           8.500 seconds.
Bases Processed:        167m    19.71m bases/sec


I checked my reads and none of them are shorter than k, so why were no k-mers loaded?

Thanks

0

Can you post a couple of example reads that contain the poly-A?

0

@NB551191:77:H33JGBGX5:1:11101:16098:1073 1:N:0:CTGCGT GATATTTGTTGTTTTGTAAGTGTATGTATATACTCGTACGTTGAAATTTGAATTCATATGCAAAAAAAAAAAGAAAAAAAAAAAAA

@NB551191:77:H33JGBGX5:1:11101:15600:1235 1:N:0:CAGCGT ATGTTATCGCGGCTACTGGCAAACCTTAAGTGATACGGTATTCTTCTTTTCGGCAAAAAAAAAAAAAAAAAAAAGATCGGAATAGC

@NB551191:77:H33JGBGX5:1:11101:14926:1307 1:N:0:CAGCGT TTGATGCTACTATGCTGTACTCAGGATTCCATGCTGCATTGCGATGCTAAATTAAAGAACCTCTGTTACCTTAAAAAAAAAAAAAA

0

Hopefully those are not the actual reads, since they seem to be missing Q scores. As long as your reads are in proper FASTQ format, the following will work. Adjust the length of the A stretch so that you catch them all.

bbduk.sh in=your_file.fq out=clean.fq literal=AAAAAAAAAAA k=7 ktrim=r

0
@NB551191:77:H33JGBGX5:1:11101:16098:1073 1:N:0:CTGCGT
GATATTTGTTGTTTTGTAAGTGTATGTATATACTCGTACGTTGAAATTTGAATTCATATGCAAAAAAAAAAAGAAAAAAAAAAAAA
+
/AAAAEEAAEE/EEEEEEEEE/EEEEEEEEEEEEEEEEEE<<AAAE<A6/EEA6EEEEEEAEEEEEEEE<EEEEEEEEEAEEEAEE

@NB551191:77:H33JGBGX5:1:11101:15963:1096 1:N:0:CAGCGT
TGTGCCGGTCTAATGTAGTTTGTTCTGTATCTTCGTTTCGAGGTGCTCCAGTTTCTAGTCAAAAAAAAAGAAAAAAAAAAAAAAAA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEAEEEEEEEEEEAEEEEEEEEEEEEE<AAEEE//AEEEEEAEEEEEE


These are the reads in complete FASTQ format.

0
$ bbduk.sh in=a.fq out=stdout.fq literal=AAAAAAAAAAA k=7 ktrim=r


This will produce the following (a.fq contains your reads):

@NB551191:77:H33JGBGX5:1:11101:16098:1073 1:N:0:CTGCGT
GATATTTGTTGTTTTGTAAGTGTATGTATATACTCGTACGTTGAAATTTGAATTCATATGC
+
/AAAAEEAAEE/EEEEEEEEE/EEEEEEEEEEEEEEEEEE<<AAAE<A6/EEA6EEEEEEA
@NB551191:77:H33JGBGX5:1:11101:15963:1096 1:N:0:CAGCGT
TGTGCCGGTCTAATGTAGTTTGTTCTGTATCTTCGTTTCGAGGTGCTCCAGTTTCTAGTC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEAEEEEEEEEEEAEEEEEEEE
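
To see why the reads end where they do, the right-trim can be imitated in plain shell on the two sequences posted in this thread. This is only a simplified model of ktrim=r with k=7: it handles the sequence line alone, requires an exact run of 7 A's, and ignores quality strings and any mismatch tolerance. The helper name trim_polyA is hypothetical.

```shell
# Sequences copied from the two FASTQ records posted in this thread.
read1=GATATTTGTTGTTTTGTAAGTGTATGTATATACTCGTACGTTGAAATTTGAATTCATATGCAAAAAAAAAAAGAAAAAAAAAAAAA
read2=TGTGCCGGTCTAATGTAGTTTGTTCTGTATCTTCGTTTCGAGGTGCTCCAGTTTCTAGTCAAAAAAAAAGAAAAAAAAAAAAAAAA

# Hypothetical helper: remove the first run of 7 A's and everything to its right.
# %% strips the longest suffix matching the pattern, i.e. the one starting
# at the leftmost occurrence of AAAAAAA.
trim_polyA() {
  echo "${1%%AAAAAAA*}"
}

trim_polyA "$read1"
trim_polyA "$read2"
```

Both printed sequences should match the trimmed reads in the bbduk output, including the read whose tail is interrupted by a G.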

0

Sorry for the confusion.

Since you pointed out in your last message that the reads lacked Q scores, I re-posted them in complete FASTQ format.

Thanks for your code, and it worked!

But I want to know more about WHY this worked. Why did you reduce k to 7 and leave out the mink value? What was wrong with the command from the bbduk.sh documentation?

Thanks so much!

0

The BBDuk documentation describes scanning for regular Illumina adapters, which are long and diverse in sequence, so a longer value of k is appropriate there. In your case we are looking for a stretch of A's, so I suggested a smaller value of k, which lets runs of 7 or more A's be found. You may find the in-line help for bbduk.sh useful: run bbduk.sh without any options and it is printed to the screen. For most purposes the default parameter values are fine (they are in effect even when you do not set them explicitly).
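
The "no kmers were loaded" warning from the first attempt follows from simple arithmetic: a reference of length L contains L - k + 1 windows of length k, and none when k is longer than L. The sketch below applies that count to the two literal/k combinations used in this thread. It is a simplified model that does not account for mink or other bbduk options, and the function name kmer_windows is hypothetical.

```shell
# Hypothetical helper: number of k-length windows in a reference sequence.
# Prints 0 when k is longer than the reference.
kmer_windows() {
  ref=$1
  k=$2
  n=$(( ${#ref} - k + 1 ))
  [ "$n" -gt 0 ] || n=0
  echo "$n"
}

kmer_windows AAAAAA 23        # first attempt: 6-base literal, k=23
kmer_windows AAAAAAAAAAA 7    # working command: 11-base literal, k=7
```

The first call reports no usable windows, consistent with "Added 0 kmers" in the log, while the second reports at least one, so the literal can be matched.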

0

That helped. Thank you so much.