My data is 75 bp single end. Libraries were prepared using the QuantSeq 3' FWD kit and sequenced using the Illumina NextSeq 500.
I am trying to remove poly-A tails using Trimmomatic v0.32. Does anyone know how I can do this? Thanks very much.
You could easily do this using bbduk.sh from the BBMap suite with the literal=AAAAAA option.
For Trimmomatic, you could add a poly-A sequence entry to your adapters file.
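To make the Trimmomatic suggestion concrete, here is a minimal sketch that writes a one-record poly-A "adapter" FASTA file and assembles a single-end command line around it. The helper names, the file names (polyA.fa, reads.fastq.gz, trimmed.fastq.gz), the 20-A length, and the 2:30:10 and MINLEN:20 values are all assumptions for illustration. How many A's Trimmomatic actually needs before it clips depends on the ILLUMINACLIP thresholds, so check the result on a few reads.

```python
from pathlib import Path


def write_polya_adapter(path, n_as=20):
    """Write a one-record FASTA file holding a poly-A 'adapter' (hypothetical helper)."""
    Path(path).write_text(f">polyA\n{'A' * n_as}\n")


def trimmomatic_se_args(reads_in, reads_out, adapters):
    """Build a single-end Trimmomatic argument list (values are illustrative)."""
    # ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome threshold>:<simple threshold>
    return [
        "java", "-jar", "trimmomatic-0.32.jar", "SE",
        reads_in, reads_out,
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "MINLEN:20",  # assumed cutoff: drop reads shorter than 20 bp after clipping
    ]


write_polya_adapter("polyA.fa")
print(" ".join(trimmomatic_se_args("reads.fastq.gz", "trimmed.fastq.gz", "polyA.fa")))
```

The printed line can be pasted into a shell once the paths are adjusted to your own files.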
Thanks for your answer. I've never used bbduk.sh before. Since different reads have different numbers of A's, does the "literal=AAAAAA" option match exactly 6 A's, or does it cover runs of any length?
I have the same question about Trimmomatic: does a poly-A line in the adapters file define a specific number of A's, or does it match runs of any length?
Once bbduk finds a stretch of A's, that stretch and everything to its right are removed if you use the ktrim=r option (trim to the right of the k-mer match).
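As a rough illustration of the ktrim=r behaviour described above, here is a simplified Python model, not BBDuk's actual code. It assumes exact matching only: it ignores options such as hdist and mink and does not look at reverse complements. The function name and the example reads are made up.

```python
def ktrim_right(seq, qual, literal, k):
    """Simplified ktrim=r: cut at the first k-mer that matches the literal."""
    # All k-mers contained in the literal; empty if k is longer than the literal.
    kmers = {literal[i:i + k] for i in range(len(literal) - k + 1)}
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in kmers:
            # Drop the matching k-mer and everything to its right.
            return seq[:i], qual[:i]
    return seq, qual  # no match: read is left untouched


# A read with a 10-A tail followed by a few extra bases is cut before the tail.
read = "ACGTTGCC" + "A" * 10 + "GGC"
print(ktrim_right(read, "I" * len(read), "A" * 11, 7))

# A read with only 5 A's in a row is shorter than k=7 and is not trimmed.
print(ktrim_right("ACGTAAAAACGT", "I" * 12, "A" * 11, 7))
```

In this model the length of the poly-A run does not have to equal the literal: any run of at least k A's triggers the cut.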
bbduk.sh -Xmx512m in=Test/WT12_4_AGTACT_R1.fastq.gz out=Test/WT12_4_C.fastq.gz literal=AAAAAA ktrim=r k=23 mink=11 hdist=1
maskMiddle was disabled because useShortKmers=true
Memory: max=514m, free=488m, used=26m
Added 0 kmers; time: 0.012 seconds.
Memory: max=514m, free=469m, used=45m
****** WARNING! A KMER OPERATION WAS CHOSEN BUT NO KMERS WERE LOADED. ******
****** PLEASE ENSURE K IS LESS THAN OR EQUAL TO REF SEQUENCE LENGTHS. ******
Input is being processed as unpaired
Started output streams: 0.077 seconds.
Processing time: 8.404 seconds.
Input: 1948334 reads 167556724 bases.
KTrimmed: 0 reads (0.00%) 0 bases (0.00%)
Result: 1948334 reads (100.00%) 167556724 bases (100.00%)
Time: 8.500 seconds.
Reads Processed: 1948k 229.20k reads/sec
Bases Processed: 167m 19.71m bases/sec
I checked my reads and none of them are shorter than k, so why were no k-mers loaded?
Can you post a couple of example reads that contain the poly-A?
Hopefully those are not the actual reads, since they seem to be missing Q scores. As long as your reads are in valid FASTQ format, the following will work. Adjust the number of A's so that you catch them all.
bbduk.sh in=your_file.fq out=clean.fq literal=AAAAAAAAAAA k=7 ktrim=r
Complete Fastq format
$ bbduk.sh in=a.fq out=stdout.fq literal=AAAAAAAAAAA k=7 ktrim=r
Will result in this (a.fq contains your sequence)
Sorry for the confusion.
Since you pointed out in your last message that the reads lacked Q scores, I have reposted them in complete FASTQ format.
Thanks for your code, and it worked!
But I would like to understand WHY this worked. Why did you reduce k to 7 and leave out the mink value? What is wrong with the command from the bbduk.sh documentation?
Thanks so much!
The BBDuk documentation example is meant for scanning for regular Illumina adapters, which are long and diverse in sequence, so a longer value of k is appropriate there. In your case we are looking for a stretch of A's, so I suggested a smaller value of k, which lets any run of 7 or more A's be found. That is also why your first command loaded no k-mers: k=23 is longer than the 6-base literal, as the warning in the log says. You may find the in-line help for bbduk.sh useful: run bbduk.sh without any options and it is printed to the screen. For most purposes the default parameter values (which are in use even when you do not set them) are fine.
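The effect of k can be checked with a little arithmetic: a reference of length n contains n - k + 1 windows of length k, and none at all when k is larger than n. The sketch below is a simplified model of that counting, not BBDuk's own loading code, and the function name is made up. It compares the first command in this thread (literal=AAAAAA with k=23) with the working one (11 A's with k=7).

```python
def reference_kmers(literal, k):
    """Return every window of length k in the literal (empty list if k > len(literal))."""
    return [literal[i:i + k] for i in range(len(literal) - k + 1)]


# First attempt: a 6-base literal cannot supply a 23-mer (or even an 11-mer for mink=11).
print(len(reference_kmers("AAAAAA", 23)))        # 0
print(len(reference_kmers("AAAAAA", 11)))        # 0

# Working command: 11 A's with k=7 gives 5 windows, all the same 7-mer.
windows = reference_kmers("A" * 11, 7)
print(len(windows), set(windows))                # 5 {'AAAAAAA'}
```

So the literal only needs to be at least k bases long; making it longer than k adds no new k-mers when every base is an A.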
That helped. Thank you so much.