BBmap bbduk.sh for filtering reads
1
0
Entering edit mode
20 months ago
kinase • 0

I'm looking to filter reads that contain a stretch of A's, I found these posts looking for polyA tails, meaning this should work all the same (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads, the first of which does contain a stretch of 8 A's and the second of which does not, what is the correct command to separate the first from the second?

----------

@A01587:190:GW220612000:1:2101:6090:1016 1:N:0:AATCCAGC+CTAGTCGA
NATTTGAATTCAGAACCGCTTCTGCTCAATTAGAAGGTGGTGTCCATAATTTGCACTCCTATGAAAAACGTCTATACAATTGAGTAAGCATCCATAGATATTTAAAAGTTTATTTTTGCATATAAATATACCTTCATAGAGATCAACAAAACTAAAATAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTATCTGTTATTATTTGATGCTTTAACCAAAGGATATTGGACCAAGTCAAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF:FFFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFF,:,,,,::,,,,F,:,FF,::,F:,:::,::F,,,,:FF,,,F,F:,,F,FF
@A01587:190:GW220612000:1:2101:27850:1031 1:N:0:AATCCAGC+CTAGTCGA
CTAAAGCCTTTTTGTAATCCAATGGAGCAGCACTCATCGTAAAATGTTTGTTAGGTTTCTTCAAAGCTATGGAATTGGTCCAATATCCTTTGGTTCAAGCAGCAAAGAAGACCAGAGAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAATATTATATTTTTATTTTTTTTTTTTTGTTTATAAATTTTTTTTTGTTTTTGTATTTTGTTTGATAATTGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,F:,:,:,F,::,,,,,FF,F,,,:,,,,,,,,,,,,,,,,,,,,:,F,:,::,,FF::,:,,,,,,,,:,:,,,,,,,

My expectation was that this would work but it just sends both reads to a file

bbduk.sh -in=in.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA

Other non-functioning attempts include

bbduk.sh -in=in.fq.gz -out=out.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA
bbduk.sh -in=in.fq.gz -out=out.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA -k=3
bbduk.sh -in=in.fq.gz -out=out.fq.gz -literal=AAAAAAAA
bbduk.sh -in=in.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA -k=3

I suppose another option is to use cutadapt and tell it the adapter is a bunch of A's and simply pass those reads to a file, but I'd still like to know what I'm doing wrong here.

RNAseq BBmap • 875 views
ADD COMMENT
3
Entering edit mode
20 months ago
GenoMax 141k

There is no - in front of BBtools options. So the command would be following (turn off rev complementing to not match polyT's).

$ bbduk.sh -Xmx2g in=test.fq outm=stdout.fq literal=AAAAAAAA k=4 rcomp=f

0.022 seconds.
Initial:
Memory: max=2058m, total=2058m, free=2015m, used=43m

Added 4 kmers; time:    0.003 seconds.
Memory: max=2058m, total=2058m, free=2004m, used=54m

Input is being processed as unpaired
Started output streams: 0.017 seconds.
@A01587:190:GW220612000:1:2101:6090:1016 1:N:0:AATCCAGC+CTAGTCGA
NATTTGAATTCAGAACCGCTTCTGCTCAATTAGAAGGTGGTGTCCATAATTTGCACTCCTATGAAAAACGTCTATACAATTGAGTAAGCATCCATAGATATTTAAAAGTTTATTTTTGCATATAAATATACCTTCATAGAGATCAACAAAACTAAAATAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTATCTGTTATTATTTGATGCTTTAACCAAAGGATATTGGACCAAGTCAAT
+
!FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF:FFFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFF,:,,,,::,,,,F,:,FF,::,F:,:::,::F,,,,:FF,,,F,F:,,F,FF
Processing time:                0.005 seconds.

Input:                          2 reads                 499 bases.
Contaminants:                   1 reads (50.00%)        249 bases (49.90%)
Total Removed:                  1 reads (50.00%)        249 bases (49.90%)
Result:                         1 reads (50.00%)        250 bases (50.10%)

Time:                           0.025 seconds.
Reads Processed:           2    0.08k reads/sec
Bases Processed:         499    0.02m bases/sec
ADD COMMENT
0
Entering edit mode

Oh boy, that's a dumb mistake on my part. Thank you for pointing that out.

ADD REPLY

Login before adding your answer.

Traffic: 1619 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6