I'm looking to filter reads that contain a stretch of A's. I found these posts about finding polyA tails, which should work the same way (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads, the first of which contains a stretch of 8 or more A's and the second of which does not, what is the correct command to separate the first from the second?
----------
@A01587:190:GW220612000:1:2101:6090:1016 1:N:0:AATCCAGC+CTAGTCGA
NATTTGAATTCAGAACCGCTTCTGCTCAATTAGAAGGTGGTGTCCATAATTTGCACTCCTATGAAAAACGTCTATACAATTGAGTAAGCATCCATAGATATTTAAAAGTTTATTTTTGCATATAAATATACCTTCATAGAGATCAACAAAACTAAAATAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTATCTGTTATTATTTGATGCTTTAACCAAAGGATATTGGACCAAGTCAAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF:FFFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFF,:,,,,::,,,,F,:,FF,::,F:,:::,::F,,,,:FF,,,F,F:,,F,FF
@A01587:190:GW220612000:1:2101:27850:1031 1:N:0:AATCCAGC+CTAGTCGA
CTAAAGCCTTTTTGTAATCCAATGGAGCAGCACTCATCGTAAAATGTTTGTTAGGTTTCTTCAAAGCTATGGAATTGGTCCAATATCCTTTGGTTCAAGCAGCAAAGAAGACCAGAGAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAATATTATATTTTTATTTTTTTTTTTTTGTTTATAAATTTTTTTTTGTTTTTGTATTTTGTTTGATAATTGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,F:,:,:,F,::,,,,,FF,F,,,:,,,,,,,,,,,,,,,,,,,,:,F,:,::,,FF::,:,,,,,,,,:,:,,,,,,,
I expected the following command to work, but it just sends both reads to the output file:
bbduk.sh -in=in.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA
Other attempts that did not work either:
bbduk.sh -in=in.fq.gz -out=out.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA
bbduk.sh -in=in.fq.gz -out=out.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA -k=3
bbduk.sh -in=in.fq.gz -out=out.fq.gz -literal=AAAAAAAA
bbduk.sh -in=in.fq.gz -outm=outm.fq.gz -literal=AAAAAAAA -k=3
I suppose another option is to use cutadapt and tell it the adapter is a bunch of A's and simply pass those reads to a file, but I'd still like to know what I'm doing wrong here.
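For debugging, it can help to check each strand separately. Below is a minimal Python sketch that tests the two reads above for a run of 8 or more A's on the forward strand and on the reverse complement. The names POLY_A, reverse_complement and poly_a_strands are made up for illustration, and this is not how bbduk works internally. It shows that the second read has no A run on the forward strand, but its long T stretch is a polyA run on the reverse complement. If bbduk is also matching reverse complements (I believe its rcomp option is on by default, but treat that as an assumption and check the documentation), that would be one explanation for both reads ending up in outm.

```python
import re

# Run of 8 or more A's; the threshold of 8 comes from the question.
POLY_A = re.compile(r"A{8,}")

# Translation table for complementing bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def poly_a_strands(seq, pattern=POLY_A):
    """Return (forward_hit, reverse_complement_hit) for the pattern."""
    forward_hit = bool(pattern.search(seq))
    reverse_hit = bool(pattern.search(reverse_complement(seq)))
    return forward_hit, reverse_hit


# The two reads from the question, copied verbatim.
read1 = "NATTTGAATTCAGAACCGCTTCTGCTCAATTAGAAGGTGGTGTCCATAATTTGCACTCCTATGAAAAACGTCTATACAATTGAGTAAGCATCCATAGATATTTAAAAGTTTATTTTTGCATATAAATATACCTTCATAGAGATCAACAAAACTAAAATAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATTATCTGTTATTATTTGATGCTTTAACCAAAGGATATTGGACCAAGTCAAT"
read2 = "CTAAAGCCTTTTTGTAATCCAATGGAGCAGCACTCATCGTAAAATGTTTGTTAGGTTTCTTCAAAGCTATGGAATTGGTCCAATATCCTTTGGTTCAAGCAGCAAAGAAGACCAGAGAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAATATTATATTTTTATTTTTTTTTTTTTGTTTATAAATTTTTTTTTGTTTTTGTATTTTGTTTGATAATTGT"

for name, seq in (("read1", read1), ("read2", read2)):
    # Prints a (forward, reverse-complement) pair of booleans per read.
    print(name, poly_a_strands(seq))
```

For these two reads it prints (True, False) for read1 and (False, True) for read2, so a forward-strand-only search for 8 A's separates them, while a search that also considers the reverse complement matches both.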
Oh boy, that's a dumb mistake on my part. Thank you for pointing that out.