How to increase speed of BBMap's `reformat.sh` when randomly sampling reads?
4.4 years ago
O.rka ▴ 710

My current command is the following:

reformat.sh in=./ERR1701760.fastq out=stdout.fastq overwrite=t samplereadstarget=10000 sampleseed=0 > x.fq

How can I speed up this program when sampling reads? I'm using reformat.sh because I sometimes have interleaved reads and I'll need the `R1.

Are there any parameters I can adjust? I know I could just take the first 10,000 reads, which would be much faster, but I want to be able to use different random seeds here.
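For comparison, the non-random shortcut mentioned here could look something like the sketch below. It assumes a standard four-line-per-record FASTQ (no wrapped sequence lines); the function name `first_reads` and the output name `first10k.fq` are made up for illustration.

```shell
# Take the first n reads from a 4-line-per-record FASTQ.
# This is NOT a random sample: it always returns the same reads.
first_reads() {
    # $1 = input FASTQ, $2 = number of reads to keep
    head -n "$(( $2 * 4 ))" "$1"
}

# Example: first_reads ./ERR1701760.fastq 10000 > first10k.fq
```

Because `head` stops after the requested lines, this never scans the rest of the file, which is why it is so much faster than sampling across all reads.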

There are 109,100,547 (single-end?) reads in ERR1701760.fastq. I originally thought these were paired-end since they are HiSeq, but I suspect I only downloaded the forward reads.

https://www.ebi.ac.uk/ena/data/view/ERR1701760&display=html

(base) -bash-4.1$ head ERR1701760.fastq
@ERR1701760.1 1 length=143
TTACGATTTGCCCAAAAGTCTTTCCCCCGTGTATCATCTCGGAACAGGATACCCACCTTGCCACTGTCGATTACGTCATTATCTTTCATGACGTTGTCGGACTAGCCGAAAAAAACCTAATTAAGAACANTTCAAGTTTCGGC
+ERR1701760.1 1 length=143
@@@FFBDDHFHHHEBD@FCHHIFIIGIGGFHEHHGADHHIIEFGHHICGGHHHIIIIIEEHGGHF@@EB(5@BEB?A=?CDDCC;ACCD>CCBB?BBCBB@@FFDEFHFHHDHGIIIJIIIJJIIIJJG#0?FHGIFHGGIGI
@ERR1701760.2 2 length=151
AAATATGTGGATCTGTTCGCTGCCAGTGCCATATTTTGTAAGCGTGGGATTGCACAATGTGGTCGTAACGTTGGTACGGTACAACAAGATTGAGCTGTCCGCAAACATGGGAATCTCCAGAATCTCACAAANTATTGTTCTCCATATTATC
+ERR1701760.2 2 length=151
CCCFFFFDHHHHHJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJIIJJIJJIJJJJJJJJJJJJIHHHHFFFEDDDEDDDDDDDDDDDDDDDDDDDDDDDD?CCCFFFFFHHHHHJJJJJJJJJJJJJJJIJ#1@FHIJIIJJJJJJJJIJH
@ERR1701760.3 3 length=156
CACCGACATCCACACGTGCATTCCTCCCGAGACGGACACGTGACGGCAGGCAAGGCCGCGGAAAGGGAAGAATGCGTGGGAGGGAAAGGCCGCGGCGAAGGAAGGTCGCCCTGGTTCGTATGTTTCCTTTGGATATAGATCTTCTCCTCCTCCAAC
RNA-Seq sequencing bbmap fastq • 1.3k views

I don't think you can do anything in terms of program options to speed things up.

Why is speed an issue BTW? Most BBMap suite programs are plenty fast even when single threaded (like reformat.sh is).


These are indeed paired-end reads. Looks like you only got the forward read.


OK, that's what I thought too regarding the reads. I figured out one way to make it quicker while still being able to sample randomly: I basically just set reads=int(1.618*number_of_subsampled_reads), so I subsample from a smaller read set but can still use different random seeds.
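A rough sketch of that workaround, reusing the command from the question. It assumes `reads=` caps how many input reads reformat.sh processes before it stops; the variable names (`target`, `seed`, `limit`, `input`) are only for illustration. Note that the sample is then drawn from the first `limit` reads only, not from the whole file.

```shell
# Subsample from only the first part of the file, per the approach described above.
target=10000                          # reads to keep (samplereadstarget)
seed=0                                # change this to draw a different sample
limit=$(( target * 1618 / 1000 ))     # int(1.618 * target) using integer arithmetic
input=./ERR1701760.fastq

echo "reads=${limit} samplereadstarget=${target} sampleseed=${seed}"

# Run only when BBMap and the input file are actually available.
if command -v reformat.sh >/dev/null 2>&1 && [ -f "$input" ]; then
    reformat.sh in="$input" out=stdout.fastq overwrite=t \
        reads="$limit" samplereadstarget="$target" sampleseed="$seed" > x.fq
fi
```

With `target=10000` this reads 16,180 input reads instead of roughly 109 million, so different seeds still give different samples, just from a much smaller pool.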
