Question: How to increase speed of BBMap's `reformat.sh` when randomly sampling reads?
Asked 22 days ago by O.rka:

My current command is the following:

reformat.sh in=./ERR1701760.fastq out=stdout.fastq overwrite=t samplereadstarget=10000 sampleseed=0 > x.fq

How can I speed up this program when sampling reads? I'm using reformat.sh because sometimes I have interleaved reads and I'll need the `R1.

Are there any parameters I can adjust? I know I could just take the first 10000 reads, which would be much faster, but I want to be able to use different random seeds here.

There are 109,100,547 (single-end?) reads in ERR1701760.fastq. I originally thought these were paired-end, since they come from a HiSeq, but I suspect I only downloaded the forward reads.
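
For reference, a read count like the one above can be checked with a small shell sketch. It assumes an uncompressed FASTQ with exactly four lines per record (as in the `head` output in this post); `count_reads` is a hypothetical helper name, not part of BBMap.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record is exactly four lines (header, sequence, +, quality).
count_reads() {
    local lines
    lines=$(wc -l < "$1")      # total number of lines in the file
    echo $(( lines / 4 ))      # four lines per read
}

# Usage (prints the number of reads):
# count_reads ./ERR1701760.fastq
```

Multi-line FASTQ records would break the four-lines-per-read assumption, so treat the result as a quick sanity check only.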

https://www.ebi.ac.uk/ena/data/view/ERR1701760&display=html

(base) -bash-4.1$ head ERR1701760.fastq
@ERR1701760.1 1 length=143
TTACGATTTGCCCAAAAGTCTTTCCCCCGTGTATCATCTCGGAACAGGATACCCACCTTGCCACTGTCGATTACGTCATTATCTTTCATGACGTTGTCGGACTAGCCGAAAAAAACCTAATTAAGAACANTTCAAGTTTCGGC
+ERR1701760.1 1 length=143
@@@FFBDDHFHHHEBD@FCHHIFIIGIGGFHEHHGADHHIIEFGHHICGGHHHIIIIIEEHGGHF@@EB(5@BEB?A=?CDDCC;ACCD>CCBB?BBCBB@@FFDEFHFHHDHGIIIJIIIJJIIIJJG#0?FHGIFHGGIGI
@ERR1701760.2 2 length=151
AAATATGTGGATCTGTTCGCTGCCAGTGCCATATTTTGTAAGCGTGGGATTGCACAATGTGGTCGTAACGTTGGTACGGTACAACAAGATTGAGCTGTCCGCAAACATGGGAATCTCCAGAATCTCACAAANTATTGTTCTCCATATTATC
+ERR1701760.2 2 length=151
CCCFFFFDHHHHHJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJIIJJIJJIJJJJJJJJJJJJIHHHHFFFEDDDEDDDDDDDDDDDDDDDDDDDDDDDD?CCCFFFFFHHHHHJJJJJJJJJJJJJJJIJ#1@FHIJIIJJJJJJJJIJH
@ERR1701760.3 3 length=156
CACCGACATCCACACGTGCATTCCTCCCGAGACGGACACGTGACGGCAGGCAAGGCCGCGGAAAGGGAAGAATGCGTGGGAGGGAAAGGCCGCGGCGAAGGAAGGTCGCCCTGGTTCGTATGTTTCCTTTGGATATAGATCTTCTCCTCCTCCAAC
Tags: sequencing, rna-seq, bbmap, fastq

I don't think you can do anything in terms of program options to speed things up.

Why is speed an issue BTW? Most BBMap suite programs are plenty fast even when single threaded (like reformat.sh is).

modified 21 days ago • written 21 days ago by genomax

These are indeed paired-end reads. Looks like you only got the forward read.

written 21 days ago by genomax

OK, that's what I thought too regarding the reads. I figured out one way to make it quicker while still sampling randomly: I set reads=int(1.618*number_of_subsampled_reads), so the subsample is drawn from a smaller read set and I can still use different random seeds.

written 21 days ago by O.rka
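
The workaround in the last comment could look roughly like the sketch below. It is an illustration, not the poster's actual script. It assumes that `reads=` makes reformat.sh stop after that many input reads, so the sample comes from the start of the file rather than from all 109,100,547 reads. `capped_reads` and `build_cmd` are hypothetical helper names, and the script only prints the command so it can be inspected before running.

```shell
#!/usr/bin/env bash
# Sketch only: cap the input with reads= before random sampling.

# Number of reads to sample (value taken from the original command).
target=10000

# Integer form of int(1.618 * n); bash arithmetic has no floating point.
capped_reads() {
    echo $(( $1 * 1618 / 1000 ))
}

# Build the reformat.sh command line for an input file and a seed.
# Only parameters that already appear in this thread are used.
build_cmd() {
    local infile=$1 seed=$2
    echo "reformat.sh in=${infile} out=stdout.fastq overwrite=t" \
         "reads=$(capped_reads "$target")" \
         "samplereadstarget=${target} sampleseed=${seed}"
}

# Print the command instead of executing it.
build_cmd ./ERR1701760.fastq 0
```

To run it, redirect the printed command's output as in the original post (`> x.fq`), changing the seed for each subsample. Because only the first reads of the file are considered, the result is not a uniform sample of the whole run.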