Question: How to increase speed of BBMap's `reformat.sh` when randomly sampling reads?
O.rka wrote (12 months ago):

My current command is the following:

reformat.sh in=./ERR1701760.fastq out=stdout.fastq overwrite=t samplereadstarget=10000 sampleseed=0 > x.fq

How can I speed up this program when sampling reads? I'm using reformat.sh because I sometimes have interleaved reads and I'll need the `R1.

Are there any parameters I can adjust? I know I could just take the first 10,000 reads, which would be much faster, but I want to be able to use different random seeds.
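For comparison, one standard way to draw exactly k reads uniformly at random in a single pass with a reproducible seed is reservoir sampling (Algorithm R). This is a swapped-in technique for illustration, not a description of what reformat.sh does internally; the function names are hypothetical and it assumes plain 4-line FASTQ records. It still has to read the whole file, so it is unlikely to beat reformat.sh on speed; it only shows how seeded sampling over the full file can work.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield plain 4-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record at " + header.strip())
        yield (header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n"))


def reservoir_sample(records: Iterable[Record], k: int, seed: int) -> List[Record]:
    """Return k records chosen uniformly at random in one pass (Algorithm R).

    The same seed over the same input always gives the same sample; if the
    input has fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:  # happens with probability k / (i + 1)
                reservoir[j] = rec
    return reservoir
```

Usage would look like `with open("ERR1701760.fastq") as fh: sample = reservoir_sample(read_fastq(fh), 10000, seed=0)`. Only k records are held in memory, but the sample is not returned in file order.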

There are 109,100,547 reads (single-end?) in ERR1701760.fastq. I originally thought these were paired-end, since they are HiSeq data, but I suspect I only downloaded the forward reads.

https://www.ebi.ac.uk/ena/data/view/ERR1701760&display=html

(base) -bash-4.1$ head ERR1701760.fastq
@ERR1701760.1 1 length=143
TTACGATTTGCCCAAAAGTCTTTCCCCCGTGTATCATCTCGGAACAGGATACCCACCTTGCCACTGTCGATTACGTCATTATCTTTCATGACGTTGTCGGACTAGCCGAAAAAAACCTAATTAAGAACANTTCAAGTTTCGGC
+ERR1701760.1 1 length=143
@@@FFBDDHFHHHEBD@FCHHIFIIGIGGFHEHHGADHHIIEFGHHICGGHHHIIIIIEEHGGHF@@EB(5@BEB?A=?CDDCC;ACCD>CCBB?BBCBB@@FFDEFHFHHDHGIIIJIIIJJIIIJJG#0?FHGIFHGGIGI
@ERR1701760.2 2 length=151
AAATATGTGGATCTGTTCGCTGCCAGTGCCATATTTTGTAAGCGTGGGATTGCACAATGTGGTCGTAACGTTGGTACGGTACAACAAGATTGAGCTGTCCGCAAACATGGGAATCTCCAGAATCTCACAAANTATTGTTCTCCATATTATC
+ERR1701760.2 2 length=151
CCCFFFFDHHHHHJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJIIJJIJJIJJJJJJJJJJJJIHHHHFFFEDDDEDDDDDDDDDDDDDDDDDDDDDDDD?CCCFFFFFHHHHHJJJJJJJJJJJJJJJIJ#1@FHIJIIJJJJJJJJIJH
@ERR1701760.3 3 length=156
CACCGACATCCACACGTGCATTCCTCCCGAGACGGACACGTGACGGCAGGCAAGGCCGCGGAAAGGGAAGAATGCGTGGGAGGGAAAGGCCGCGGCGAAGGAAGGTCGCCCTGGTTCGTATGTTTCCTTTGGATATAGATCTTCTCCTCCTCCAAC
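Whether a file is interleaved can be checked roughly from the first two records: in an interleaved file, mates share a read name. The sketch below is a hypothetical heuristic, not BBTools' own detection logic; it assumes names are the first whitespace-separated token of the header, optionally ending in `/1` or `/2`. On the `head` output above, the first two names are `ERR1701760.1` and `ERR1701760.2`, so it reports the file as not interleaved.

```python
import re
from typing import List


def read_base_name(header: str) -> str:
    """Drop the leading '@', anything after the first whitespace, and a trailing /1 or /2."""
    name = header.lstrip("@").split()[0]
    return re.sub(r"/[12]$", "", name)


def looks_interleaved(lines: List[str]) -> bool:
    """Heuristic: True if the first two 4-line records carry the same read name."""
    if len(lines) < 8:
        return False  # fewer than two complete records; cannot tell
    return read_base_name(lines[0]) == read_base_name(lines[4])
```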

I don't think you can do anything in terms of program options to speed things up.

Why is speed an issue, BTW? Most BBMap suite programs are plenty fast even when single-threaded, as reformat.sh is.

(genomax, 12 months ago)

These are indeed paired-end reads. It looks like you only downloaded the forward reads.

(genomax, 12 months ago)

OK, that's what I thought about the reads too. I found one way to make it quicker while still sampling reads randomly: I set reads=int(1.618*number_of_subsampled_reads), so I subsample from a smaller read set but can still use different random seeds.
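The workaround above can be sketched as "read only the first int(1.618*k) records, then draw k of them with a seeded RNG" (in reformat.sh this presumably corresponds to combining a cap on input reads with samplereadstarget and sampleseed). The function name is hypothetical. One caveat the sketch makes visible: different seeds only pick different subsets of the same first ~1.6k reads, so the result is not a random sample of the whole file, and any ordering effect in the file (Illumina output is typically grouped by tile) carries over into every sample.

```python
import random
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_from_prefix(records: Iterable[T], k: int, seed: int, factor: float = 1.618) -> List[T]:
    """Draw k records at random from only the first int(factor * k) records.

    Hypothetical sketch of the thread's workaround. Output keeps file order.
    If the prefix is shorter than k, every record in it is returned.
    """
    prefix = list(islice(records, int(factor * k)))  # stop reading early
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(prefix)), min(k, len(prefix))))
    return [prefix[i] for i in picked]
```

Because `islice` stops after the prefix, the rest of the file is never read, which is where the speedup comes from; the price is the sampling bias described above.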

(O.rka, 12 months ago)