Question: Estimating SAM file size from fastq
4.7 years ago by 13en (United Kingdom)
13en wrote:

I'm using Bowtie2 to align some simulated E. coli reads on my laptop, so it's taking a while, and the only measure of progress is the growing size of the SAM file Bowtie2 is writing. Unfortunately, I don't know how big that SAM file should eventually get. Is there a rough guide for estimating the size of a SAM file from the size of the input FASTQ files? Are they simply about the same size, or does it vary with the quality of the alignment?

alignment • 4.6k views
modified 4.7 years ago by thackl • written 4.7 years ago by 13en

4.7 years ago by thackl (MIT)
thackl wrote:

Estimating by size is difficult, because unmapped reads contribute very little to the SAM file. But you can track progress by read ID instead, with something like this:

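SAM=aln.sam        # placeholder name: the SAM file Bowtie2 is writing
FASTQ=reads.fastq  # placeholder name: the input FASTQ file
ID=$(tail -n1 "$SAM" | cut -f1)  # ID of the most recently written read
LINE=$(grep -n -m1 "^@$ID" "$FASTQ" | cut -f1 -d:)  # line number of that read's header (use ^> for FASTA)
TOTAL=$(wc -l < "$FASTQ")  # total lines in the FASTQ
echo "scale=5; $LINE/$TOTAL * 100" | bc  # percentage done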

[edit] the syntax should work now.
[edit] bc scale.
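
A rough alternative sketch that avoids grepping for a read ID: count the alignment records written so far, compare that with the number of reads in the FASTQ, and scale the current file size to project the final one. It assumes single-end reads, four-line FASTQ records, and one SAM record per read (Bowtie2 options such as -k or --no-unal would break that assumption). The function name `estimate_sam` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: estimate_sam <in-progress.sam> <reads.fastq>
# Assumes 4-line FASTQ records and one SAM record per read.
estimate_sam() {
    sam=$1; fastq=$2
    # SAM header lines start with '@'; every other line is an alignment record
    done_reads=$(grep -vc '^@' "$sam")
    total_reads=$(( $(wc -l < "$fastq") / 4 ))
    bytes_now=$(wc -c < "$sam")
    awk -v d="$done_reads" -v t="$total_reads" -v b="$bytes_now" 'BEGIN {
        if (d == 0 || t == 0) { print "no records yet"; exit }
        # Linear extrapolation: final size ~ current size * total / done
        printf "%.1f%% done, projected final size ~%.0f bytes\n", 100 * d / t, b * t / d
    }'
}
```

Called as `estimate_sam aln.sam reads.fastq`, this prints one line with the percentage and the projected size. The projection is only a ballpark figure, since the header bytes are counted in the current size and record length varies between mapped and unmapped reads.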

modified 4.7 years ago • written 4.7 years ago by thackl

Very nice, thanks. I'll hang on to that little script, might be very handy.

It took me a while to get the percentages working, though; I ended up having to alter the line to

echo "scale=2; $LINE/$TOTAL * 100" | bc

otherwise every division was returning zero.
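
The zeros come from bc's default scale of 0, which truncates the result of a division to an integer before the multiplication happens. A small sketch of the effect, where `pct` is a hypothetical helper name; multiplying before dividing also keeps more precision in the result:

```shell
# Hypothetical helper: percentage of part/whole with two decimal places
pct() {
    echo "scale=2; 100 * $1 / $2" | bc
}

echo "1/3 * 100" | bc           # scale defaults to 0, so 1/3 truncates to 0 first
echo "scale=2; 1/3 * 100" | bc  # 1/3 keeps two decimals before multiplying
pct 1 3                         # multiply first, then divide
```

With `$LINE` and `$TOTAL` from the script above, `pct "$LINE" "$TOTAL"` would give the same progress figure.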

written 4.7 years ago by 13en

Ah, yes, sorry - bc is set up in my environment to use decimal digits by default, whereas a plain bc starts with scale=0 and truncates the division to zero. I didn't think about that.
 

written 4.7 years ago by thackl