Estimating SAM file size from fastq
7.0 years ago
13en ▴ 90

I'm using Bowtie2 to align some simulated E. coli reads on my laptop, so it's taking a while, and the only measure of progress is the growing size of the SAM file Bowtie2 is writing. Unfortunately, I don't know how big that SAM file should eventually get. Is there a rough guide for estimating the size of a SAM file from the size of the input FASTQ files? Should they simply be roughly the same size, or will it vary with the quality of the alignment?

alignment • 6.1k views
7.0 years ago
thackl ★ 2.9k

Estimating by size is difficult, because reads that don't map contribute only very little to the SAM file. But you can track progress by read ID instead, with something like this:

ID=$(tail -n1 .sam | cut -f1)  # ID of the last read written to the SAM file
LINE=$(grep -n -m1 "^@$ID" .fastq | cut -f1 -d:)  # line number of that read's ID line (use > for fasta)
TOTAL=$(wc -l < .fastq)  # total lines
echo "scale=5; $LINE/$TOTAL * 100" | bc  # percentage done

[edit] the syntax should work now.
[edit] bc scale.


Very nice, thanks. I'll hang on to that little script, might be very handy.

Took me a bit of time to get the percentages working though, ended up having to alter the line to 

echo "scale=2; $LINE/$TOTAL * 100" | bc

otherwise every division was returning zero.


Ah, yes, sorry - bc is set in my environment by default to use digits. I didn't think about that.
 
