Estimating SAM file size from fastq
7.0 years ago
13en ▴ 90

I'm using Bowtie2 to align some simulated E. coli reads on my laptop, so it's taking a while, and the only measure of progress is the growing size of the SAM file Bowtie2 is writing. Unfortunately, I don't know how big that SAM file should eventually get. Is there a rough guide to estimating the size of a SAM file from the size of the input FASTQ files? Is it nice and simple, with the two roughly the same size, or will it vary depending on the quality of the alignment?

7.0 years ago
thackl ★ 2.9k

Estimating progress from the file size is difficult, because reads that don't map contribute very little to the SAM file. But you can do something like this:

    ID=$(tail -n 1 your.sam | cut -f 1)                      # ID of the latest read written to the SAM
    LINE=$(grep -n -m 1 "^@$ID" your.fastq | cut -d: -f 1)   # line of that read's header (use > for FASTA)
    TOTAL=$(wc -l < your.fastq)                              # total number of lines in the FASTQ
    echo "scale=5; $LINE / $TOTAL * 100" | bc                # percentage done

(your.sam and your.fastq are placeholders for your own files.)

Edit: the syntax should work now; I added the bc scale setting.

Very nice, thanks. I'll hang on to that little script; it might be very handy. It took me a bit of time to get the percentages working, though. I ended up having to change the last line to

    echo "scale=2;$LINE/$TOTAL * 100" | bc

because otherwise every division was returning zero.


Ah, yes, sorry: bc in my environment is set up to print decimal digits by default (a nonzero scale). I didn't think about that.