Hi, let's start with the shell.
###remove the header and save it in a file called head (this assumes the header is exactly 23 lines long)
sed 1,23d file.sam > file_noHeader.sam
head -n 23 file.sam > head
###randomly select one million reads and save them (I took the one-liner from here: http://www.unix.com/shell-programming-scripting/68686-random-lines-selection-form-file.html)
awk 'BEGIN {srand()} {printf "%05.0f %s\n",rand()*9999999, $0; }' file_noHeader.sam | sort -n | head -n 1000000 | sed 's/^[0-9]* //' > randomReads.tmp
###join the header back on to get a randomly sampled one-million-read subset of the original file
cat head randomReads.tmp > randomReads.sam
###remove the tmp files
rm file_noHeader.sam randomReads.tmp
Of course, this can be made more efficient and automated using pipes.
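As a rough sketch of what the piped version could look like: the function below does the same decorate, sort, and strip trick without any temporary files. The name sample_sam and its three arguments are made up for illustration. It also assumes the header is every line starting with '@' (rather than a fixed 23 lines), which may or may not suit your files.

```shell
# Hypothetical helper: sample_sam INPUT.sam N OUTPUT.sam
# Assumption: header lines are exactly the lines that start with '@'.
sample_sam() {
    in="$1"
    n="$2"
    out="$3"
    {
        # 1) pass the header through untouched
        grep '^@' "$in"
        # 2) prefix each read with a random integer key, sort on that key,
        #    keep the first n reads, then cut the key off again
        grep -v '^@' "$in" |
            awk 'BEGIN {srand()} {printf "%d\t%s\n", int(rand()*1000000000), $0}' |
            sort -k1,1n |
            head -n "$n" |
            cut -f2-
    } > "$out"
}
```

You would call it like sample_sam file.sam 1000000 randomReads.sam. Since srand() seeds from the clock, two runs within the same second can give the same sample.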
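Also, you can save the script in a file and replace the file name with $1, like
awk 'BEGIN {srand()} {printf "%05.0f %s\n",rand()*9999999, $0; }' "$1" | sort -n | head -n 1000000 | sed 's/^[0-9]* //' > "$1.tmp"
and then call it like sh rand.sh file.sam; it will produce file.sam.tmp
Note that this single line does not set the header aside, so either run it on the header-free file or put the header steps into the script as well.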
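If you have GNU coreutils, shuf can do the random pick in one step, so a script built around $1 could look roughly like the sketch below. This swaps shuf in for the awk/sort trick. The function name rand_sample, the optional second argument for the read count, and the '@' header test are my assumptions, not part of the original one-liner.

```shell
# Hypothetical function form of rand.sh: rand_sample FILE [N]
# Assumptions: GNU shuf is installed; header lines start with '@'.
rand_sample() {
    in="$1"
    n="${2:-1000000}"    # number of reads to keep, one million by default
    {
        grep '^@' "$in"                     # header lines, unchanged
        grep -v '^@' "$in" | shuf -n "$n"   # n reads chosen at random
    } > "$in.tmp"
}
```

To use it as a script, put rand_sample "$@" on the last line of rand.sh and run sh rand.sh file.sam, which writes file.sam.tmp.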
Cheers