Mean read length from bam file
2
0
Entering edit mode
3 months ago
Jjbox ▴ 40

Hi biostars, I have a question about calculating the median read length from a bam file.

samtools view GTEX-1192X-0011-R10a-SM-DO941.bam | awk '{print length($10)}' | head -1000 | sort -u  Instead of the above command line, is it possible to get a median read length from a bam file? bam samtools • 801 views ADD COMMENT 1 Entering edit mode 3 months ago samtools view -F 2304 in.bam | awk '{print length($10);}' | datamash median 1

0
Entering edit mode

Awesome. Can you explain what is 2304 and datamash is?

1
Entering edit mode

-F 2304 excludes unmapped reads, and datamash is a command line program that makes it easier to perform actions (like column medians) from data in tabular format.

0
Entering edit mode

Great. Is it possible to calculate median length of total reads so that I can take account for unmapped + mapped?

1
Entering edit mode

Just remove that argument and run the same command.

0
Entering edit mode

No. It excludes supplementary and secondary reads.

0
Entering edit mode

This is what I get for being lazy and thinking I remember what it means instead of looking it up. Thanks for the correction.

0
Entering edit mode

By looking at this discussion, it looks the below command gives the median read lenght of total reads. Can you confirm this, please?

samtools view in.bam | awk '{print length($10);}' | datamash median 1  ADD REPLY 1 Entering edit mode The referenced command will output ALL reads in the BAM file, because there is no selection being applied. Supplementary reads are described here, and you might be interested in getting familiar with BAM flags by reading the specification or playing with a decoding utility. ADD REPLY 1 Entering edit mode 3 months ago seidel 11k If you have littler installed, you could try that: samtools view mutant_ip.bam | head -1000 | awk '{print length($10)}' | r -de 'print(summary(X[,1]))'
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
76      76      76      76      76      76


(you would use the median() function in place of summary())

0
Entering edit mode

Thanks a lot. Is this mapped or unmapped? Is it possible to calculate median length of total reads so that I can take account for unmapped + mapped?