per sample mean coverage and standard deviation
2.9 years ago

Hello everyone

I have 18,000 text files, each containing the depth at every position for one sample (18,000 samples in total). I want to get the mean coverage, the standard deviation, and the total number of positions per sample in a single output file. Is there an easy way to do this? Depths were calculated using samtools depth input.bam. All the text files look like this:

sample_name   chromosome   position     depth


So the desired output is:

sample1  mean_depth   standard_deviation   total_number_of_positions
sample2  mean_depth   standard_deviation   total_number_of_positions
sample3  mean_depth   standard_deviation   total_number_of_positions

bash awk • 1.1k views

Are the solutions to your last question not suitable? See: average depth across samples


This is a different question, so I thought I should ask it in a separate post. There I wanted the average depth per position across all the samples. Here I need the per-sample average depth, standard deviation, and total number of positions.


You can probably calculate that using some variation of the datamash solution that was posted by @cpad0112 in your last question. Tinkering with things is a great way to learn.

You should also validate the answers to your past questions if they helped you solve the issue (green check mark next to each answer). You can accept more than one if they all work.


OK, thank you so much for the information.

2.9 years ago
husensofteng ▴ 380

If I understand correctly, you want the overall statistics across all positions for each sample. In that case, assuming that the depth values are in the fourth column and all the .txt files are in the current directory:

for sample_file in *.txt; do
    # x = sum of depths, y = sum of squared depths; the SD printed is the population SD
    awk 'BEGIN{OFS="\t"}{x+=$4; y+=$4^2}END{print $1,x/NR,sqrt(y/NR-(x/NR)^2),NR}' "$sample_file" >> output.txt;
done
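The loop above computes the population standard deviation, sqrt(mean(d^2) - mean(d)^2). If the sample standard deviation (dividing by n-1) is preferred, a variant along these lines might work. This is only a sketch: the function name depth_stats is hypothetical, and it assumes tab- or space-separated files with no header line, the sample name in column 1 and the depth in column 4. It also skips empty files and clamps a slightly negative variance caused by floating-point rounding.

```shell
# Hypothetical helper: prints sample name, mean depth, sample SD (n-1) and
# number of positions for one depth file (assumes depth is in column 4).
depth_stats() {
    awk 'BEGIN { OFS = "\t" }
         { n++; sum += $4; sumsq += $4 * $4; name = $1 }
         END {
             if (n == 0) exit                 # empty file: print nothing
             mean = sum / n
             var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
             if (var < 0) var = 0             # guard against rounding error
             print name, mean, sqrt(var), n
         }' "$1"
}
```

It could then be called as: for f in *.txt; do depth_stats "$f"; done > summary.tsv. Writing the summary to a file that does not end in .txt keeps it from being picked up by the *.txt glob on a later run.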