Visualizing exome Sequencing data for depth
1
1
Entering edit mode
9.1 years ago
ruhirai ▴ 20

Hi I have about 80 samples. I ran GATK on it to do DepthofCoverage on it and generated all the following files

Coverage_summary
Coverage_summary.sample_cumulative_coverage_counts
Coverage_summary.sample_cumulative_coverage_proportions
Coverage_summary.sample_interval_statistics
Coverage_summary.sample_interval_summary
Coverage_summary.sample_statistics
Coverage_summary.sample_summary

I want to create a table with all samples showing Sample ID, targeted regions, depth for forward read, Depth for reverse read, Median Depth for that target

And then I want to be able to plot this data showing per sample median depth per targeted region.

Any suggestion is appreciated

Thank you

sequencing • 2.2k views
ADD COMMENT
1
Entering edit mode
5.8 years ago
Gungor Budak ▴ 270

Hi!

You can do this with samtools depth easily.

samtools depth can work with multiple inputs and it generates such a file with the below command:

Command (please see official docs for arguments, you will probably need to provide forward and reverse reads from BAMs separately)

samtools depth -r 1 -m 100 -Q 20 -q 10 -a bam1 bam2 bam3

Output

1   55475000    38  49  58
1   55475001    37  50  57
1   55475002    36  51  55
1   55475003    38  52  55
1   55475004    36  51  56
1   55475005    35  51  55
1   55475006    35  51  54
1   55475007    35  51  55
1   55475008    35  54  51
1   55475009    35  53  56
1   55475010    34  53  53
1   55475011    35  53  56
1   55475012    36  53  55
1   55475013    35  54  56
1   55475014    36  55  59
1   55475015    35  55  58
1   55475016    34  49  54
1   55475017    34  53  55
1   55475018    35  54  54
1   55475019    34  50  54
1   55475020    35  54  56
1   55475021    36  54  59
1   55475022    35  55  60
1   55475023    34  55  59
1   55475024    35  51  58
1   55475025    34  51  59
1   55475026    34  50  60
1   55475027    33  49  57
1   55475028    33  50  58
1   55475029    33  50  58
1   55475030    32  51  57
1   55475031    31  51  59
1   55475032    29  50  59
1   55475033    28  49  55
1   55475034    28  49  55

Finally, use a Python script for example to read every position collect per sample metrics and calculate mean, median, etc. Following is a sample code for my previous analysis (for a similar purpose, not exactly the same).

import sys
import numpy as np

print('\t'.join([
    '#chrom', 'pos', 'mean', 'median',
    '1', '5', '10', '15', '20', '25',
    '30', '50', '100']))
with open(sys.argv[1]) as rows:
    for row in rows:
        row = row.strip().split()
        print('\t'.join(row[0:2]), end='\t')
        covs = np.array(list(map(int, row[2:])))
        two_decimals = '%.2f'
        four_decimals = '%.4f'
        above_1 = np.size(np.where(covs >= 1)) / np.size(covs)
        above_5 = np.size(np.where(covs >= 5)) / np.size(covs)
        above_10 = np.size(np.where(covs >= 10)) / np.size(covs)
        above_15 = np.size(np.where(covs >= 15)) / np.size(covs)
        above_20 = np.size(np.where(covs >= 20)) / np.size(covs)
        above_25 = np.size(np.where(covs >= 25)) / np.size(covs)
        above_30 = np.size(np.where(covs >= 30)) / np.size(covs)
        above_50 = np.size(np.where(covs >= 50)) / np.size(covs)
        above_100 = np.size(np.where(covs >= 100)) / np.size(covs)
        data = [
            two_decimals % np.mean(covs),
            two_decimals % np.median(covs),
            four_decimals % above_1,
            four_decimals % above_5,
            four_decimals % above_10,
            four_decimals % above_15,
            four_decimals % above_20,
            four_decimals % above_25,
            four_decimals % above_30,
            four_decimals % above_50,
            four_decimals % above_100
        ]
        print('\t'.join(data), end='\n')
ADD COMMENT

Login before adding your answer.

Traffic: 1524 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6