extracting Allele Read Counts
1
0
Entering edit mode
7.1 years ago
Bogdan ★ 1.4k

Dear all,

please could you advise : given a (tumor) BAM file and a (germline) VCF file, what tool shall i use in order to extract the Allele Read Counts for each heterozygous SNP (from the vcf file) ?

many thanks,

-- bogdan

SNP Titan CNV • 3.6k views
ADD COMMENT
1
Entering edit mode

Depending on the variant caller you are using. For some of them you will see the following two fields under FORMAT column:

##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
ADD REPLY
0
Entering edit mode

The easiest way to do this is to extract the sites using bcftools view -v snps -g het. You can add -m2 -M2 if you are only interested in biallelic site. Then you can use allelecounter.py (https://github.com/secastel/allelecounter) to extract the read counts at ref and alt allele.

ADD REPLY
0
Entering edit mode

thank you for your suggestions ;)

ADD REPLY
3
Entering edit mode
7.1 years ago

a combination of some programs I've written.

  • Extract the heterozygous sites with vcffilterjs.
  • use awk to extract the 'CHROM:POS' of those site.
  • for each position , call FindAllCoverageAtPosition with the specified bam
  • sort + uniq to get one uniq header.
    java -jar jvarkit-git/dist/vcffilterjs.jar -e 'variant.getGenotype(0).isHet()' input.vcf | grep -v "^#" |\
awk '{printf("%s:%d\n",$1,$2);}' | while read  S; do echo "input.bam" | java -jar /jvarkit-git/dist/findallcoverageatposition.jar -p "${S}" ; done | \
LC_ALL=C sort | uniq | column -t


#File   CHROM      POS  SAMPLE  DEPTH  M    I   D  N  S   H  P  EQ  X  Base(A)  Base(C)  Base(G)  Base(T)  Base(N)  Base(^)  Base(-)
S1.bam  rotavirus  130  S1      328    328  0   0  0  52  0  0  0   0  33       0        0        295      0        0        0
S1.bam  rotavirus  232  S1      363    363  35  0  0  56  0  0  0   0  25       0        0        338      0        35       0
S1.bam  rotavirus  267  S1      315    315  1   1  0  44  0  0  0   0  0        296      19       0        0        1        1
S1.bam  rotavirus  961  S1      327    327  0   0  0  45  0  0  0   0  0        0        29       298      0        0        0

(later) pushed a fresh version on github a few minutes ago. there was a problem in the old findallcoverageatposition (all files were 'cram')

ADD COMMENT
0
Entering edit mode

thank you Pierre for your suggestions.

ADD REPLY

Login before adding your answer.

Traffic: 3019 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6