Question: extracting Allele Read Counts
0
gravatar for Bogdan
3.2 years ago by
Bogdan1000
Palo Alto, CA, USA
Bogdan1000 wrote:

Dear all,

please could you advise : given a (tumor) BAM file and a (germline) VCF file, what tool shall i use in order to extract the Allele Read Counts for each heterozygous SNP (from the vcf file) ?

many thanks,

-- bogdan

titan cnv snp • 2.0k views
ADD COMMENTlink modified 3.2 years ago by Pierre Lindenbaum128k • written 3.2 years ago by Bogdan1000
1

Depending on the variant caller you are using. For some of them you will see the following two fields under FORMAT column:

##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by iraun3.7k

The easiest way to do this is to extract the sites using bcftools view -v snps -g het. You can add -m2 -M2 if you are only interested in biallelic site. Then you can use allelecounter.py (https://github.com/secastel/allelecounter) to extract the read counts at ref and alt allele.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by prasundutta87360

thank you for your suggestions ;)

ADD REPLYlink written 2.8 years ago by Bogdan1000
3
gravatar for Pierre Lindenbaum
3.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:

a combination of some programs I've written.

  • Extract the heterozygous sites with vcffilterjs.
  • use awk to extract the 'CHROM:POS' of those site.
  • for each position , call FindAllCoverageAtPosition with the specified bam
  • sort + uniq to get one uniq header.
    java -jar jvarkit-git/dist/vcffilterjs.jar -e 'variant.getGenotype(0).isHet()' input.vcf | grep -v "^#" |\
awk '{printf("%s:%d\n",$1,$2);}' | while read  S; do echo "input.bam" | java -jar /jvarkit-git/dist/findallcoverageatposition.jar -p "${S}" ; done | \
LC_ALL=C sort | uniq | column -t


#File   CHROM      POS  SAMPLE  DEPTH  M    I   D  N  S   H  P  EQ  X  Base(A)  Base(C)  Base(G)  Base(T)  Base(N)  Base(^)  Base(-)
S1.bam  rotavirus  130  S1      328    328  0   0  0  52  0  0  0   0  33       0        0        295      0        0        0
S1.bam  rotavirus  232  S1      363    363  35  0  0  56  0  0  0   0  25       0        0        338      0        35       0
S1.bam  rotavirus  267  S1      315    315  1   1  0  44  0  0  0   0  0        296      19       0        0        1        1
S1.bam  rotavirus  961  S1      327    327  0   0  0  45  0  0  0   0  0        0        29       298      0        0        0

(later) pushed a fresh version on github a few minutes ago. there was a problem in the old findallcoverageatposition (all files were 'cram')

ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Pierre Lindenbaum128k

thank you Pierre for your suggestions.

ADD REPLYlink written 3.2 years ago by Bogdan1000
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1368 users visited in the last hour