Question

How To Retrieve Reads Supporting A Snp/Indel

1

Entering edit mode

11.5 years ago

Frewise ▴ 20

I used samtools and varscan for variant calling like this:

samtools mpileup \
  -f genome.fa \
  tku.sorted.rmdup.bam \
  | java -jar VarScan.v2.3.5.jar pileup2snp \
  --min-coverage 8 \
  --min-reads2 2 \
  --min-avg-qual 15 \
  --min-var-freq 0.01 \
  --p-value 0.001 \
  >tku.snp.vcf

now I want to retrieve all the reads supporting a specific SNP/indel and calculate the distribution of SNP/indel on the sequenced reads, is there any tools or proper way to achieve this?

reads snp indel varscan • 9.6k views

ADD COMMENT • link updated 11.3 years ago by rohan ▴ 110 • written 11.5 years ago by Frewise ▴ 20

0

Entering edit mode

I always first check what is actually happening with IGV (http://www.broadinstitute.org/igv/). Here you can nicely see the mismatches/SNPs with the reference genome and get a general ID if the SNP is real.

Later you can extract the reads with this command: samtools view BAMFILE.BAM chr1:1000000-2000000 > NEWFILE.bam

ADD REPLY • link 11.5 years ago by Floris Brenk ★ 1.0k

0

Entering edit mode

Thanks for your advice! But it is still difficult to acquire the coordinate of a SNP on reads, what's more, there are thousands of SNPs to process. So, an automatic workflow is prefered.

ADD REPLY • link 11.5 years ago by Frewise ▴ 20

0

Entering edit mode

i use python and pysam to read and manipulate reads but it requires some programming skills http://www.cgat.org/~andreas/documentation/pysam/api.html

ADD REPLY • link 11.5 years ago by arno.guille ▴ 420

score 3 · Answer 1 · 2014-01-16

3

Entering edit mode

11.5 years ago

Pierre Lindenbaum 166k

my tool SAM2Tsv https://github.com/lindenb/jvarkit/wiki/SAM2Tsv would print each position with the name of the read and the cigar operator, you can easily join the ouput to another file (a VCF...) to select the reads.

ADD COMMENT • link 11.5 years ago by Pierre Lindenbaum 166k

1

Entering edit mode

Thanks for your tool, Pierre. It works fine!

ADD REPLY • link 11.5 years ago by Frewise ▴ 20

score 1 · Answer 2 · 2014-01-17

1

Entering edit mode

11.5 years ago

Andreas ★ 2.5k

This script might also help: it adds a special tag to reads supporting the reference or variant base: (requires a recent version of pysam). It doesn't support indels at the moment though.

Andreas

ADD COMMENT • link 11.5 years ago by Andreas ★ 2.5k

score 1 · Answer 3 · 2014-01-17

Hello! I have a script that solves a similar problem ( https://github.com/mikessh/mont/blob/master/src/molcount/MolCountSNP.groovy ) for SNPs. It calculates the number of reads that overlap with a given SNP and have a variant/reference or other nt in the position. It also estimates the number of "molecules" i.e. distinct reads according to offset. The results are in good agreement with e.g. TUMOR_F allele frequency estimated by MuTect. Anyways there are a parts of code that implement Picard Java API that could be of use. Hope you'll find it useful!

Ram · Answer 4 · 2014-03-25

0

Entering edit mode

11.3 years ago

rohan ▴ 110

I tried all of above for similar purpose

Biostar59647.. made a output xml of ~3GB.. that is very difficult to parse.. (run out of memory)

SAM2Tsv is better in that one need not parse xmls. but still the the problem is - identifying the mismatched position.. (the output only contains cigar codes Ms @ both matches and mismatches)

My question is that is there any alternate tool or a tool to get only the mismatches out of SAM2Tsv's .txt output?

@mikhail.shugay Can I use a custom vcf file containing only the snps of my interest in MolCountSNP.groovy ? eg. A->T @25th postion ..

ADD COMMENT • link 11.3 years ago by rohan ▴ 110

1

Entering edit mode

Biostar59647 should be parsed with a sax/stax parser, not a dom parser.

SAM2Tsv provides both REF and ALT columns.

ADD REPLY • link 11.3 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

i was actually using dom parser for Biostar59647. thanks for the suggestion.

i have a question.. may be i can ask here.. so..SAM2Tsv is very helpful in getting REF and ALT bases. but one more thing that would also be equally helpful is quality scores of the bases ...

is it possible to edit sam2tsv to get base qualities? there is a picard tool which gets base qualities.. may be we can get that output also through sam2tsv?

ADD REPLY • link 11.3 years ago by rohan ▴ 110

1

Entering edit mode

Yes, I can do this during the WE.

ADD REPLY • link 11.3 years ago by Pierre Lindenbaum 166k

1

Entering edit mode

Done: https://github.com/lindenb/jvarkit/wiki/SAM2Tsv

ADD REPLY • link 11.3 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

you r just gr8 ! may you hv a gr8 "WE" too.. thanks a lot..

ADD REPLY • link 11.3 years ago by rohan ▴ 110

0

Entering edit mode

@Pierre Lindenbaum.. i have a question @ get amino acid changes from snp calls .. may be you can answer it

ADD REPLY • link 11.3 years ago by rohan ▴ 110

0

Entering edit mode

i dont know what "during the WE" means? im not geting it

ADD REPLY • link 11.3 years ago by rohan ▴ 110

1

Entering edit mode

during the week-end :-)

ADD REPLY • link 11.3 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

reg current version of sam2tsv (with quality scores)..

is it there also difference in snp calling?

because if I compare the outputs of both, the total number of lines is different.

eg. test1.sam.sam2tsv1 test1.sam.sam2tsv2 are the outputs of current and previous versions resp.

kclabws1@kclabws1:~/Documents/ngs/test/sam$ wc -l test1.sam.sam2tsv1
100401 test1.sam.sam2tsv1
kclabws1@kclabws1:~/Documents/ngs/test/sam$ wc -l test1.sam.sam2tsv2
99738 test1.sam.sam2tsv2

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by rohan ▴ 110