Question: Samtools variant caller for each individual sample
Asked 2.6 years ago by bharata1803 (430):

Hello. I want to find the variants in each individual sample (human cancer samples) so that I can compare the cancer samples with one another. In other words, when I compare one cancer sample with the others, I want to see whether it has any variants that differ.

Can I use samtools mpileup to call variants for a single sample and then compare the results afterwards?

I tried running samtools on all samples in one go, but it gives only one list of variants (a single VCF file). I think that VCF contains only the variants common to the cancer samples.
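If each sample is called separately, as asked above, the per-sample results can be compared as sets. Below is a minimal Python sketch, assuming each sample's calls have already been reduced to (CHROM, POS, REF, ALT) tuples; the helper names and example variants are hypothetical. One caveat: a variant missing from one sample's VCF may mean that sample is homozygous reference there, or simply that it had too little coverage, and a variants-only VCF does not distinguish the two. That ambiguity is one reason the replies below point to joint calling.

```python
# Sketch: compare variant calls that were made separately for each sample.
# Assumption: each sample's VCF has already been reduced to a set of
# (CHROM, POS, REF, ALT) tuples; sample names and positions are hypothetical.
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def shared_by_all(calls: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Variants called in every sample."""
    sets = list(calls.values())
    return set.intersection(*sets) if sets else set()


def unique_to_each(calls: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """For each sample, the variants that no other sample has a call for."""
    result: Dict[str, Set[Variant]] = {}
    for sample, variants in calls.items():
        others: Set[Variant] = set()
        for other, other_variants in calls.items():
            if other != sample:
                others |= other_variants
        result[sample] = variants - others
    return result


if __name__ == "__main__":
    calls = {
        "A": {("chrX", 100, "C", "G"), ("chr1", 500, "A", "T")},
        "B": {("chrX", 100, "C", "G")},
    }
    print(shared_by_all(calls))        # {('chrX', 100, 'C', 'G')}
    print(unique_to_each(calls)["A"])  # {('chr1', 500, 'A', 'T')}
```

"Unique" here only means "not called elsewhere", not "confirmed reference elsewhere".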

Snippet of the VCF result:

(written 2.6 years ago by bharata1803, modified 2.6 years ago)

Could you please explain your input and desired output, with example snippets of the files? Thank you.

(reply written 2.6 years ago by Petr Ponomarenko)

Well, basically the VCF file is what I need. I just want to know the variants that each individual has. Say that on chromosome X at position N, individual A has the SNP G where the reference is C. I want to check whether individuals B, C, and D also have that SNP. A, B, C, and D are all cancer samples. I think it is really simple. I just want to know whether samtools mpileup will produce good results if only a single BAM file is given. I think samtools and bcftools calculate some statistics based on the average across samples.
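With a multisample VCF, the check described above (does B, C or D carry A's SNP?) reduces to comparing the GT values of one record. A minimal Python sketch, assuming the GT strings have already been pulled out of the record; `alt_alleles` and `samples_sharing` are hypothetical helpers, not part of samtools or bcftools. Note that `./.` (no call) is not the same as `0/0` (homozygous reference).

```python
# Sketch: given the GT strings of one multisample VCF record, find which other
# samples carry an ALT allele that a chosen sample carries.
# Sample names and genotypes in the example are hypothetical.
from typing import Dict, List, Set


def alt_alleles(gt: str) -> Set[str]:
    """ALT allele indices in a GT string such as '0/1', '1|2', '1' or './.'."""
    # '0' is the REF allele and '.' is a missing call, so neither counts.
    return {a for a in gt.replace("|", "/").split("/") if a not in ("0", ".")}


def samples_sharing(genotypes: Dict[str, str], sample: str) -> List[str]:
    """Other samples with at least one ALT allele in common with `sample`."""
    wanted = alt_alleles(genotypes[sample])
    return [
        name
        for name, gt in genotypes.items()
        if name != sample and alt_alleles(gt) & wanted
    ]


if __name__ == "__main__":
    record = {"A": "0/1", "B": "0/0", "C": "1/1", "D": "./."}
    print(samples_sharing(record, "A"))  # ['C']
```

Here D is reported as "not sharing", but its `./.` really means "unknown", which may deserve separate handling.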

(reply written 2.6 years ago by bharata1803)

It depends on whether you are looking only for SNPs, what specificity and sensitivity you need, and whether you can pay for software and for lab/chemistry optimization.

Running samtools on multiple BAM files to make a multisample VCF is a very good starting point for understanding the data you are working with.
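As a concrete starting point, here is a sketch that assembles a joint-calling pipeline in which all BAM files go into a single pileup, so the output VCF has one genotype column per sample. It assumes a bcftools 1.x installation on PATH and uses the `bcftools mpileup | bcftools call` form; older samtools releases did this step with `samtools mpileup` piped into bcftools, with flags that differ between versions. File names are hypothetical.

```python
# Sketch: joint (multisample) calling with bcftools, driven from Python.
# Assumptions: a bcftools 1.x release is on PATH and the BAMs are
# coordinate-sorted; file names are hypothetical.
import subprocess
from typing import List, Sequence


def joint_calling_commands(
    ref_fasta: str, bams: Sequence[str], out_vcf: str
) -> List[List[str]]:
    """Commands equivalent to: bcftools mpileup ... | bcftools call ..."""
    # -Ou: uncompressed BCF between the two steps; all BAMs go into one pileup.
    mpileup = ["bcftools", "mpileup", "-Ou", "-f", ref_fasta, *bams]
    # -m: multiallelic caller, -v: variant sites only, -Oz: bgzipped VCF.
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return [mpileup, call]


def run_joint_calling(ref_fasta: str, bams: Sequence[str], out_vcf: str) -> None:
    """Run mpileup piped into call, raising if either step fails."""
    mpileup, call = joint_calling_commands(ref_fasta, bams, out_vcf)
    p1 = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(call, stdin=p1.stdout)
    p1.stdout.close()  # so mpileup gets SIGPIPE if call exits early
    if p2.wait() != 0:
        raise subprocess.CalledProcessError(p2.returncode, call)
    if p1.wait() != 0:
        raise subprocess.CalledProcessError(p1.returncode, mpileup)


if __name__ == "__main__":
    for cmd in joint_calling_commands("ref.fa", ["A.bam", "B.bam"], "calls.vcf.gz"):
        print(" ".join(cmd))
```

Because `-v` keeps only sites where at least one sample is variant, each output site still reports a genotype for every sample, including `0/0` for samples that match the reference.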

(reply written 2.6 years ago by Petr Ponomarenko)
Answer by Petr Ponomarenko (2.6k), United States / Los Angeles, 2.6 years ago:

You should get one VCF file that contains the variant data for all samples. How many columns do you have in the multisample VCF from calling variants on multiple samples? You should have many, one per sample, with fields containing genotypes such as 1/1, 0/1, etc. If you have only one such column, please tell us the command you used and your samtools version.
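To check how many genotype columns a VCF has, count the fields after the nine fixed columns of the `#CHROM` header line. A minimal sketch; `sample_names` is a hypothetical helper.

```python
# Sketch: list the sample (genotype) columns of a VCF from its #CHROM line.
# The first 9 columns are fixed: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT.
from typing import Iterable, List

FIXED_COLUMNS = 9


def sample_names(vcf_lines: Iterable[str]) -> List[str]:
    """Return the sample names that follow the FORMAT column."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
        if not line.startswith("#"):
            break  # reached data lines without seeing the header line
    raise ValueError("no #CHROM header line found")


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878\tNA12877\n",
    ]
    print(sample_names(demo))  # ['NA12878', 'NA12877']
```

For a real file, pass an open text handle, for example `gzip.open("calls.vcf.gz", "rt")` for a bgzipped VCF. A result with a single name would indicate the problem described above.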

(written 2.6 years ago by Petr Ponomarenko)

I can see the columns you are referring to. Can they be used to get each individual's genotype?

(reply written 2.6 years ago by bharata1803)

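Sure. See the VCF specification, section 1.6.2 (Genotype fields, page 9) and section 1.5 (Header line syntax, page 7). Each genotype column is named by a sample ID in the header line, and it holds a GT field (the first FORMAT key when present) with the genotype of that individual, referred to by name or ID.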
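In code, the link between header names and genotype columns is positional: the n-th name after FORMAT in the `#CHROM` line labels the n-th sample column of every record, and GT is found by its position among the FORMAT keys. A minimal sketch; the function name and the example record are hypothetical.

```python
# Sketch: map each sample name to its GT value in one VCF data line, locating
# GT through the FORMAT column rather than assuming its position.
from typing import Dict, List


def genotypes_by_sample(samples: List[str], record_line: str) -> Dict[str, str]:
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. "GT:DP"
    if "GT" not in keys:
        raise ValueError("record has no GT field")
    gt_index = keys.index("GT")
    result: Dict[str, str] = {}
    # Sample columns start at index 9, in the same order as the header names.
    for name, value in zip(samples, fields[9:]):
        parts = value.split(":")
        # Trailing FORMAT fields may be omitted; treat a missing GT as ".".
        result[name] = parts[gt_index] if gt_index < len(parts) else "."
    return result


if __name__ == "__main__":
    # Hypothetical record: CHROM/POS are made up, and the second FORMAT key is
    # assumed to be DP (the thread does not say what 546 and 7657 are).
    line = "chr1\t12345\t.\tC\tA,T\t50\tPASS\t.\tGT:DP\t1/2:546\t0/1:7657"
    print(genotypes_by_sample(["NA12878", "NA12877"], line))
    # {'NA12878': '1/2', 'NA12877': '0/1'}
```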

So when you see a header line and a record like this:

....... NA12878 NA12877
....... 1/2:546 0/1:7657

1/2 corresponds to sample NA12878 and 0/1 to sample NA12877. The separator / means an unphased diploid genotype, and | means phased (at least locally). 0 means the reference allele, and 1, 2, and so on are the alternative alleles; see the REF and ALT columns. For example, if REF=C and ALT=A,T, then 1/2 = A/T and 0/1 = C/A.
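The REF/ALT lookup just described can be written as a small function: index 0 selects REF and index k selects the k-th comma-separated ALT allele. A minimal sketch covering SNP-style records; `decode_gt` is a hypothetical name.

```python
# Sketch: translate the allele indices in a GT value into bases using REF/ALT.
import re


def decode_gt(gt: str, ref: str, alt: str) -> str:
    """E.g. decode_gt('1/2', 'C', 'A,T') -> 'A/T'."""
    # Index 0 is REF; indices 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    sep = "|" if "|" in gt else "/"  # keep phased (|) vs unphased (/)
    indices = re.split(r"[/|]", gt)
    return sep.join("." if i == "." else alleles[int(i)] for i in indices)


if __name__ == "__main__":
    print(decode_gt("1/2", "C", "A,T"))  # A/T
    print(decode_gt("0/1", "C", "A,T"))  # C/A
```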

There are more rules about REF and ALT and how they correspond to the actual alleles (especially for indels), but for SNPs you are good to go.

(reply written 2.6 years ago by Petr Ponomarenko, modified 2.6 years ago)