Question: How can I filter SNP for 30 samples in vcf files?
0
gravatar for sun.nation
5.3 years ago by
sun.nation120
United States
sun.nation120 wrote:

I have 30 samples to call SNPs against same reference. All samples were aligned to reference using bowtie2 to get separate sam files. Those were used in calling.

What is the best way to call SNP and filter:

-- Call SNP together like:

samtools mpileup -uf genome.fa sam1.bam sam2.bam ....sam30.bam |  bcftools view -I -bvcg -> all30.bcf

bcftools view all30.bcf | vcfutils.pl varFilter -d 10  > all30.vcf

I found that even without any read coverage vcf file had genotype exactly as in reference if one of the sample had SNP in that particular location.

If there is SNP, I want coverage in all samples or maybe 1 missing.

29 samples should have coverage of 10 and if Het: minor allele frequency of 30% (i.e 3).

Thanks in advace.

 

 

filtering snp vcf • 2.4k views
ADD COMMENTlink modified 5.3 years ago by Pierre Lindenbaum131k • written 5.3 years ago by sun.nation120
1

So you have already got the bam file of the 30 samples, right?

Could you please check if the all the bam file has a unique read group id? If they have the same read group, then all the bam file will be considered as the same sample, therefore leading to your situation. You can use the picard tools to change the read group, see if that will help.

ADD REPLYlink written 5.3 years ago by Sam3.3k

I did alignment of each samples separately to reference to get separate sam files and used downstream. Sorry, how do I check that.

This is the vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  E1-sorted.bam   E2-sorted.bam   E3-sorted.bam   E4-sorted.bam   E5-sorted.bam -E6-sorted.bam   E7-sorted.bam   E8-sorted.bam   E9-sorted.bam   E10-sorted.bam  K2-sorted.bam   K3-sorted.bam   K4-sorted.bam   K5-sorted.bam   K6-sorted.bam  K7-sorted.bam   K8-sorted.bam   K9-sorted.bam   K11-sorted.bam  K12-sorted.bam  K13-sorted.bam  K14-sorted.bam  K15-sorted.bam  K16-sorted.bam-K17-sorted.bam  K18-sorted.bam  K19-sorted.bam  K20-sorted.bam  K21-sorted.bam  K22-sorted.bam  K23-sorted.bam  K24-sorted.bam  K25-sorted.bam
PcapLG_01       12221   .       T       A       7.08    .       DP=139;VDB=2.024855e-01;RPB=-9.735662e-01;AF1=0.2464;G3=0.784,0.00789,0.2082;HWE=0.0036
9;AC1=16;DP4=48,45,21,25;MQ=23;FQ=8.05;PV4=0.59,1,2e-32,0.049   GT:PL:GQ        0/0:0,18,134:21 0/0:0,21,133:24 0/1:12,6,0:3    0/1:6,0,26:5    0/1:9,0,34:7   0/0:0,1,65:6    0/0:0,27,179:30 0/0:0,6,30:10   0/0:0,0,0:5     0/0:0,18,134:21 0/0:4,3,0:4     0/0:11,11,11:5  0/0:0,21,166:24 1/1:36,30,0:19-0/0:0,0,0:5     0/0:0,0,0:5     0/0:0,0,0:5     0/0:0,0,0:5     0/0:2,2,2:5     1/1:25,15,0:6   0/0:0,0,0:5     0/0:0,6,45:10   1/1:22,12,0:4   0/0:0,24,175:27        0/0:0,24,176:27 0/0:0,0,0:5     0/0:0,51,244:54 0/0:0,0,0:5     0/0:0,0,0:5     0/0:0,3,30:7  0/0:0,5,184:9   0/0:6,6,6:5     0/0:17,17,17:5

 

ADD REPLYlink modified 5.3 years ago • written 5.3 years ago by sun.nation120
1
gravatar for Pierre Lindenbaum
5.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

see http://www.htslib.org/doc/samtools.html and add the option

--output-tags DP

 

to mpileup, you'll get the DEPTH for each sample.

ADD COMMENTlink written 5.3 years ago by Pierre Lindenbaum131k

I tried --output-tags DP, -t DP but says command not found.

-D -S worked

Is this same? from manual:

  • Apply -D and -S to keep per-sample read depth and strand bias. This is preferred if there are more than one samples at high coverage.

After that how can I filter based on depth (remove sites with no depth in at least 28 samples) and allele frequency.

ADD REPLYlink modified 5.3 years ago • written 5.3 years ago by sun.nation120
1

you're using an old version of samtools. current is 1.2 https://github.com/samtools/samtools/releases

ADD REPLYlink written 5.3 years ago by Pierre Lindenbaum131k

Thanks, I didn't know

ADD REPLYlink written 5.3 years ago by sun.nation120

for filtering, see for example a previous answer of a related problem: A: Gatk Multi-Sample Vcf Variantfiltration

ADD REPLYlink written 5.3 years ago by Pierre Lindenbaum131k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1317 users visited in the last hour