How can I filter SNP for 30 samples in vcf files?
1
0
Entering edit mode
8.7 years ago
sun.nation ▴ 140

I have 30 samples to call SNPs against same reference. All samples were aligned to reference using bowtie2 to get separate sam files. Those were used in calling.

What is the best way to call SNP and filter:

Call SNP together like:

samtools mpileup -uf genome.fa sam1.bam sam2.bam ....sam30.bam |  bcftools view -I -bvcg -> all30.bcf
bcftools view all30.bcf | vcfutils.pl varFilter -d 10  > all30.vcf

I found that even without any read coverage vcf file had genotype exactly as in reference if one of the sample had SNP in that particular location.

If there is SNP, I want coverage in all samples or maybe 1 missing.

29 samples should have coverage of 10 and if Het: minor allele frequency of 30% (i.e 3).

Thanks in advance

vcf snp filtering • 3.6k views
ADD COMMENT
1
Entering edit mode

So you have already got the bam file of the 30 samples, right?

Could you please check if the all the bam file has a unique read group id? If they have the same read group, then all the bam file will be considered as the same sample, therefore leading to your situation. You can use the picard tools to change the read group, see if that will help.

ADD REPLY
0
Entering edit mode

I did alignment of each samples separately to reference to get separate sam files and used downstream. Sorry, how do I check that.

This is the vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  E1-sorted.bam   E2-sorted.bam   E3-sorted.bam   E4-sorted.bam   E5-sorted.bam -E6-sorted.bam   E7-sorted.bam   E8-sorted.bam   E9-sorted.bam   E10-sorted.bam  K2-sorted.bam   K3-sorted.bam   K4-sorted.bam   K5-sorted.bam   K6-sorted.bam  K7-sorted.bam   K8-sorted.bam   K9-sorted.bam   K11-sorted.bam  K12-sorted.bam  K13-sorted.bam  K14-sorted.bam  K15-sorted.bam  K16-sorted.bam-K17-sorted.bam  K18-sorted.bam  K19-sorted.bam  K20-sorted.bam  K21-sorted.bam  K22-sorted.bam  K23-sorted.bam  K24-sorted.bam  K25-sorted.bam
PcapLG_01       12221   .       T       A       7.08    .       DP=139;VDB=2.024855e-01;RPB=-9.735662e-01;AF1=0.2464;G3=0.784,0.00789,0.2082;HWE=0.0036
9;AC1=16;DP4=48,45,21,25;MQ=23;FQ=8.05;PV4=0.59,1,2e-32,0.049   GT:PL:GQ        0/0:0,18,134:21 0/0:0,21,133:24 0/1:12,6,0:3    0/1:6,0,26:5    0/1:9,0,34:7   0/0:0,1,65:6    0/0:0,27,179:30 0/0:0,6,30:10   0/0:0,0,0:5     0/0:0,18,134:21 0/0:4,3,0:4     0/0:11,11,11:5  0/0:0,21,166:24 1/1:36,30,0:19-0/0:0,0,0:5     0/0:0,0,0:5     0/0:0,0,0:5     0/0:0,0,0:5     0/0:2,2,2:5     1/1:25,15,0:6   0/0:0,0,0:5     0/0:0,6,45:10   1/1:22,12,0:4   0/0:0,24,175:27        0/0:0,24,176:27 0/0:0,0,0:5     0/0:0,51,244:54 0/0:0,0,0:5     0/0:0,0,0:5     0/0:0,3,30:7  0/0:0,5,184:9   0/0:6,6,6:5     0/0:17,17,17:5
ADD REPLY
1
Entering edit mode
8.7 years ago

See http://www.htslib.org/doc/samtools.html and add the option --output-tags DP to mpileup, you'll get the DEPTH for each sample.

ADD COMMENT
0
Entering edit mode

I tried --output-tags DP, -t DP but says command not found.

-D -S worked

Is this same? from manual:

  • Apply -D and -S to keep per-sample read depth and strand bias. This is preferred if there are more than one samples at high coverage.

After that how can I filter based on depth (remove sites with no depth in at least 28 samples) and allele frequency.

ADD REPLY
1
Entering edit mode

you're using an old version of samtools. current is 1.2 https://github.com/samtools/samtools/releases

ADD REPLY
0
Entering edit mode

Thanks, I didn't know

ADD REPLY
0
Entering edit mode

for filtering, see for example a previous answer of a related problem: A: Gatk Multi-Sample Vcf Variantfiltration

ADD REPLY

Login before adding your answer.

Traffic: 2603 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6