Question: Id And Ref Are Empty
9.6 years ago, Zhshqzyc wrote:

Hi, I ran a samtools command to produce a VCF file and find SNPs in a given region, but the resulting VCF doesn't make sense to me.

My command:

samtools mpileup -C50 -r chr21:start-end -Buf ref.fa 1.bam 2.bam 3.bam 4.bam 5.bam 6.bam 7.bam 8.bam 9.bam 10.bam 11.bam 12.bam 13.bam 14.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf  > var.flt.vcf

The last line in vcf is:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    1    2    3    4    5    5    7    8    9    10    11    12    13    14
chr21    41801273    .    A    T    39.4    .    DP=26;AF1=0.195;CI95=0.07143,0.3929;DP4=9,5,2,1;MQ=49;FQ=40.3;PV4=1,1,1.7e-12,1    GT:PL:GQ    0/0:0,0,0:5    0/0:0,12,101:16    0/1:27,3,0:7    0/1:65,6,0:5    0/0:0,6,59:10    0/0:0,9,71:13    0/0:0,0,0:5    0/0:0,3,29:7    0/0:0,6,51:10    0/0:0,6,53:10    0/0:0,0,0:5    0/0:0,0,0:5    0/0:0,0,0:5    0/0:0,0,0:5

Why are the ID column and the REF column empty? I don't understand it. Is anything wrong with the command or the data preparation? Thanks.

vcf samtools • 2.2k views

samtools is notorious for ignoring additional files. I don't think anything but 1.bam is being processed here.

Comment written 9.6 years ago by Jeremy Leipzig
9.6 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

In your example, REF is not empty: it is an 'A'.

Samtools doesn't annotate the VCF file; that is to say, it will not scan a database to find the known SNPs at a given location. That is why the ID column contains '.', the VCF placeholder for a missing value.
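To check which value sits in which column, the record from the question can be split on tabs and matched against the fixed VCF column names. This is only a minimal sketch: it assumes tab-separated fields, and the record is shortened to a truncated INFO string and a single sample column for brevity.

```python
# Fixed VCF columns, in the order they appear on the #CHROM header line.
VCF_FIXED_COLUMNS = [
    "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
]


def parse_fixed_fields(line):
    """Map the fixed VCF column names to the values of one data line."""
    fields = line.rstrip("\n").split("\t")
    # zip stops at the shorter sequence, so sample columns are ignored here.
    return dict(zip(VCF_FIXED_COLUMNS, fields))


# Shortened copy of the record from the question (assumption: tab-separated).
record = "\t".join([
    "chr21", "41801273", ".", "A", "T", "39.4", ".",
    "DP=26;AF1=0.195", "GT:PL:GQ", "0/0:0,0,0:5",
])

parsed = parse_fixed_fields(record)
print("ID =", parsed["ID"])    # '.' means "no identifier available"
print("REF =", parsed["REF"])  # the reference allele
print("ALT =", parsed["ALT"])  # the alternate allele
```

Read this way, the ID is the missing-value placeholder and REF holds the reference base, so neither column is actually blank.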

To annotate a VCF, you could have a look at the Ensembl Variant Effect Predictor.

9.6 years ago, Swbarnes2 wrote:

Remember that vcf is a general format, and that samtools isn't the only way of making a vcf file. So there are lots of things that might go into a vcf file, but samtools doesn't necessarily know enough to put all that information in there.

In theory, you could sequence a bunch of humans, and find known SNPs with rs numbers, and those rs numbers would be appropriate to put in the ID column, but I don't think it's possible to show samtools a list of rs numbers, and expect it to put those in there. You'd either do that yourself, or find some software that would do it for you.
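The "do that yourself" route could look roughly like the sketch below. It is an illustration only, not how any particular annotation tool works: `annotate_ids` and the `known_snps` lookup table are hypothetical names, and the identifier `rsHYPOTHETICAL` is a placeholder, not a real rs number. In practice the lookup would be built from a database of known variants.

```python
def annotate_ids(vcf_lines, known_snps):
    """Fill the ID column from known_snps wherever it is currently '.'.

    known_snps is a hypothetical dict keyed by (chrom, pos, ref, alt).
    """
    annotated = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines are passed through unchanged.
            annotated.append(line)
            continue
        fields = line.split("\t")
        key = (fields[0], int(fields[1]), fields[3], fields[4])
        if fields[2] == "." and key in known_snps:
            fields[2] = known_snps[key]
        annotated.append("\t".join(fields))
    return annotated


# Placeholder identifier, NOT a real rs number for this position.
known_snps = {("chr21", 41801273, "A", "T"): "rsHYPOTHETICAL"}

vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr21\t41801273\t.\tA\tT\t39.4\t.\tDP=26",
]

result = annotate_ids(vcf_lines, known_snps)
print(result[1])
```

Matching on chromosome, position, REF and ALT together, rather than position alone, avoids attaching an identifier to a different allele at the same site.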

It's like the binary flags in your .bam file. Just because a binary flag can be set to indicate poor QC or a duplicated read doesn't mean that the software you used actually calculated it.
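For reference, the flag bits mentioned here can be decoded with a few lines of Python. The bit values follow the SAM specification; the short names and the `decode_flag` helper are made up for this sketch.

```python
# SAM FLAG bits (values from the SAM specification; names are shorthand).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))    # a typical properly paired first read
print(decode_flag(1123))  # the same read, additionally marked as duplicate
```

A read whose flag lacks the `duplicate` or `qc_fail` bit has not necessarily passed those checks; the bits are only set if some tool in the pipeline ran the check and marked the read.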
