Question: How to read VCF (v4.1) file? (student project) (samtools version 0.1.19)
0
gravatar for c.clarido
5 weeks ago by
c.clarido40
Netherlands/Rotterdam/Leiden University (Applied Science)
c.clarido40 wrote:

Hello community,

Below is a fragment of my VCF file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  aln.sorted.bam
11561   6   .   C   X   0   .   DP=1;I16=1,0,0,0,38,1444,0,0,6,36,0,0,0,0,0,0;QS=1.000000,0.000000,0.000000,0.000000    PL  0,3,6
11561   7   .   C   X   0   .   DP=1;I16=1,0,0,0,38,1444,0,0,6,36,0,0,1,1,0,0;QS=1.000000,0.000000,0.000000,0.000000    PL  0,3,6
11561   8   .   A   X   0   .   DP=1;I16=1,0,0,0,38,1444,0,0,6,36,0,0,2,4,0,0;QS=1.000000,0.000000,0.000000,0.000000    PL  0,3,6
11561   19  .   G   A,X 0   .   DP=2;I16=1,0,1,0,38,1444,38,1444,6,36,6,36,7,49,13,169;QS=0.500000,0.500000,0.000000,0.000000;RPB=9.668049e-01  PL  1,0,1,4,4,4
11561   337 .   ca  cAa 0   .   INDEL;IS=1,0.200000;DP=5;I16=1,1,1,0,68,2312,40,1600,2,2,15,225,35,697,6,36;QS=0.347826,0.652174,0.000000,0.000000;VDB=6.365454e-02 PL  6,0,0
11561   375 .   taaaaaaaa   taaaaaaa    0   .   INDEL;IS=1,0.200000;DP=5;I16=0,0,1,0,0,0,18,324,0,0,6,36,0,0,25,625;QS=0.000000,1.000000,0.000000,0.000000  PL  6,3,0
11561   390 .   A   T,G,X   0   .   DP=4;I16=0,0,2,1,0,0,114,4332,0,0,8,38,0,0,75,1875;QS=0.000000,0.571429,0.428571,0.000000;VDB=5.227153e-02  PL  5,6,0,5,2,2,5,6,5,5
11561   740 .   AC  ACC 0   .   INDEL;IS=4,0.102564;DP=39;I16=20,7,2,1,741,22083,49,819,457,12337,5,13,524,11426,48,866;QS=0.970732,0.029268,0.000000,0.000000;VDB=5.891593e-02 PL  0,54,149

I need to find the SNPs and INDELs, however I don't quiet understand the X under the ALT. Does it mean X-chomosome? Does the following mean that there is a SNP from G to A on chromosome X? How do I know that significance of this result? Is it the QS=0.500000 or the QUAL=0? Is there anything I need to know when collecting the SNPs and INDELs?

11561   19  .   G   A,X 0   .   DP=2;I16=1,0,1,0,38,1444,38,1444,6,36,6,36,7,49,13,169;QS=0.500000,0.500000,0.000000,0.000000;RPB=9.668049e-01  PL  1,0,1,4,4,4

Thank you in advance!

variantcalling samtools bcf vcf • 179 views
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by c.clarido40
1

Hello,

the fragment you are showing is not a vcf. It is the output of samtools/bcftools mpileup. You need to do a bcftools callon it. See Devon's answer here.

fin swimmer

ADD REPLYlink written 5 weeks ago by finswimmer7.0k

Hello thank you for the link, however I did the following:

samtools mpileup -f reference.fa aln.sorted.bam > aln.mpileup 
samtools mpileup -uf reference.fa aln.sorted.bam > calls.bcf
bcftools view calls.bcf > call.vcf

I am using samtools version 0.1.19. I thought mpileup should look like this:

11561   49  a   8   ........    GGGGGG;G
11561   50  t   8   ........    GGGGFGGG
11561   51  g   8   ........    GGGGGGGG
11561   52  a   8   ........    GGGEGGGG
11561   53  g   8   ........    GGGGGGGG
11561   54  g   8   ........    GGGGGGGG
11561   55  a   9   ........^'. GGGGGGGGA
11561   56  a   9   .........   GGGGGGGGB
11561   57  a   9   .........   GGGGGGGGE
11561   58  c   9   .........   GGGEGGGGG
11561   59  c   9   ....G....   GGGEGGGGG
11561   60  g   10  .........^'.    GGGGGGGGG?
11561   61  g   10  ..........  GGGGGGGGGA
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by c.clarido40

Hello cedrickclarido ,

samtools 0.1.19 is very, very old. Current version is 1.9. I strongly recommend to update. (The easiest way is to use conda.) In the current version samtools mpileup is deprecated and replaced by bcftools mpileup.

Beside this: In your code you never use aln.mpileup. Instead you do a second mpileup and forced the output to be a uncompressed bcf format. In the newer versions this is the standard (I believe the old output format your are showing isn't used anymore.) Your bcftools view command is then just converting the bcf version of the mpileup result to vcf. You are missing the bcftools call step.

fin swimmer

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by finswimmer7.0k

Hello, indeed the version is really old. However, as student I am limited to this version as I am working in a school's server where the tool is pre-installed. I can't seems to find any good manuals or documentations to convert pileup to bcf and bcf to vcf. As shown from my previous comment:

samtools mpileup -f reference.fa aln.sorted.bam > aln.mpileup 
samtools mpileup -uf reference.fa aln.sorted.bam > calls.bcf
bcftools view calls.bcf > call.vcf

I used the first code to generate a PILEUP-file. I found online that without using the flag -u or -g, the command would generate a PILEUP file, otherwise it would create a BCF. According to the manual of version 0.1.19, the bcf file can be converted by using view option, where the input is on default BCF and output is VCF. Did I perhaps misused the command? The project also assigned us to convert the PILEUP to bcf using bcftools, however I can't find any commands that fits the criteria other than reusing the samtools mpileup again but without the -u/-g flags.

Thank you.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by c.clarido40

You can always request for the latest software to be installed on the school server. In all probability, it's already installed and has a mechanism to be loaded (such as module load) or the admin will install it in a couple of minutes. Working with outdated software puts you at a significant disadvantage.

ADD REPLYlink written 5 weeks ago by RamRS19k

I think your problem is understanding what is mpileup for. I will try to explain it:

Using samtools/bcftools for variant discovering is a 2-step process.

In the first step - called mpileup - samtools will generate information about each single position of the reference sequence. This contains how many reads cover this site, if the base on the position matches the reference and what's the quality for each base.

Based on this information bcftools calldecide in a second step on which position it will have a closer look if there is a variant. To decide if there is a variant more critera than only the number of reads that support the variant and the base quality are needed. For example the mapping quality, the position within the read, compare those metrics between read that support the reference and those supporting the variant, ...

So in a very basic pipeline your command should look like this:

$ samtools mpileup -fu reference.fa aln.sorted.bam > aln.mpileup 
$ bcftools view -vcg  aln.mpileup >  call.vcf

In the current version of samtools/bcftools like this:

$ bcftools mpileup -Ou -f reference.fa aln.sorted.bam  | bcftools call -mv > call.vcf

fin swimmer

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by finswimmer7.0k
1

I'm not sure that's the right samtools command line. Don't you need -u? I'm pretty sure the regular eyeball-friendly pileup format is not what bcftools has ever wanted as input.

ADD REPLYlink written 5 weeks ago by swbarnes24.5k
1

Yes, you're right. Fixed it.

ADD REPLYlink written 5 weeks ago by finswimmer7.0k

It worked! thank you community!

ADD REPLYlink written 5 weeks ago by c.clarido40

Thank you the explanation sir, I tried the following and seems to get errors all the time:

s1104230@Bioserver02:~/test2$ time samtools mpileup -f reference.fa aln.sorted.bam > aln2.mpileup
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000

real    19m30.006s
user    19m21.028s
sys 0m7.384s
s1104230@Bioserver02:~/test2$ time bcftools -vcg  aln2.mpileup >  call2.vcf
[main] Unrecognized command.

real    0m0.001s
user    0m0.000s
sys 0m0.000s

s1104230@Bioserver02:~/test2$ time bcftools view -vcg  aln2.mpileup >  call2.vcf
[bcf_sync] incorrect number of fields (0 != 5) at 0:0
[afs] 0:0.000

real    0m0.014s
user    0m0.000s
sys 0m0.000s
ADD REPLYlink written 5 weeks ago by c.clarido40
1

Hello cedrickclarido ,

again: Update the version!

I edited my commands as I missed the view in the bcftoolscommand. And as swbarnes2 says the output of mpileup must be in a bcfformat.

fin swimmer

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by finswimmer7.0k

I don't think it is easy to help when we don't have access to the extensive bcftools documentation for the version you're using. There might be changes in arguments between the old bcftools view caller and the current bcftools call caller that cannot be easily mapped. Please upgrade to the latest version. I don't think you'll need any special permissions to compile it from source.

ADD REPLYlink written 5 weeks ago by RamRS19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1332 users visited in the last hour