Human genome hg19 variant calling with bcftools
1
0
Entering edit mode
2.8 years ago

I am trying to convert my sorted .bam file to .bcf, using the following command:

samtools mpileup -f hg19.fa sequence1.sorted.bam > X.bcf

But the result was:

1 samples in 1 input files

chr15   5062469 N   0   *   *
chr15   5062470 N   0   *   *
chr15   5062471 N   0   *   *
chr15   5062472 N   0   *   *
chr15   5062473 N   0   *   *
chr15   5062474 N   0   *   *
chr15   5062475 N   0   *   *
chr15   5062476 N   0   *   *
chr15   5062477 N   0   *   *

as a .bcf file.

I ran the following command:

samtools view -H sequence1.sorted.bam

to view .bam listings, and later:

grep ">" hg19.fa

to view reference genome data.

The following, respectively, shows the output.

@HD     VN:1.6  SO:coordinate
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr17_ctg5_hap1      LN:1680828
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371
@SQ     SN:chr6_dbb_hap3        LN:4610396
@SQ     SN:chr6_mann_hap4       LN:4683263
@SQ     SN:chr6_mcf_hap5        LN:4833398
@SQ     SN:chr6_qbl_hap6        LN:4611984
@SQ     SN:chr6_ssto_hap7       LN:4928567
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chrM LN:16571
@SQ     SN:chrUn_gl000211       LN:166566
@SQ     SN:chrUn_gl000212       LN:186858
@SQ     SN:chrUn_gl000213       LN:164239
@SQ     SN:chrUn_gl000214       LN:137718
@SQ     SN:chrUn_gl000215       LN:172545
@SQ     SN:chrUn_gl000216       LN:172294
@SQ     SN:chrUn_gl000217       LN:172149
@SQ     SN:chrUn_gl000218       LN:161147
@SQ     SN:chrUn_gl000219       LN:179198
@SQ     SN:chrUn_gl000220       LN:161802
@SQ     SN:chrUn_gl000221       LN:155397
@SQ     SN:chrUn_gl000222       LN:186861
@SQ     SN:chrUn_gl000223       LN:180455
@SQ     SN:chrUn_gl000224       LN:179693
@SQ     SN:chrUn_gl000225       LN:211173
@SQ     SN:chrUn_gl000226       LN:15008
@SQ     SN:chrUn_gl000227       LN:128374
@SQ     SN:chrUn_gl000228       LN:129120
@SQ     SN:chrUn_gl000229       LN:19913
@SQ     SN:chrUn_gl000230       LN:43691
@SQ     SN:chrUn_gl000231       LN:27386
@SQ     SN:chrUn_gl000232       LN:40652
@SQ     SN:chrUn_gl000233       LN:45941
@SQ     SN:chrUn_gl000234       LN:40531
@SQ     SN:chrUn_gl000235       LN:34474
@SQ     SN:chrUn_gl000236       LN:41934
@SQ     SN:chrUn_gl000237       LN:45867
@SQ     SN:chrUn_gl000238       LN:39939
@SQ     SN:chrUn_gl000239       LN:33824
@SQ     SN:chrUn_gl000240       LN:41933
@SQ     SN:chrUn_gl000241       LN:42152
@SQ     SN:chrUn_gl000242       LN:43523
@SQ     SN:chrUn_gl000243       LN:43341
@SQ     SN:chrUn_gl000244       LN:39929
@SQ     SN:chrUn_gl000245       LN:36651
@SQ     SN:chrUn_gl000246       LN:38154
@SQ     SN:chrUn_gl000247       LN:36422
@SQ     SN:chrUn_gl000248       LN:39786
@SQ     SN:chrUn_gl000249       LN:38502
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
@PG     ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa samse hg19bwaidx sequence1.bwa SRR9598696.fastq
@PG     ID:samtools     PN:samtools     PP:bwa  VN:1.12 CL:samtools view -bS sequence1_se.sam
@PG     ID:samtools.1   PN:samtools     PP:samtools     VN:1.12 CL:samtools sort -O bam -o sequence1.sorted.bam -T temp sequence1.bam
@PG     ID:samtools.2   PN:samtools     PP:samtools.1   VN:1.12 CL:samtools view -H sequence1.sorted.bam

and

>chr1
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr17_ctg5_hap1
>chr18
>chr19
>chr2
>chr20
>chr21
>chr22
>chr3
>chr4
>chr4_ctg9_hap1
>chr5
>chr6
>chr6_apd_hap1
>chr6_cox_hap2
>chr6_dbb_hap3
>chr6_mann_hap4
>chr6_mcf_hap5
>chr6_qbl_hap6
>chr6_ssto_hap7
>chr7
>chr8
>chr9
>chrM
>chrUn_gl000211
>chrUn_gl000212
>chrUn_gl000213
>chrUn_gl000214
>chrUn_gl000215
>chrUn_gl000216
>chrUn_gl000217
>chrUn_gl000218
>chrUn_gl000219
>chrUn_gl000220
>chrUn_gl000221
>chrUn_gl000222
>chrUn_gl000223
>chrUn_gl000224
>chrUn_gl000225
>chrUn_gl000226
>chrUn_gl000227
>chrUn_gl000228
>chrUn_gl000229
>chrUn_gl000230
>chrUn_gl000231
>chrUn_gl000232
>chrUn_gl000233
>chrUn_gl000234
>chrUn_gl000235
>chrUn_gl000236
>chrUn_gl000237
>chrUn_gl000238
>chrUn_gl000239
>chrUn_gl000240
>chrUn_gl000241
>chrUn_gl000242
>chrUn_gl000243
>chrUn_gl000244
>chrUn_gl000245
>chrUn_gl000246
>chrUn_gl000247
>chrUn_gl000248
>chrUn_gl000249
>chrX
>chrY

So, why does it fail to generate the .bcf file?

bcftools samtools variant-calling • 1.5k views
ADD COMMENT
1
Entering edit mode
2.8 years ago

Hi,

With samtools mpileup, you are merely getting a multiway pileup.

If you wish to call variants with BCFtools, then please try:

Ref_FASTA="hg19.fa" ;
bcftools mpileup \
  --redo-BAQ \
  --min-BQ 30 \
  --per-sample-mF \
  --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR \
  -f "${Ref_FASTA}" \        
  Aligned_Sorted_PCRDuped_FiltMAPQ.bam \
  Aligned_Sorted_PCRDuped_FiltMAPQ_75pcReads.bam \
  Aligned_Sorted_PCRDuped_FiltMAPQ_50pcReads.bam \
  Aligned_Sorted_PCRDuped_FiltMAPQ_25pcReads.bam | \
bcftools call \
  --multiallelic-caller \
  --variants-only \
  -Ob \
  > Aligned_Sorted_PCRDuped_FiltMAPQ.bcf ;
bcftools index Aligned_Sorted_PCRDuped_FiltMAPQ.bcf ;

, i.e., we 'pipe' the output of bcftools mpileup into bcftools call

See this part of my own historic 'clinical grade' pipeline (deprecated):

Kevin

ADD COMMENT
0
Entering edit mode

Thank you kevin for the response. I am quite a novice in Bioinformatics; can you tell why the reference genome, "hg19.fa" needs to have .fa files like chr4_ctg9_hap1,chrUn_gl000215 >chrUn_gl000216,>chrUn_gl000217,>chrUn_gl000218,>chrUn_gl000219 etc. instead of just the ChrN.fa(N=1 -22,x,y,M)

ADD REPLY
1
Entering edit mode

Hi, those are not primary contigs (for now, let's just think of a 'contig' as being a chromosome); thus, you can ignore them. The primary contigs will be named:

chr1
chr2
chr3
... ...
chr22
chrX
chrY

For more information on the other contigs, please follow the information from here: Additional Data In Human Genome (Hg18 / Hg19) Assembly ?

ADD REPLY
0
Entering edit mode

Hi kevin,

I tried variant calling using the code you suggested. But it failed with an error in the 'mpileup'. the result has been attached.

enter image description here

later, I tried variant calling using 'samtools mpileup', using the following code.

samtools mpileup -g -f hg19.fa sequence1.sorted.bam | bcftools call -mv -Oz -o calls.vcf.gz

It generated a .vcf file.

enter image description here

Is this a valid result? Did I follow your code correctly?

ADD REPLY
0
Entering edit mode

If this command ran successfully without error, then, yes, the VCF should be valid:

samtools mpileup -g -f hg19.fa sequence1.sorted.bam | bcftools call -mv -Oz -o calls.vcf.gz

My command should also have run, however, note that you will not have the BAM files that I specify in my code, i.e., you need to replace this:

-f "${Ref_FASTA}" \        
Aligned_Sorted_PCRDuped_FiltMAPQ.bam \
Aligned_Sorted_PCRDuped_FiltMAPQ_75pcReads.bam \
Aligned_Sorted_PCRDuped_FiltMAPQ_50pcReads.bam \
Aligned_Sorted_PCRDuped_FiltMAPQ_25pcReads.bam | \

...with this:

-f hg19.fa \
sequence1.sorted.bam | \

Can you try my command again, please? If necessary, please run it all on a single line without the forward slashes (/)

ADD REPLY
0
Entering edit mode

Hi Kevin,

I have correctly modified your command, and that resulted in a .bcf file. But I don't think it is a valid .bcf file. I have attached both the "modified command" and the snap of the resulting .bcf file. Kindly, have a look; I hope the command has run. Could it be due to issues lurking in the reference .fa file?

enter image description here

enter image description here

ADD REPLY
0
Entering edit mode

Hi, the -Ob parameter instruct BCFtools to output in binary VCF format. This can help to save disk space and improve random access of records in the BCF.

What you need to do to view it is:

bcftools view myvariants.bcf ;

Please take a look at the output options for bcftools mpileup by simply running:

bcftools mpileup
ADD REPLY

Login before adding your answer.

Traffic: 2931 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6