convert single bam file to VCF format
1
0
Entering edit mode
14 months ago
Human • 0

Hey humans,

My data consists of test data for practice. It consists of like 200 artificial reads with fake indels I need to find. In the end i will get real reads of thousands of individuals which differ in terms of one loci, depending if they are male or female. to get into the necessary scripting and develop an approach I asked for this train data. the reads show §fake" indels and should differ in some bp's in a guess.... I created this bam file by using slurm array IDs. later on i need to assign alll reads to the individual they originate from. Assuming the steps I did before to create a bam file are correct, I dont know why I stuck in following step:

Slurm script:

module load SAMtools/1.3.1
module load BCFtools/1.3.1


samtools view -b -h -T example.fasta Test1.bam | bcftools call -vmO v -o Test1.vcf

Error:

Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid
Failed to open -: unknown file type

what would you suggest.

vcf bam bcftools • 1.7k views
ADD COMMENT
0
Entering edit mode

what would you suggest.

I'm puzzled. A parameter is said missing " --ploidy nor --ploidy-file given" . Did you try to set one of those values ?

ADD REPLY
0
Entering edit mode

thanks, how would I do that ?

ADD REPLY
0
Entering edit mode

Read ATPoint's comment and the page it points to. You cannot use just call, you need to mpileup then call.

ADD REPLY
0
Entering edit mode

aah ok, I knew that from previous analysis where I did like that. I was just confused because this time I just have just one bam file. So, I assumed the mpileup step is just a combination of two or more bam files and can be left out in this situation. Thanks!

ADD REPLY
0
Entering edit mode

A nice blog post on mpileup: https://davetang.org/muse/2015/08/26/samtools-mpileup/

It's an intermediate step between alignment (BAM) and called variants (VCF). To call a variant, one must know the loci on the reference where a variant possibly exists, and I think that's what mpileup does.

ADD REPLY
0
Entering edit mode

I like to explain it simply by: (1) Row Order: the BAM being the original read segments of 100-150 base pairs from the sequencer. So row oriented as you see them in a Genome Browser.
(2) Column Order: mpileup gets you the column view of each base pair at a time. Giving you all the options and variances it sees there.
(3) Singular base-pair: Then the call takes all the information of various quality metrics and makes a determination of the likely "called" value (or values if diploid, for example) for one or a few base pairs (think inserts or deletes). So the call collapses the column to information about just that base pair; often only reporting on differences from the reference model as they are sparse.

So row order BAM, to column order (post Pileup), to single value (VCF).

ADD REPLY
0
Entering edit mode

convert

You cannot convert a BAM to a VCF - they are not equivalent in terms of information content or processing level. Does one "convert" wheat to bread or make bread from wheat by running it through a process that involves more ingredients?

ADD REPLY
0
Entering edit mode

Now with following script:

samtools view -bS Test1.sam | samtools sort - | bcftools mpileup -d 8000 -Ou - | bcftools call -mv -Ov > Test1.vcf

I get following Errors:

Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid
[E::main] unrecognized command 'mpileup'
Failed to open -: unknown file type

how can adjust the Ploidy options or what is a ploidy file....

ADD REPLY
0
Entering edit mode

You are missing the reference as part of the mpileup command. It is a required parameter. See https://samtools.github.io/bcftools/bcftools.html#mpileup . Also just running bcftools to get the usage shows an example at the end which indicates the reference is required.

The advice of @ATpoints (convert single bam file to VCF format) on reading https://samtools.github.io/bcftools/howtos/variant-calling.html as a basic start is a good one. Although it does not explain how to set the ploidy to remove the warning. Run bcftools call --ploidy ? to see the predefined parameters to ploidy (most complex ones defined for the Human Genome; but you can supply your own file via --ploidy-file (the bcftools usage also gives some indication of all this)

Personally, I would do the sort and save it. Once sorted, you can index the file. Then you can supply the BAM/SAM directly to bcftools. Not sure about SAM as I always only use a sorted, indexed BAM. But you get the drift. Sorting and indexing is required by many tools. And, if not required, slows them down as it likely has to perform the function internally.

ADD REPLY
2
Entering edit mode
14 months ago
ATpoint 82k

Hello human,

please read the bcftools documentation. It covers how to run the variant calling steps.

https://samtools.github.io/bcftools/howtos/variant-calling.html

ADD COMMENT

Login before adding your answer.

Traffic: 2720 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6