Question: split sorted bam file chromosome wise
13 months ago, evelyn110 wrote:

Dear all,

I want to split a sorted BAM file by chromosome and keep the header in each split file. I want to do the variant calling for multiple samples chromosome by chromosome, because I have a lot of samples and processing them all together would take a long time. I did this with SAM files earlier, but I am not sure whether sorted BAM files can be used for such a job.

Thank you very much!

snp • 424 views
modified 13 months ago by inedraylig20 • written 13 months ago by evelyn110

Yes, you can also split a BAM file chromosome-wise, if that is the question.

modified 13 months ago • written 13 months ago by lieven.sterck8.9k

You don't need to split the BAM file to do the variant calling per chromosome.

written 13 months ago by WouterDeCoster44k

Because I have many large samples, variant calling will take a long time. That's why I want to split the files by chromosome.

written 13 months ago by evelyn110

You can do the variant calling separately per chromosome without splitting the BAM file. A sorted, indexed BAM allows random access. Which variant caller are you using?

written 13 months ago by WouterDeCoster44k
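
As a rough illustration of what random access means here: once a coordinate-sorted BAM has an index, tools can jump straight to one chromosome instead of reading the whole file. The sketch below assumes `samtools` is installed; the function name `count_reads_in_region` and the file and chromosome names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: region query on a coordinate-sorted BAM through its index.
# Assumes samtools is on PATH; names below are illustrative, not from the thread.

count_reads_in_region() {
    local bam="$1" region="$2"
    # Region queries need an index; build the default .bai if it is missing.
    if [ ! -e "${bam}.bai" ]; then
        samtools index "$bam" || return 1
    fi
    # -c prints the number of alignments; the region argument limits the
    # query to that chromosome or interval.
    samtools view -c "$bam" "$region"
}

# Usage (hypothetical file and chromosome name):
#   count_reads_in_region file1.bam ch01
```

The same region argument works with other index-aware tools, which is why a physical split is often unnecessary.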

I am using bcftools:

bcftools mpileup -Ov -f ref.fa file1.bam file2.bam file3.bam --- file100.bam | bcftools call -mv -o sample.vcf

Can this work on multiple BAM files together, one chromosome at a time? My whole point is to reduce computation time.

written 13 months ago by evelyn110

Then look at bcftools call -r/-R

written 13 months ago by WouterDeCoster44k

Thank you! I tried using:

bcftools mpileup -Ov -f ref.fa file1.bam file2.bam file3.bam | bcftools call -r ch01:0-100,000 -mv -o example.vcf

But I got an error:

Failed to open -: not compressed with bgzip

Then I tried:

bcftools mpileup -Ov -f ref.fa file1.bam.gz file2.bam.gz file3.bam.gz | bcftools call -r ch01:0-100,000 -mv -o example.vcf

Again I got an error:

[E::hts_hopen] Failed to open file file1.bam.gz
[E::hts_open_format] Failed to open file file1.bam.gz
[mpileup] failed to open file1.bam.gz: Exec format error

I am not sure which file format to use now. Thank you for your help!

written 13 months ago by evelyn110
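
A likely reading of the two errors, offered as an assumption rather than a confirmed diagnosis: `bcftools call -r` needs an indexed, bgzip-compressed input, which a pipe cannot provide, and BAM files are already compressed, so gzipping them again makes them unreadable as BAM. One possible fix is to give the region to `bcftools mpileup` instead, since it can use the BAM index. The sketch assumes each BAM is coordinate-sorted and indexed; the function name `call_region` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: restrict calling to one region by filtering at the mpileup step.
# Assumes bcftools is on PATH and every BAM has an index next to it.

call_region() {
    local ref="$1" region="$2" out_vcf="$3"
    shift 3   # the remaining arguments are the BAM files
    # -r on mpileup limits the pileup to the region via the BAM indexes.
    # -Ou streams uncompressed BCF between the two commands.
    bcftools mpileup -Ou -f "$ref" -r "$region" "$@" \
        | bcftools call -mv -Ov -o "$out_vcf"
}

# Usage (file names as in the post above):
#   call_region ref.fa ch01:1-100000 example.vcf file1.bam file2.bam file3.bam
```

The input files stay plain `.bam`; only the final output would be compressed if desired (for example `-Oz` with a `.vcf.gz` name).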

If you're worried about the variant calling taking a very long time, most variant callers (GATK/freebayes) can run on many threads, making the process faster. There are other ways to make variant calling faster; calling chromosome by chromosome is not standard.

modified 13 months ago • written 13 months ago by inedraylig20

I am not sure what the standard ways are to make variant calling faster for multiple samples with bcftools. Could you share any that you are aware of? Thank you so much!

written 13 months ago by evelyn110
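
One pattern that fits bcftools, sketched here under assumptions, is to call each chromosome as a separate job and then concatenate the per-chromosome results in reference order. The sketch assumes indexed BAMs, a `ref.fa.fai` index (as produced by `samtools faidx`), and a text file listing one BAM path per line; the function name `call_by_chromosome`, the output layout, and the default job count are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: per-chromosome calling in parallel, then concatenation.
# Assumes bcftools and xargs (with -P) are available.

call_by_chromosome() {
    local ref="$1" bam_list="$2" out_dir="$3" jobs="${4:-4}"
    mkdir -p "$out_dir" || return 1

    # Column 1 of the .fai file holds sequence names in reference order.
    # xargs -P runs up to $jobs chromosomes at once; each job joint-calls
    # all BAMs in the list (-b) for a single region (-r).
    cut -f1 "${ref}.fai" \
        | xargs -P "$jobs" -I{} sh -c \
            'bcftools mpileup -Ou -f "$1" -b "$2" -r "$3" | bcftools call -mv -Oz -o "$4/$3.vcf.gz"' \
            _ "$ref" "$bam_list" {} "$out_dir" || return 1

    # List the per-chromosome files in reference order and join them.
    awk -F'\t' -v d="$out_dir" '{ print d "/" $1 ".vcf.gz" }' "${ref}.fai" \
        > "$out_dir/parts.txt"
    bcftools concat -f "$out_dir/parts.txt" -Oz -o "$out_dir/all.vcf.gz"
}

# Usage (hypothetical paths): 8 jobs at a time
#   call_by_chromosome ref.fa bams.txt calls 8
```

Because every job sees the same sample set, the per-chromosome files share a header and can be concatenated directly. Memory and disk I/O, not only CPU count, may limit how many jobs are worth running at once.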
13 months ago, inedraylig20 (University of Vienna) wrote:

You can use bamtools to split a bam file by chromosome, with

bamtools split -in file.bam -reference
modified 13 months ago by genomax92k • written 13 months ago by inedraylig20
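
For comparison, a per-chromosome split can also be sketched with `samtools`, assuming a coordinate-sorted BAM. BAM output from `samtools view -b` carries the full header, which covers the original concern about keeping the header in each split file. The function name `split_bam_by_chrom` and the output naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: write one BAM per reference sequence using the index.
# Assumes samtools is on PATH and the input is coordinate-sorted.

split_bam_by_chrom() {
    local bam="$1" out_dir="$2" base chrom
    base=$(basename "$bam" .bam)
    mkdir -p "$out_dir" || return 1
    samtools index "$bam" || return 1

    # idxstats prints one line per reference sequence; the final "*" line
    # stands for unplaced reads and is skipped here.
    samtools idxstats "$bam" | cut -f1 | grep -v '^\*$' \
        | while IFS= read -r chrom; do
            # -b writes BAM, header included; the region selects one chromosome.
            samtools view -b -o "$out_dir/$base.$chrom.bam" "$bam" "$chrom" || exit 1
            samtools index "$out_dir/$base.$chrom.bam" || exit 1
        done
}

# Usage (hypothetical paths):
#   split_bam_by_chrom file.bam split_out
```

Each output keeps every `@SQ` line from the original header, so downstream tools still see the full reference dictionary.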

Thank you, I will try that!

written 13 months ago by evelyn110

You don't need to split per chromosome if you call with bcftools. See WouterDeCoster's comment (C: split sorted bam file chromosome wise).

written 13 months ago by Pierre Lindenbaum131k