Question: Split a sorted BAM file chromosome-wise
evelyn wrote (6 weeks ago):

Dear all,

I want to split a sorted BAM file by chromosome and keep the header in each split file. I want to do the variant calling for multiple samples chromosome by chromosome, because I have a lot of samples and processing them all together will take a long time. I did this with SAM files earlier, but I am not sure whether sorted BAM files can be used for such a job.

Thank you very much!
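Sorted BAM files work for this. A minimal sketch using samtools instead of bamtools, assuming the BAM is coordinate-sorted and indexed (`samtools index sample.bam`) and that `ref.fa.fai` belongs to the reference the reads were aligned to: `samtools view -b` with a region writes a BAM that keeps the full header. The helper `split_cmds` and the output naming are hypothetical; it only prints the commands, so you can inspect them before running anything.

```bash
#!/usr/bin/env bash
# split_cmds (hypothetical helper): print one samtools command per reference
# sequence listed in a .fai index. Each command extracts the reads on that
# sequence into its own BAM (BAM output from samtools view always carries the
# header) and then indexes the new file.
# Assumes: the input BAM is coordinate-sorted and indexed (.bai or .csi), and
# sequence names contain no ':' (which samtools could parse as a range).
split_cmds() {
  local bam=$1 fai=$2
  local prefix=${3:-${bam%.bam}}   # default prefix: input name without .bam
  local chrom out
  # Column 1 of a .fai file is the sequence name.
  cut -f1 -- "$fai" | while IFS= read -r chrom; do
    out="${prefix}.${chrom}.bam"
    printf 'samtools view -b -o %q %q %q && samtools index %q\n' \
      "$out" "$bam" "$chrom" "$out"
  done
}
```

Usage: `split_cmds sample.bam ref.fa.fai split/sample | bash` runs the commands one after another (the `split/` directory must exist); piping to GNU `parallel -j 4` instead runs four at a time.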


Yes, you can also split a BAM file chromosome-wise, if that is the question.

Written 6 weeks ago by lieven.sterck

You don't need to split the bam file to do the variant calling per chromosome.

Written 6 weeks ago by WouterDeCoster

Because I have many large samples, variant calling will take a long time. That's why I want to split the files chromosome-wise.

Written 6 weeks ago by evelyn

You can do the variant calling separately per chromosome without splitting the BAM, because an indexed BAM allows random access. Which variant caller are you using?

Written 6 weeks ago by WouterDeCoster
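To illustrate the random-access point: what makes per-chromosome access possible is the index, not a split. A small sketch, assuming samtools is installed; the helper name `index_cmds` is hypothetical. It prints a `samtools index` command for every BAM that does not have an index yet.

```bash
#!/usr/bin/env bash
# index_cmds (hypothetical helper): print "samtools index" for every BAM given
# on the command line that has no index yet. samtools index writes
# <file>.bam.bai by default; a .csi index or the older <file>.bai naming is
# accepted here as well.
index_cmds() {
  local bam
  for bam in "$@"; do
    if [[ -e "$bam.bai" || -e "$bam.csi" || -e "${bam%.bam}.bai" ]]; then
      continue   # already indexed: region queries will work
    fi
    printf 'samtools index %q\n' "$bam"
  done
}
```

Usage: `index_cmds *.bam | bash`. Afterwards, for example, `samtools view -c file1.bam ch01` counts the reads on ch01 by jumping to that part of the file through the index, without touching the rest.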

I am using bcftools:

bcftools mpileup -Ov -f ref.fa file1.bam file2.bam file3.bam --- file100.bam | bcftools call -mv -o sample.vcf

Can this work with multiple BAM files together, one chromosome at a time? My whole point is to reduce computation time.

Written 6 weeks ago by evelyn

Then look at bcftools call -r/-R

Written 6 weeks ago by WouterDeCoster
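A scatter/gather sketch of per-chromosome calling with the same bcftools pipeline, assuming every BAM is sorted and indexed and `ref.fa` has a `.fai` (`samtools faidx ref.fa`). Here the region goes on `bcftools mpileup`, which reads it through the BAM indexes; `bcftools call -r` instead expects an indexed, bgzip-compressed input file rather than a pipe. `-b` takes a text file listing the BAMs, one per line. The helper names `call_cmds`/`concat_cmd` and the `calls.*` output names are hypothetical.

```bash
#!/usr/bin/env bash
# call_cmds REF BAMLIST [PREFIX] (hypothetical helper): print one
# mpileup|call pipeline per reference sequence found in REF.fai.
# -r limits mpileup to that sequence via the BAM indexes (no splitting);
# -Ou hands uncompressed BCF to bcftools call instead of text VCF.
call_cmds() {
  local ref=$1 bamlist=$2 prefix=${3:-calls}
  local chrom
  cut -f1 -- "$ref.fai" | while IFS= read -r chrom; do
    printf 'bcftools mpileup -Ou -f %q -r %q -b %q | bcftools call -mv -Oz -o %q\n' \
      "$ref" "$chrom" "$bamlist" "${prefix}.${chrom}.vcf.gz"
  done
}

# concat_cmd REF [PREFIX] (hypothetical helper): print the bcftools concat
# command that joins the per-sequence VCFs in reference (.fai) order.
concat_cmd() {
  local ref=$1 prefix=${2:-calls}
  local chrom parts=()
  while IFS= read -r chrom; do
    parts+=("${prefix}.${chrom}.vcf.gz")
  done < <(cut -f1 -- "$ref.fai")
  printf 'bcftools concat -Oz -o %q' "${prefix}.all.vcf.gz"
  printf ' %q' "${parts[@]}"
  printf '\n'
}
```

Usage: `ls *.bam > bams.txt`, then `call_cmds ref.fa bams.txt | parallel -j 8` (GNU parallel runs each input line as a command), and once all jobs have finished, `concat_cmd ref.fa | bash`. Each job still sees all samples, so this stays a joint multi-sample call; only the genome is divided.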

Thank you! I tried using:

bcftools mpileup -Ov -f ref.fa file1.bam file2.bam file3.bam | bcftools call -r ch01:0-100,000 -mv -o example.vcf

But I got an error:

Failed to open -: not compressed with bgzip

Then I tried:

bcftools mpileup -Ov -f ref.fa file1.bam.gz file2.bam.gz file3.bam.gz | bcftools call -r ch01:0-100,000 -mv -o example.vcf

Again I got an error:

[E::hts_hopen] Failed to open file file1.bam.gz
[E::hts_open_format] Failed to open file file1.bam.gz
[mpileup] failed to open file1.bam.gz: Exec format error

I am not sure which file format to use now. Thank you for your help!

Written 6 weeks ago by evelyn
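My reading of the two errors, hedged: the first comes from `-r` on `bcftools call`, which needs an indexed, bgzip-compressed VCF/BCF file and cannot use the `-` pipe. The second suggests the `.bam.gz` files are no longer readable BAMs: a BAM is already BGZF-compressed, so if they were made by gzipping the BAMs, htslib will not recognise them. A quick check, assuming bash plus standard `head`/`od`; the helper names are hypothetical, and the BGZF test looks for the gzip magic bytes and the "BC" extra subfield described in the SAM/BAM format specification.

```bash
#!/usr/bin/env bash
# is_bgzf FILE (hypothetical helper): succeed if FILE starts like a BGZF block,
# the container BAM uses: gzip magic 1f 8b plus the "BC" extra subfield at
# bytes 12-13. A BAM that was gzipped again starts with a plain gzip header.
is_bgzf() {
  local hex
  hex=$(head -c 14 -- "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [[ ${hex:0:4} == 1f8b && ${hex:24:4} == 4243 ]]
}

# check_bams FILE... (hypothetical helper): report, per input, whether
# bcftools mpileup -r should be able to use it.
check_bams() {
  local bam
  for bam in "$@"; do
    if [[ ! -e $bam ]]; then
      echo "$bam: missing"
    elif ! is_bgzf "$bam"; then
      echo "$bam: not BGZF (gzipped twice, or not a BAM)"
    elif [[ ! -e $bam.bai && ! -e $bam.csi ]]; then
      echo "$bam: no index (run samtools index)"
    else
      echo "$bam: ok"
    fi
  done
}
```

If `check_bams file1.bam file2.bam file3.bam` reports ok for all three, the original BAMs can be used directly with the region moved to mpileup (regions are 1-based): `bcftools mpileup -Ou -f ref.fa -r ch01:1-100000 file1.bam file2.bam file3.bam | bcftools call -mv -o example.vcf`.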

If you're worried about the variant calling taking a very long time, most variant callers (e.g. GATK, freebayes) can run on many threads, which makes the process faster. There are other ways to make variant calling faster; calling chromosome by chromosome is not standard.

Written 6 weeks ago by inedraylig

I am not sure what the standard ways are to make variant calling faster for multiple samples with bcftools. Could you please share any that you know of? Thank you so much!

Written 6 weeks ago by evelyn
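A few things that, as far as I know, help with bcftools specifically: pipe with `-Ou` so mpileup does not format text VCF only for `bcftools call` to parse it again; pass the BAMs as a list file with `-b` instead of a hundred arguments; and, since the `--threads` option of mpileup/call only parallelises output compression, get the speed-up by running several regions at once. A sketch of the last two parts, assuming GNU xargs (`-d` is not in BSD xargs); the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# make_bam_list DIR OUT (hypothetical helper): write every *.bam in DIR,
# one path per line, to OUT, for use with bcftools mpileup -b OUT.
make_bam_list() {
  local dir=$1 out=$2 f
  : > "$out"                      # start with an empty list
  for f in "$dir"/*.bam; do
    if [[ -e $f ]]; then          # skip the literal pattern if nothing matched
      printf '%s\n' "$f" >> "$out"
    fi
  done
}

# run_jobs N (hypothetical helper): read one shell command per line on stdin
# and run up to N of them at the same time. -d '\n' keeps each line intact;
# xargs exits non-zero if any job failed.
run_jobs() {
  xargs -d '\n' -n 1 -P "$1" bash -c
}
```

Usage: `make_bam_list /path/to/bams bams.txt`, write one pipeline per chromosome into `jobs.txt` (for example `bcftools mpileup -Ou -f ref.fa -r ch01 -b bams.txt | bcftools call -mv -Oz -o calls.ch01.vcf.gz`), then run `run_jobs 8 < jobs.txt` and join the outputs with `bcftools concat`.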
Answer by inedraylig (University of Vienna), 6 weeks ago:

You can use bamtools to split a BAM file by chromosome:

bamtools split -in file.bam -reference
Written 6 weeks ago by inedraylig (modified by genomax)
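`bamtools split -reference` writes one BAM per reference sequence, each carrying the input's header. As far as I know the default output names are the input name without `.bam` plus `.REF_<sequence>.bam` (e.g. `file.REF_ch01.bam`); please check what your version actually writes. For region-based tools the pieces still need indexes; a sketch with a hypothetical helper, assuming samtools is installed:

```bash
#!/usr/bin/env bash
# split_index_cmds STUB (hypothetical helper): print "samtools index" for each
# BAM that bamtools split -reference produced from STUB.bam.
# Assumption: outputs are named STUB.REF_<name>.bam (bamtools' default stub).
split_index_cmds() {
  local stub=$1 f
  for f in "$stub".REF_*.bam; do
    if [[ -e $f ]]; then          # skip the literal pattern if nothing matched
      printf 'samtools index %q\n' "$f"
    fi
  done
}
```

Usage: `split_index_cmds file | bash`, then `samtools view -H file.REF_ch01.bam | head` to confirm the header is present in a split file.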

Thank you, I will try that!

Written 6 weeks ago by evelyn

You don't need to split per chromosome if you call with bcftools. See WouterDeCoster's answer above.

Written 6 weeks ago by Pierre Lindenbaum