Split a sorted BAM file chromosome-wise
4.5 years ago
evelyn ▴ 230

Dear all,

I want to split a sorted BAM file by chromosome and keep the header in each split file. I have a lot of samples, and calling variants on all of them at once takes a long time, so I would like to run the variant calling for multiple samples chromosome by chromosome. I did this with SAM files earlier, but I am not sure whether sorted BAM files can be used the same way.

Thank you very much!

snp • 2.5k views
ADD COMMENT

Yes, you can also split a BAM file chromosome-wise, if that is the question.

ADD REPLY

You don't need to split the bam file to do the variant calling per chromosome.

ADD REPLY

Because there are many large samples, variant calling will take a long time. That's why I want to split the files by chromosome.

ADD REPLY

You can do the variant calling separately per chromosome without splitting the BAM. A coordinate-sorted, indexed BAM allows random access. Which variant caller are you using?
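
To illustrate the random-access point, here is a minimal sketch, assuming samtools is installed and the BAM is coordinate-sorted. The helper name count_reads is hypothetical, and file1.bam and ch01 are placeholders taken from later in this thread. It builds the .bai index if it is missing and then counts the alignments on a single chromosome without scanning the rest of the file.

```shell
# count_reads BAM CHROM: print the number of alignments on one chromosome.
# (Hypothetical helper; assumes a coordinate-sorted BAM.)
count_reads() {
    bam="$1"
    chrom="$2"
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found" >&2
        return 1
    fi
    if [ ! -f "$bam" ]; then
        echo "no such file: $bam" >&2
        return 1
    fi
    # The .bai index is what lets samtools seek straight to a region.
    [ -f "$bam.bai" ] || samtools index "$bam" || return 1
    # -c counts alignments instead of printing them.
    samtools view -c "$bam" "$chrom"
}

# Example call (placeholders from the thread):
# count_reads file1.bam ch01
```

The same index is what region options in variant callers rely on, so no per-chromosome files are needed for that purpose.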

ADD REPLY

I am using bcftools:

bcftools mpileup -Ov -f ref.fa file1.bam file2.bam file3.bam --- file100.bam | bcftools call -mv -o sample.vcf

Can this work on multiple BAM files together, one chromosome at a time? My whole point is to reduce the computation time.

ADD REPLY

Then look at bcftools call -r/-R

ADD REPLY

Thank you! I tried using:

bcftools mpileup -Ov -f ref.fa file1.bam file2.bam file3.bam | bcftools call -r ch01:0-100,000 -mv -o example.vcf

But I got an error:

Failed to open -: not compressed with bgzip

Then I tried:

bcftools mpileup -Ov -f ref.fa file1.bam.gz file2.bam.gz file3.bam.gz | bcftools call -r ch01:0-100,000 -mv -o example.vcf

Again I got an error:

[E::hts_hopen] Failed to open file file1.bam.gz
[E::hts_open_format] Failed to open file file1.bam.gz
[mpileup] failed to open file1.bam.gz: Exec format error

I am not sure which file format to use now. Thank you for your help!
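
A possible reading of the two errors, offered as a sketch rather than a confirmed diagnosis: `bcftools call -r` has to seek to the region, which needs a bgzip-compressed, indexed file, and a pipe cannot provide that. Passing `-r` to `bcftools mpileup` instead lets it use the BAM index. BAM is already BGZF-compressed, so the inputs should stay as plain .bam files (indexed with `samtools index`), not .bam.gz. The file names and region below are placeholders from this thread; the region is written 1-based and without thousands separators, and the compressed output name example.vcf.gz is an assumption.

```shell
# Placeholders from the thread; adjust to your data.
ref="ref.fa"
region="ch01:1-100000"   # 1-based coordinates, no commas
bams="file1.bam file2.bam file3.bam"

# Compose the pipeline as text first so it can be inspected before running.
# -r sits on mpileup (which can use the BAM index), not on call.
cmd="bcftools mpileup -Ou -f $ref -r $region $bams | bcftools call -mv -Oz -o example.vcf.gz"
echo "$cmd"

# To execute it (needs bcftools and BAMs indexed with samtools index):
# eval "$cmd"
```

Using `-Ou` between the two steps keeps the stream as uncompressed BCF, which avoids converting to text VCF and back.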

ADD REPLY

If you're worried about variant calling taking a very long time, most variant callers (GATK/freebayes) can run on many threads, which makes the process faster. There are other ways to speed up variant calling; calling chromosome by chromosome is not standard.

ADD REPLY

I am not sure about the standard ways to make variant calling faster for multiple samples with bcftools. Can you please share if you are aware of any such way? Thank you so much!
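
One pattern that is often used with bcftools, sketched here under stated assumptions: write one mpileup/call job per chromosome, run several jobs at once, then join the per-chromosome files with `bcftools concat`. The chromosome names ch01 to ch03, the job file jobs.txt, the output names calls.<chrom>.vcf.gz, and the choice of 4 parallel jobs are all hypothetical. As far as I know, `--threads` in bcftools mainly speeds up writing compressed output, so splitting the work by region tends to help more.

```shell
# Placeholders; adjust to your data.
ref="ref.fa"
bams="file1.bam file2.bam file3.bam"
# Hypothetical chromosome names. With samtools they could be listed via:
#   samtools idxstats file1.bam | cut -f1 | grep -v '^\*$'
chroms="ch01 ch02 ch03"
jobs_file="jobs.txt"

# Write one self-contained command line per chromosome.
: > "$jobs_file"
for c in $chroms; do
    echo "bcftools mpileup -Ou -f $ref -r $c $bams | bcftools call -mv -Oz -o calls.$c.vcf.gz" >> "$jobs_file"
done
cat "$jobs_file"

# Run 4 jobs at a time, then join the results in chromosome order:
# xargs -P 4 -I{} sh -c '{}' < jobs.txt
# bcftools concat -Oz -o all.vcf.gz calls.ch01.vcf.gz calls.ch02.vcf.gz calls.ch03.vcf.gz
```

Every job reads the same indexed BAM files, so nothing has to be split on disk.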

ADD REPLY
4.5 years ago
inedraylig ▴ 60

You can use bamtools to split a bam file by chromosome, with

bamtools split -in file.bam -reference
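
If bamtools is not at hand, a similar split can be sketched with samtools, assuming file.bam is coordinate-sorted and indexed. BAM output from `samtools view -b` carries the full header, which covers the request for a header in each split file. The helper split_cmds and the output naming file.<chrom>.bam are hypothetical, and ch01/ch02 are placeholder chromosome names. The helper only prints the commands so they can be reviewed first.

```shell
# split_cmds BAM CHROM...: print one samtools command per chromosome.
# (Hypothetical helper; output names are an assumption.)
split_cmds() {
    bam="$1"
    shift
    stem="${bam%.bam}"   # strip the .bam suffix for naming outputs
    for c in "$@"; do
        # -b writes BAM; the region argument requires an index on the input.
        echo "samtools view -b -o $stem.$c.bam $bam $c"
    done
}

split_cmds file.bam ch01 ch02
```

To actually run the printed commands, pipe them to a shell: `split_cmds file.bam ch01 ch02 | sh`.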
ADD COMMENT

Thank you, I will try that!

ADD REPLY

You don't need to split per chromosome if you call with bcftools. See WouterDeCoster's comment in this thread.

ADD REPLY
