Question: bcftools multiple samples
0
gravatar for evelyn
4 weeks ago by
evelyn70
evelyn70 wrote:

Hello,

I have downsampled sorted bam files using:

samtools view -h -s 0.5 E.sorted.bam > E.5x.sorted.sam

And again compressed the files:

samtools -bS E.5x.sorted.sam > E.5x.sorted.bam
samtools index E.5x.sorted.bam

Then I want to do variant calling using bcftools with multiple samples altogether in the list named bams.list with E.5x.sorted.bam files:

bcftools mpileup -Ov -f ref.fa -b bams.list | bcftools call -mv -o example.vcf

But it stops abruptly without giving an error. But it runs using the same command when I use the E.5x.sorted.sam files. I am not sure why it is not working with E.5x.sorted.bam. Thank you for your help!

snp • 121 views
ADD COMMENTlink written 4 weeks ago by evelyn70

What is the terminal message that comes up when it stops? Have you check RAM consumption at that point?

ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

There is no error showing. I just got this:

Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 25 samples in 25 input files

This is the same message that I get for successful runs.

ADD REPLYlink written 4 weeks ago by evelyn70

I see, but is there no further message even from your operating system? When you say that it just 'stops', what actually happens? - it just returns to the command line? Does it produce the VCF?

What is the content of bams.list?

But it stops abruptly without giving an error. But it runs using the same command when I use the E.5x.sorted.sam files. I am not sure why it is not working with E.5x.sorted.bam.

This sentence is somewhat contradictory, because you imply that it both does and does not work with E.5x.sorted.bam?

ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

When I submit the script, it stops running after almost 10 seconds and I can see this output message that I shared earlier and it also produces the only header of the vcf file.

bams.list is a list of sorted bam files.

Yes, that is what I am trying to understand. It works when I use the SAM files obtained after downsampling sorted bam files but does not work when I use the sorted bam files by compressing the SAM files again for downsampled files.

ADD REPLYlink written 4 weeks ago by evelyn70

I see. From what you're telling me, the command is actually running successfully but just not outputting anything. Are you downsampling your data too much such that there are very low numbers of reads remaining? For example, what is the depth of coverage profile for the problematic BAM? - you can use bedtools coverage to quickly check, I think

ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

I am downsampling the files to an average of ~8x. How does it works for the SAM but not for the sorted Bam both for downsampled files.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by evelyn70

Have you checked the contents of both?

ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

I have checked the files that I compressed again using samtools view -bS E.5x.sorted.sam > E.5x.sorted.bam and have found that these output sorted bam files are all of size 71. So something is definitely wrong in compressing the files again.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by evelyn70

Oh, should that not be:

samtools view -b E.5x.sorted.sam > E.5x.sorted.bam
ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

I have updated my previous comment as I used

samtools view -bS E.5x.sorted.sam > E.5x.sorted.bam`

I can try using what you have suggested and will let you know. Thank you!

ADD REPLYlink written 4 weeks ago by evelyn70

Ah, I see... it should, therefore, be okay. What are the contents of these SAM files? - do they contain reads?

ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

Yes, these SAM files are almost 10 GB in size and contain reads. So I am trying to call variants for all files together for one chromosome at a time by specifying bcftools mpileup -r In order to use -r, index files of the respective input files are needed. When I try indexing these SAM files, it fails. So I wanted to convert these SAM files to SORTED BAM and then index these to use for variant calling for all files at the same time for one chromosome.

ADD REPLYlink written 4 weeks ago by evelyn70

I imagine, therefore, that the issue is that you do not have enough RAM to sort the SAM files. How much RAM is available on your current machine?

ADD REPLYlink written 4 weeks ago by Kevin Blighe52k

I have 8gb on mac that I am using. However, I use the cluster for such jobs.

ADD REPLYlink written 4 weeks ago by evelyn70

It is difficult for me to understand what is happening from here (remotely). Just some final pointers:

  • check the contents of all of your files, and at every step (including input and output)
  • check the default parameters of all of your commands
  • ensure that 'downsampling' to 5x is doing what you think that it is doing
ADD REPLYlink written 4 weeks ago by Kevin Blighe52k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 728 users visited in the last hour