Entering edit mode
4.5 years ago
evelyn
▴
230
Hello,
I have downsampled sorted bam files using:
samtools view -h -s 0.5 E.sorted.bam > E.5x.sorted.sam
And again compressed the files:
samtools -bS E.5x.sorted.sam > E.5x.sorted.bam
samtools index E.5x.sorted.bam
Then I want to do variant calling using bcftools with multiple samples altogether in the list named bams.list with E.5x.sorted.bam
files:
bcftools mpileup -Ov -f ref.fa -b bams.list | bcftools call -mv -o example.vcf
But it stops abruptly without giving an error. But it runs using the same command when I use the E.5x.sorted.sam
files. I am not sure why it is not working with E.5x.sorted.bam
. Thank you for your help!
What is the terminal message that comes up when it stops? Have you check RAM consumption at that point?
There is no error showing. I just got this:
This is the same message that I get for successful runs.
I see, but is there no further message even from your operating system? When you say that it just 'stops', what actually happens? - it just returns to the command line? Does it produce the VCF?
What is the content of bams.list?
This sentence is somewhat contradictory, because you imply that it both does and does not work with
E.5x.sorted.bam
?When I submit the script, it stops running after almost 10 seconds and I can see this output message that I shared earlier and it also produces the only header of the vcf file.
bams.list
is a list of sorted bam files.Yes, that is what I am trying to understand. It works when I use the SAM files obtained after downsampling sorted bam files but does not work when I use the sorted bam files by compressing the SAM files again for downsampled files.
I see. From what you're telling me, the command is actually running successfully but just not outputting anything. Are you downsampling your data too much such that there are very low numbers of reads remaining? For example, what is the depth of coverage profile for the problematic BAM? - you can use
bedtools coverage
to quickly check, I thinkI am downsampling the files to an average of ~8x. How does it works for the SAM but not for the sorted Bam both for downsampled files.
Have you checked the contents of both?
I have checked the files that I compressed again using
samtools view -bS E.5x.sorted.sam > E.5x.sorted.bam
and have found that these output sorted bam files are all of size71
. So something is definitely wrong in compressing the files again.Oh, should that not be:
I have updated my previous comment as I used
I can try using what you have suggested and will let you know. Thank you!
Ah, I see... it should, therefore, be okay. What are the contents of these SAM files? - do they contain reads?
Yes, these
SAM
files are almost 10 GB in size and contain reads. So I am trying to call variants for all files together for one chromosome at a time by specifyingbcftools mpileup -r
In order to use-r
, index files of the respective input files are needed. When I try indexing theseSAM
files, it fails. So I wanted to convert theseSAM
files toSORTED BAM
and thenindex
these to use for variant calling for all files at the same time for one chromosome.I imagine, therefore, that the issue is that you do not have enough RAM to sort the SAM files. How much RAM is available on your current machine?
I have 8gb on mac that I am using. However, I use the cluster for such jobs.
It is difficult for me to understand what is happening from here (remotely). Just some final pointers: