Question: How long will bam-readcount take to run human samples?
0
gravatar for Laven9
21 days ago by
Laven90
Laven90 wrote:

I am now trying to run bam-readcounts. The sample I use is from human. But it takes almost 2-3 days to run one bam file. Is it normal? The size of my file is 8,000,000 KB. I am a little worried the time it takes is too long.

bam-readcounts • 121 views
ADD COMMENTlink modified 20 days ago by ATpoint19k • written 21 days ago by Laven90

Consider using mosdepth (https://github.com/brentp/mosdepth ) instead. It is multi-threaded. 8G is not that large a bam file so 2-3 days sounds excessive. bam-readcounts is a single threaded application?

ADD REPLYlink modified 21 days ago • written 21 days ago by genomax69k

I am not sure whether bam-readcounts is a single threaded application, but the time it cost is much longer than 2-3 days. And I am going to try mosdepth now. Thanks for your advice!

ADD REPLYlink written 20 days ago by Laven90

mosdepth and varscan are not compatible, don't even try. plit the file into chromosomes and use GNU parallel to parallelize. see answer below

ADD REPLYlink modified 20 days ago • written 20 days ago by ATpoint19k

Oh! Is it that? Then I'll try GUN parallel. Thanks a lot !

ADD REPLYlink written 20 days ago by Laven90

Actually, I am a little worried. The size I got with bam-readcounts is as large as 200,000,000 KB with an input of 8,000,000. I am not sure if it is normal. The code I use is quite simple, as:

bam-readcount my.bam -f hg38.fa > my.metric.txt
ADD REPLYlink written 20 days ago by Laven90
1
gravatar for ATpoint
20 days ago by
ATpoint19k
Germany
ATpoint19k wrote:

Above you are running the tool on the entire genome which is not necessary and slow, you only need the positions where you have variants. Here is a suggestion, given you have a BAM file called your.bam which is indexed by samtools and a list of variants your.vcf:

## Use the variant file to create a BED file with the variants for which `bam-readcounts`extracts information:
grep -v '^#'  your.vcf \
  | awk 'OFS="\t" {print $1, $2-1, $2+1}' \
  | sort -k1,1 -k2,2n \
  | bedtools merge -i - > your.bed

## Use samtools to pass only genomic positions to bam-readcount where variants are located:
samtools view -b -L your.bed your.bam > your_subset.bam

bam-readcount (...command...) your_subset.bam
ADD COMMENTlink modified 20 days ago • written 20 days ago by ATpoint19k

Sorry for reply so late. Thanks for your help! Wow! I think I need to use those tools more flexibly. It seems much faster if I just run with bam-readcount the positions I need in the file. Thanks again for your help. It do help a lot!

ADD REPLYlink written 8 days ago by Laven90

Sorry for troubling again. But I indeed have met with an annoying problem. When I tried to run fpfilter by Varscan, I got results like:

171661 variants in input file
171466 had a bam-readcount result
47692 had reads1>=2
0 passed filters
171661 failed filters
195 failed because no readcounts were returned
 425 failed minimim variant count < 4
 34 failed minimum variant freq < 0.05
 13936 failed minimum strandedness < 0.01
 47716 failed minimum reference readpos < 0.1
 171466 failed minimum variant readpos < 0.1
 47716 failed minimum reference dist3 < 0.1
171466 failed minimum variant dist3 < 0.1
 0 failed maximum reference MMQS > 100
 0 failed maximum variant MMQS > 100
 0 failed maximum MMQS diff (var - ref) > 50
29 failed maximum mapqual diff (ref - var) > 50
2209 failed minimim ref mapqual < 15
9102 failed minimim var mapqual < 15
1051 failed minimim ref basequal < 15
134 failed minimim var basequal < 15
 0 failed maximum RL diff (ref - var) > 0.25

And when I run bam-readcount, there are all warnings like:

 Couldn't find the generated tag.
 Couldn't find single-end mapping quality. Check to see if the SM tag is in BAM.

I do have researched for the warnings, and it was saying that I could ignore it if I don't use the result of sing-end counts. But I am not quite sure if it is the warnings leading to the failing of my filter. And this is some of my output of readcount:

    chr1    14539   T       4       =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.0
0       C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  T:4:18.25:31.50:0.00:
4:0:0.00:0.02:0.00:0:0.00:0.00:0.00     N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

Could you please give me some advice on the failed filtering?

ADD REPLYlink modified 6 days ago • written 6 days ago by Laven90

I haven't used that tool in a while and actually gave up on it because I never found proper documentation on things like this fpfilter. The tool is not maintained anymore and pretty slow unless one invests time in custom parallelization. It served its purpose to the community but now I would really use a more recent and maintained tool.

ADD REPLYlink written 6 days ago by ATpoint19k

And may be I'd better try more recent tools too. Luckily, I have just found a review for this area and I think it will help a lot. I used to try Mutect2, but may be because I only offer two bam files ( a tumor bam and a normal bam), the result I got seems not exclude those SNPs. And I now guess it will do better if I could offer gnomad files as well. I am going to try this tool again with gnomad files ( it is hard to download really, area problem).

Anyway, thank you for your help! It do help a lot!

ADD REPLYlink modified 6 days ago • written 6 days ago by Laven90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 691 users visited in the last hour