Question: Tool for replacing variant bases with the reference in aligned reads, i.e. anonymizing BAM files?
3.8 years ago, vegard nygaard (Oslo University Hospital, Norway) wrote:

Hi, I am looking for a tool that takes a BAM file of aligned reads plus the reference genome as input and outputs a BAM file in which every variant (non-reference base call) is replaced with the reference base, while the alignment itself is kept.

I need this in order to de-sensitize BAM files so that I am allowed to distribute them more freely, typically in troubleshooting situations where the alignment matters more than the variants.

I was not able to find such a tool, or such an option in familiar tools, and while writing this I realize it might be a bit trickier than I thought: what should be done with indels?
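To make the idea concrete, here is a minimal sketch of the per-read operation in plain Python. It is not an existing tool: the function name `mask_to_reference` and the choice to mask inserted and soft-clipped bases with `N` are assumptions made for illustration, and in practice the fields would come from a BAM parser rather than from strings.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def mask_to_reference(seq, cigar, pos, ref):
    """Return the read sequence with aligned bases replaced by reference bases.

    seq   : read sequence as stored in the SAM SEQ field
    cigar : CIGAR string, e.g. "3M1I4M"
    pos   : 0-based leftmost reference position of the alignment
    ref   : sequence of the reference contig as a plain string
    """
    out = []
    read_i = 0   # bases of seq consumed so far
    ref_i = pos  # current index into ref
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned block: copy the reference, which hides any mismatch.
            out.append(ref[ref_i:ref_i + n])
            read_i += n
            ref_i += n
        elif op in "IS":
            # Inserted or soft-clipped bases have no reference counterpart.
            # Assumption: mask them with N so no sample sequence remains.
            out.append("N" * n)
            read_i += n
        elif op in "DN":
            # Deletion or skipped region: consumes reference only.
            ref_i += n
        # H and P consume neither read nor reference bases.
    if read_i != len(seq):
        raise ValueError("CIGAR does not match the read length")
    return "".join(out)

# Hypothetical toy data: 2 soft-clipped bases, one insertion, one deletion.
ref = "AAAACCCCGGGGTTTT"
print(mask_to_reference("TTAGCTCAGG", "2S3M1I2M2D2M", 2, ref))  # NNAACNCCGG
```

Note that this only cleans the SEQ field. The CIGAR string still shows where insertions and deletions are, and optional tags such as MD and NM describe the original mismatches, so they would have to be dropped or recomputed as well.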

Feedback appreciated.

rna-seq • 801 views

It sounds similar to "Create a dummy bam file from a bed coordinates and ref fasta.", where the BED coordinates can be obtained from the existing BAM.
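As a rough sketch of that step, the BED interval of an alignment follows from its start position and CIGAR string. The function name `sam_to_bed` is hypothetical; the only facts relied on are that SAM positions are 1-based, BED intervals are 0-based and half-open, and the CIGAR operations M, D, N, = and X consume reference bases.

```python
import re

def sam_to_bed(rname, pos_1based, cigar):
    """Return (chrom, start, end) as a 0-based, half-open BED interval."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Only these operations advance along the reference.
    ref_len = sum(int(n) for n, op in ops if op in "MDN=X")
    start = pos_1based - 1  # SAM POS is 1-based, BED start is 0-based
    return rname, start, start + ref_len

print(sam_to_bed("chr1", 101, "2S3M1I2M2D2M"))  # ('chr1', 100, 109)
```

For spliced RNA-seq reads this gives one interval spanning the introns as well; splitting at each `N` operation would be needed to get per-exon blocks.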

It is not only about indels: what will you do with the base quality scores when you replace a base with the reference, especially when you encounter an 'N' in a read?

modified 3.8 years ago • written 3.8 years ago by geek_y

After mapping, the base quality doesn't really matter anymore, assuming you are only going to use this edited BAM for differential expression analysis...

written 3.8 years ago by WouterDeCoster