Question: Determine mutational profile from a BAM file?
3.8 years ago by
mark.ziemann (1.1k)
Australia/Melbourne/Geelong/Deakin
mark.ziemann wrote:

Is there any tool around that can generate a mutational profile from a BAM file? I know I could figure it out with the CIGAR string but I was hoping there was a tool already to do it. I need the following:

- Non-templated extension: mismatches that occur at the sequence termini, reported as the number of reads with 1, 2, ..., n non-templated bases

- Internal mismatch rate, and the number of reads with 1, 2, ..., n mismatches

- Mismatch profile (i.e. the proportion of A->C, G->A, etc. events)

- Insertion and deletion rate (with a list of the most common indels)
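To make the CIGAR route concrete, here is a minimal sketch in plain Python of what such a tally might look like. It assumes the CIGAR strings have already been pulled out of the BAM (column 6 of the SAM text, e.g. from `samtools view`), and it treats soft-clipped bases at the read ends as a proxy for non-templated extension, which only holds if the aligner soft-clips unmatched termini rather than forcing them into mismatches. The function names are made up for illustration. CIGAR alone gives indel lengths, not the inserted or deleted sequence.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string such as '3S40M2I5M' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def cigar_summary(cigars):
    """Tally soft-clipped bases per read and indel lengths over many reads."""
    clip_lengths = Counter()  # clipped bases in a read -> number of reads
    indels = Counter()        # (op, length) -> number of occurrences
    for cigar in cigars:
        if cigar == "*":      # unmapped read, no alignment to summarise
            continue
        # Hard clips can flank soft clips; drop them before looking at the ends.
        ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
        clipped = 0
        if ops and ops[0][1] == "S":
            clipped += ops[0][0]
        if len(ops) > 1 and ops[-1][1] == "S":
            clipped += ops[-1][0]
        clip_lengths[clipped] += 1
        for n, op in ops:
            if op in ("I", "D"):
                indels[(op, n)] += 1
    return clip_lengths, indels
```

Calling `indels.most_common(10)` on the second return value would list the ten most frequent indel lengths; dividing the indel counts by the number of reads or aligned bases would give a rate.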

Thanks

 

rna-seq genome • 1.3k views
3.8 years ago by
Philadelphia
Ashutosh Pandey (11k) wrote:

I don't think there is a single package that covers all of your requirements. In fact, right now I can only think of RSeQC (http://rseqc.sourceforge.net), an RNA-seq package (though it should work with genomic BAMs too) that generates a few of the above stats as output.

Most people use VCF files to get information about insertion and deletion rates and mismatch profiles. But a VCF file gives you a global view rather than read-centric information. It will mostly contain real variants and may not help you profile the sequencing errors that you are mostly interested in. Most variant callers do take into account the position of variants with respect to the reads and flag those with a positional bias, but the positional information is not stored; all you see is a p-value in the VCF file.

I think you have unintentionally raised an important issue here. We have plenty of tools that take a FASTQ or BAM file and analyse it for GC bias, read distribution, insert size, 3' end error profiles, etc., but only very few tools that give a comprehensive picture of the possible sequencing errors with respect to the read. That being said, I would advise you to use RSeQC.
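For the read-centric part that RSeQC does not cover, a rough sketch of a mismatch tally is given below. It is plain Python and not an existing tool: it assumes each alignment is supplied as a (SEQ, CIGAR, MD) triple taken from the SAM record, and that the optional MD tag is present (if it is missing, `samtools calmd` can add it). The function names are hypothetical. Note that SEQ is stored on the reference forward strand, so the events are reported relative to the reference, not to the original read orientation.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tag tokens: a run of matches, a deletion (^ACGT...), or one mismatched ref base.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")

def aligned_bases(seq, cigar):
    """Return the read bases that sit opposite a reference base (M, =, X)."""
    out, pos = [], 0
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op in "M=X":
            out.append(seq[pos:pos + n])
            pos += n
        elif op in "IS":      # insertions and soft clips consume read bases only
            pos += n
        # D, N, H and P consume no read bases
    return "".join(out)

def mismatches(seq, cigar, md):
    """Return a list of (ref_base, read_base) pairs for one alignment."""
    bases = aligned_bases(seq, cigar)
    pos, events = 0, []
    for run, deletion, ref in MD_RE.findall(md):
        if run:
            pos += int(run)
        elif ref:
            events.append((ref.upper(), bases[pos].upper()))
            pos += 1
        # a deletion is reference-only, so the read position does not advance
    return events

def mismatch_profile(records):
    """Tally mismatch types (e.g. 'A>C') and mismatches per read."""
    per_type = Counter()      # 'REF>READ' -> number of events
    per_read = Counter()      # mismatches in a read -> number of reads
    for seq, cigar, md in records:
        events = mismatches(seq, cigar, md)
        per_read[len(events)] += 1
        per_type.update(f"{ref}>{alt}" for ref, alt in events)
    return per_type, per_read
```

Dividing each entry of `per_type` by its total gives the proportions of each substitution type, and `per_read` is the histogram of reads with 0, 1, ..., n mismatches. Real variants are counted together with sequencing errors here, so known variant positions would have to be excluded separately.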


RSeQC looks useful, but yes, it only does a portion of what I need.

Thanks, Mark

written 3.8 years ago by mark.ziemann (1.1k)