Entering edit mode
10 weeks ago
zhangfish ▴ 40
I want to get following fields from a bam file:
I known I can get this fields from
pysam, But I want to do this step with shell, I have the following scheme
- Read_length use
awkcalculate length of reads
- Alignment_length use awk calculate MD fields?
- Edit_distance get from NM fields
I'm not sure if my understanding of Alignment length is correct, how should I get this information correctly?
for example, if I have a MD field:
So is the alignment length is
but if we have
here is also alignment length is
pysam output as same as previous
Can I use
read.query_alignment_length in pysam to get
I don't think you can get these fields from a bam with the shell. Bam files are in binary format and therefore not readable for humans :D
I think may
samtools viewcan easily convert
samformat to stdout and use pipeline to awk or samtools is ok
True you can just use the
--header-onlyflag and then you have all the info you need in a human readable format.
why reinventing the wheel with bash when you have numerous API (pysam, htsjdk, htslib... ) available.
Because I find a little difference between pysam
read.query_alignment_lengthand I calculate from MD filed, see my updated case for
MD is not needed to calculate length of alignment, only CIGAR (which tells you length of the aligned segment for example)