Duplicate of: https://github.com/lindenb/jvarkit/issues/190
Hi can anyone help me resolve the issue I'm having with sam2tsv. It is a nifty piece of software but I have been encountering issues with it regarding the numbering of nucleotides it shows for the reference sequence.
Here's what sam2tsv tells me:
The nucleotide string marked in red
CTGGCCGAGCTAG is the read reporting the mutation T>A (line #469).
But the reference sequence listed by sam2tsv (green box) doesn't match the read sequence at all. In fact it is correct sequence from the reference fasta, but it is right-shifted by
16 bases. With other reference files, this number is different, e.g. 20.
In fact, if I search the sequence
ATGGAGACCCGCT in my reference sequence it spans
356-368. In contrast sam2tsv lists these residues in the range
339-352 (as seen in the green box). Off by exactly
To summarize: 1) The position listed in the green box (reference), corresponds to the sequence in the red box (read). 2) The reference sequence listed in the green box, is off by 16 bases.
PacBio HiFi CCS reads aligned using
pbmm2 to custom reference (cDNA).
java -jar '~/Git/Others_Cloned/jvarkit/dist/sam2tsv.jar' -R ../../../reference/pBG-ERBB2/ERBB2.fa 1.ERBB2_Library.bam