pysam condition writing bam
1
1
Entering edit mode
2.1 years ago
Anst ▴ 40

Hello everyone!

I am trying to rewrite an existing bam file via pysam. I have reference.fasta file and file_to_be_changed.bam. I would like to change the nucleotide on a read given the position of the nucleotide and the reference position.

For example:

pos    0123456
ref    ATCGGTC
bam    ATCG
         CGT
          GGTC

I think that the problem could be solved with:

if pos_on_read == 0:
   if nucl_on_reference == 'A':
       nucl_on_read = 'C'

However, I have some problems with realising the algorithm via pysam.. Give me a hand please!

pysam rewrite bam • 838 views
ADD COMMENT
0
Entering edit mode
2.1 years ago

I would say that the problem you pose is be a lot more difficult to do than you think.

You could iterate for each alignment and you will get the POS field of each, then as you iterate on the read (on the aligned region) you could modify the stored sequence.

Then you could write out the alignment into a new BAM file.

But frankly, that alignment would be incorrect, the score would be wrong, the mapping quality would be incorrect, the CIGAR string would be incorrect, some of the TAGS would be incorrect, and even the alignment itself may be invalid, the modified read may then align somewhere else better.

I can't help by wonder what is the purpose of this fix? Your goal sounds a bit like going after a problem with a premature fix. Perhaps ask a different question on the problem you are after then trying to modify a BAM file.

ADD COMMENT
0
Entering edit mode

I have an ancient genome matrix which consists of the frequences of a nucleuote occurring given the reference base and the position on the read. I would like to apply this operator to a bam file to make it pseudo-ancient.

ADD REPLY
0
Entering edit mode

But then some of the alignments would not be correct. How come that is not a concern? To have incorrect alignments?

An alignment operates in the context of the entire information in the genome, you can't expect that locally changing both the reference and read would still produce valid alignments across the genome. Sure, some, or even the majority might stay correct, but that ratio depends on many factors and the error rate would be highly uneven, might be the most error-prone in the locations where the information between ancient and non-ancient genomes is more meaningful.

ADD REPLY
0
Entering edit mode

Won't realigment be helpful?

ADD REPLY
0
Entering edit mode

that would work

you could extract your reads from the BAM file then realign them to the new genome

ADD REPLY

Login before adding your answer.

Traffic: 1306 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6