Question

How to introduce artificial mutation in bam

1

Entering edit mode

5.7 years ago

bassanio ▴ 60

Hi,

Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?

bam Mismatch • 4.8k views

ADD COMMENT • link updated 5.7 years ago by h.mon 35k • written 5.7 years ago by bassanio ▴ 60

0

Entering edit mode

Introducing a mutation in a bam, what does that mean? Change all the sequenced nucleotides at that position? Change the reference genome at that position?

ADD REPLY • link 5.7 years ago by WouterDeCoster 48k

0

Entering edit mode

yes change the sequenced nucleotide not the reference

ADD REPLY • link 5.7 years ago by bassanio ▴ 60

1

Entering edit mode

5.7 years ago

JC 13k

I am not aware of any tool that can do this, but in general terms, you can:

Sort your BAM file by positions
Decompress the file to SAM
Use a Python/Perl script to look for the coordinate, you need to consider that the reference position is marked for the first base of the alignment, be sure to adjust position based on the CIGAR.
Convert back to BAM

ADD COMMENT • link 5.7 years ago by JC 13k

1

Entering edit mode

5.7 years ago

h.mon 35k

Maybe BAMSurgeon can do what you want. It has a -v flag to add SNVs to specific regions.

-v  VARFILENAME, --varfile VARFILENAME
    Target regions to try and add a SNV, as BED

Another option would be to simulate the reads with NEAT-GenReads, it also has a -v parameter to add variants to specific positions:

-v  Input VCF file. Variants from this VCF will be inserted into the simulated sequence with 100% certainty.

But in this case you would have to map the reads again.

edit: the thread NGS data simulation: VarSim or BAMSurgeon? may be of interest.

ADD COMMENT • link 5.7 years ago by h.mon 35k

score 6 · Accepted Answer · 2019-10-23

6

Entering edit mode

5.7 years ago

Pierre Lindenbaum 166k

I quickly wrote: http://lindenb.github.io/jvarkit/Biostar404363.html

$ samtools view src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   TAATTAAGTCTACAGAGCAACTA ??????????????????????? RG:Z:gid1

$ cat jeter.vcf
ref2    14  A   C

$ java -jar dist/biostar404363.jar -p jeter.vcf src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   CAATTAAGTCTACAGAGCAACTA ??????????????????????? PG:Z:0  RG:Z:gid1   NM:i:1

ADD COMMENT • link 5.7 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Hi Thanks, It worked for me. Just to add Is there a way we can make them heterozygote too?

ADD REPLY • link 5.7 years ago by bassanio ▴ 60

1

Entering edit mode

sure, I changed the input: it now takes a VCF file, use the field INFO/AF=0.5 to get heterozygote changes (see the example below). http://lindenb.github.io/jvarkit/Biostar404363.html

# look at the original bam at position 14:79838836
$ samtools tview -d T -p 14:79838836 src/test/resources/HG02260.transloc.chr9.14.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
cataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
           taaaggcaa

# look at the VCF , we want to change 14:79838836 with 'A' with probability = 0.5
$ cat jeter.vcf
##fileformat=VCFv4.2
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
14  79838836    .   N   A   .   .   AF=0.5

# apply biostar404363
$  java -jar dist/biostar404363.jar -o jeter.bam -p jeter.vcf src/test/resources/HG02260.transloc.chr9.14.bam

# index the sequence
$ samtools index jeter.bam

# view the result (half of the bases were replaced because AF=0.5 in the VCF)
$ samtools tview -d T -p 14:79838836 jeter.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
MATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
aataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
aataggaaaactaaaggcaa
AATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
           taaaggcaa

ADD REPLY • link 5.7 years ago by Pierre Lindenbaum 166k

1

Entering edit mode

Thank you. It works too well.

ADD REPLY • link 5.7 years ago by bassanio ▴ 60

1

Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLY • link 5.7 years ago by WouterDeCoster 48k

0

Entering edit mode

will this work with forward and reverse reads in an aligned bam file?

ADD REPLY • link 5.5 years ago by zinderelly • 0

0

Entering edit mode

In a BAM, the sequence is always genomic 5'->3' whatever is the strand.

ADD REPLY • link 5.5 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Thank you ! It seems to work very well for SNP, I was wondering if it also works for deletions and insertions?

ADD REPLY • link 4.4 years ago by laura.vp1994 • 0

0

Entering edit mode

I was wondering if it also works for deletions and insertions?

no, it doesn't

ADD REPLY • link 4.4 years ago by Pierre Lindenbaum 166k