Hi,
Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?
Hi,
Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?
I quickly wrote: http://lindenb.github.io/jvarkit/Biostar404363.html
$ samtools view src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   TAATTAAGTCTACAGAGCAACTA ??????????????????????? RG:Z:gid1
$ cat jeter.vcf
ref2    14  A   C
$ java -jar dist/biostar404363.jar -p jeter.vcf src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   CAATTAAGTCTACAGAGCAACTA ??????????????????????? PG:Z:0  RG:Z:gid1   NM:i:1
sure, I changed the input: it now takes a VCF file, use the field INFO/AF=0.5 to get heterozygote changes (see the example below). http://lindenb.github.io/jvarkit/Biostar404363.html
# look at the original bam at position 14:79838836
$ samtools tview -d T -p 14:79838836 src/test/resources/HG02260.transloc.chr9.14.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
cataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
           taaaggcaa
# look at the VCF , we want to change 14:79838836 with 'A' with probability = 0.5
$ cat jeter.vcf
##fileformat=VCFv4.2
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
14  79838836    .   N   A   .   .   AF=0.5
# apply biostar404363
$  java -jar dist/biostar404363.jar -o jeter.bam -p jeter.vcf src/test/resources/HG02260.transloc.chr9.14.bam
# index the sequence
$ samtools index jeter.bam
# view the result (half of the bases were replaced because AF=0.5 in the VCF)
$ samtools tview -d T -p 14:79838836 jeter.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
MATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
aataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
aataggaaaactaaaggcaa
AATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
           taaaggcaa
Maybe BAMSurgeon can do what you want. It has a -v flag to add SNVs to specific regions.
-v  VARFILENAME, --varfile VARFILENAME
    Target regions to try and add a SNV, as BED
Another option would be to simulate the reads with NEAT-GenReads, it also has a -v parameter to add variants to specific positions:
-v  Input VCF file. Variants from this VCF will be inserted into the simulated sequence with 100% certainty.
But in this case you would have to map the reads again.
edit: the thread NGS data simulation: VarSim or BAMSurgeon? may be of interest.
I am not aware of any tool that can do this, but in general terms, you can:
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Introducing a mutation in a bam, what does that mean? Change all the sequenced nucleotides at that position? Change the reference genome at that position?
yes change the sequenced nucleotide not the reference