Question: How to introduce artificial mutation in bam
0
gravatar for bassanio
15 months ago by
bassanio20
bassanio20 wrote:

Hi,

Is there a way I can introduce mutation in bam file at specific position of a chromosome without running the alignment. For example I would like to change base from A>C at Chr1:12345?

mismatch bam • 918 views
ADD COMMENTlink modified 15 months ago by h.mon32k • written 15 months ago by bassanio20

Introducing a mutation in a bam, what does that mean? Change all the sequenced nucleotides at that position? Change the reference genome at that position?

ADD REPLYlink written 15 months ago by WouterDeCoster45k

yes change the sequenced nucleotide not the reference

ADD REPLYlink written 15 months ago by bassanio20
6
gravatar for Pierre Lindenbaum
15 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:

I quickly wrote: http://lindenb.github.io/jvarkit/Biostar404363.html

$ samtools view src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   TAATTAAGTCTACAGAGCAACTA ??????????????????????? RG:Z:gid1

$ cat jeter.vcf
ref2    14  A   C

$ java -jar dist/biostar404363.jar -p jeter.vcf src/test/resources/toy.bam | tail -1
x6  0   ref2    14  30  23M *   0   0   CAATTAAGTCTACAGAGCAACTA ??????????????????????? PG:Z:0  RG:Z:gid1   NM:i:1
ADD COMMENTlink written 15 months ago by Pierre Lindenbaum133k

Hi Thanks, It worked for me. Just to add Is there a way we can make them heterozygote too?

ADD REPLYlink written 15 months ago by bassanio20
1

sure, I changed the input: it now takes a VCF file, use the field INFO/AF=0.5 to get heterozygote changes (see the example below). http://lindenb.github.io/jvarkit/Biostar404363.html

# look at the original bam at position 14:79838836
$ samtools tview -d T -p 14:79838836 src/test/resources/HG02260.transloc.chr9.14.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
cataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
           taaaggcaa

# look at the VCF , we want to change 14:79838836 with 'A' with probability = 0.5
$ cat jeter.vcf
##fileformat=VCFv4.2
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
14  79838836    .   N   A   .   .   AF=0.5

# apply biostar404363
$  java -jar dist/biostar404363.jar -o jeter.bam -p jeter.vcf src/test/resources/HG02260.transloc.chr9.14.bam

# index the sequence
$ samtools index jeter.bam

# view the result (half of the bases were replaced because AF=0.5 in the VCF)
$ samtools tview -d T -p 14:79838836 jeter.bam | cut -c 1-20
     79838841  79838
NNNNNNNNNNNNNNNNNNNN
MATAGGAAAACTAAAGGCAA
cataggaaaactaaaggcaa
aataggaaaaCTAAAGGCAA
cataggaaaactaaaggcaa
CATAGGAAAACTAAAGGCAA
aataggaaaactaaaggcaa
AATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
CATAGGAAAACTAAAGGCAA
AATAGGAAAACTAAAGGCAA
           taaaggcaa
ADD REPLYlink written 15 months ago by Pierre Lindenbaum133k
1

Thank you. It works too well.

ADD REPLYlink written 14 months ago by bassanio20
1

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLYlink written 15 months ago by WouterDeCoster45k

will this work with forward and reverse reads in an aligned bam file?

ADD REPLYlink written 12 months ago by zinderelly0

In a BAM, the sequence is always genomic 5'->3' whatever is the strand.

ADD REPLYlink written 12 months ago by Pierre Lindenbaum133k
1
gravatar for JC
15 months ago by
JC12k
Mexico
JC12k wrote:

I am not aware of any tool that can do this, but in general terms, you can:

  1. Sort your BAM file by positions
  2. Decompress the file to SAM
  3. Use a Python/Perl script to look for the coordinate, you need to consider that the reference position is marked for the first base of the alignment, be sure to adjust position based on the CIGAR.
  4. Convert back to BAM
ADD COMMENTlink written 15 months ago by JC12k
1
gravatar for h.mon
15 months ago by
h.mon32k
Brazil
h.mon32k wrote:

Maybe BAMSurgeon can do what you want. It has a -v flag to add SNVs to specific regions.

-v  VARFILENAME, --varfile VARFILENAME
    Target regions to try and add a SNV, as BED

Another option would be to simulate the reads with NEAT-GenReads, it also has a -v parameter to add variants to specific positions:

-v  Input VCF file. Variants from this VCF will be inserted into the simulated sequence with 100% certainty.

But in this case you would have to map the reads again.

edit: the thread NGS data simulation: VarSim or BAMSurgeon? may be of interest.

ADD COMMENTlink modified 15 months ago • written 15 months ago by h.mon32k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1661 users visited in the last hour
_