Question

Calculating unique insertion sites from Himar1 C9-based TnSeq


17 hours ago

Kossivi • 0

Hello,

I am using bioinformatic tools to analyze my TnSeq data. I ran the “Essential gene detection with transposon insertion sequencing” pipeline on the Galaxy platform and obtained a list of my essential genes. My TnSeq library is Himar1 C9-based, so the transposon inserts at TA sites.

I want to calculate the number of unique insertion sites but don't know how to do that.

I tried this: I converted the BAM file to a BED file, then extracted the 5' ends of the reads, which should represent the transposon insertion site coordinates, with awk: awk 'BEGIN{OFS="\t"}{if($6=="+") print $1,$2,$2+1; else if ($6=="-") print $1,$3-1,$3;}'. The output was then filtered to keep only unique positions, and I took these as my unique insertion sites (UIS). However, I got around 200,000 UIS, which is far more than the 78,000 TA sites available in my genome. So I guess my strategy is not working.

Help please, don’t know how to do this.


score 1 · Answer 1 · 2025-10-16

While Himar1 inserts at TA motifs, the library is generated by restriction enzyme treatment (MmeI) followed by ligation. The MmeI recognition sequence is likely present at other locations in your genome, not only in the transposon inserts, and any DNA fragment may also be ligated to sequencing adapters. Thus, you should see a combination of true insertion sites, cuts distal to existing MmeI motifs, and random background, so seeing an excess of fragment locations is not surprising.
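To get a feel for how often the restriction motif occurs on its own, something like the following could be run on the reference sequence. This is only a sketch: it assumes the enzyme is MmeI with recognition sequence 5'-TCCRAC-3' (R = A or G), and `mmei_sites` and the toy sequence are made-up names for illustration, not part of the Galaxy pipeline.

```python
import re

# Assumption: MmeI, recognition sequence 5'-TCCRAC-3' (R = A or G).
# Lookaheads are used so that overlapping motifs are all reported.
FORWARD = re.compile(r"(?=TCC[AG]AC)")
REVERSE = re.compile(r"(?=GT[CT]GGA)")  # reverse complement of TCCRAC


def mmei_sites(genome):
    """Return 0-based motif start positions on the forward and reverse strands."""
    genome = genome.upper()
    forward = [m.start() for m in FORWARD.finditer(genome)]
    reverse = [m.start() for m in REVERSE.finditer(genome)]
    return forward, reverse


# Toy sequence (hypothetical) with one motif on each strand.
toy = "AATCCGACTTGTTGGAAA"
print(mmei_sites(toy))
```

On a real genome you would read the FASTA sequence into a string first and run the function per contig.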

Also, the DNA fragment may be sequenced in either direction to generate the read, so the 5' end could be the TA-proximal side or the opposite end.
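One way to act on both points is to keep only fragment ends that actually fall on a TA dinucleotide, checking the 5' end first and the other end as a fallback. The sketch below is one possible approach, not the pipeline's method. It assumes a single contig, BED-style 0-based half-open coordinates, and reads already trimmed of transposon sequence; `ta_sites`, `site_for_position`, and `unique_insertion_sites` are hypothetical helper names.

```python
def ta_sites(genome):
    """Return the 0-based position of the T of every TA dinucleotide."""
    genome = genome.upper()
    return {i for i in range(len(genome) - 1) if genome[i:i + 2] == "TA"}


def site_for_position(pos, sites):
    """Return the TA site that contains base `pos` (as its T or its A), else None."""
    if pos in sites:        # pos is the T of a TA
        return pos
    if pos - 1 in sites:    # pos is the A of a TA
        return pos - 1
    return None


def unique_insertion_sites(records, genome):
    """Collapse fragments onto TA sites; fragments touching no TA are dropped.

    records: iterable of (start, end, strand) tuples in BED coordinates.
    """
    sites = ta_sites(genome)
    found = set()
    for start, end, strand in records:
        five_prime = start if strand == "+" else end - 1
        other_end = end - 1 if strand == "+" else start
        # Prefer the 5' end; fall back to the opposite end of the fragment.
        for pos in (five_prime, other_end):
            site = site_for_position(pos, sites)
            if site is not None:
                found.add(site)
                break
    return found


# Toy data (hypothetical): TA sites at positions 2 and 12.
genome = "GGTACCGGGGCCTAGG"
records = [(2, 8, "+"), (5, 14, "-"), (5, 9, "+"), (6, 13, "+")]
print(len(unique_insertion_sites(records, genome)))
```

By construction the count can never exceed the number of TA sites in the genome. The caveat is that a background fragment whose end happens to land on a TA is still counted, so a minimum read count per site may be worth adding.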