Hi everyone,
I am unsure how to plot/map the many insertion sequences (ISs) and transposons (Tns), i.e. mobile genetic elements (MGEs), that I have found. I have already detected ISs in around 2000 H. pylori whole-genome sequences (WGS) using ISEScan, and I know the position of each one in each genome. For each strain, ISEScan produced several files (.fna, .faa, .gff, .csv) with information on its ISs and Tns. I would like to make a graph (any kind is fine) that shows:
1. Where they are located relative to a reference or representative genome, so I can see which genes are most often affected by these ISs or Tns.
2. Which genes are affected in each genome because they lie near an IS or Tn.
3. How often (frequency) they are found at that position/gene, e.g. IS605 found near gene A in 1500 out of 2000 WGS. If the IS type cannot be specified, that is also fine, e.g. an IS/Tn found near gene A in 1500 out of 2000 WGS, while the other 500 WGS have no IS/Tn near it.
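All three aims start from the per-genome IS coordinates. Below is a minimal sketch for reading them from the .gff files, assuming they follow standard GFF3 (nine tab-separated columns, `key=value` attributes). The feature type `insertion_sequence` and the attribute key `family` in the example are assumptions; it is worth checking them against one real ISEScan .gff file first.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    ftype: str
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str
    attrs: dict


def parse_gff3(lines, keep_types=None):
    """Yield features from GFF3 text lines, optionally filtered by type (column 3)."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        seqid, _source, ftype, start, end, _score, strand, _phase, attr_str = cols
        if keep_types is not None and ftype not in keep_types:
            continue
        attrs = {}
        for field in attr_str.split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 values may be percent-encoded
        yield Feature(seqid, ftype, int(start), int(end), strand, attrs)


# Hypothetical example line; the type and attribute names are assumptions.
example = [
    "##gff-version 3",
    "contig_1\tISEScan\tinsertion_sequence\t1200\t2800\t.\t+\t.\tID=IS_1;family=IS605",
]
for f in parse_gff3(example, keep_types={"insertion_sequence"}):
    print(f.seqid, f.start, f.end, f.attrs.get("family"))
```

Looping this over all ~2000 files (e.g. with `pathlib.Path.glob("*.gff")`) gives one table of IS positions per genome.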
What I have in mind is something like the attached picture (which I got from this paper), but with the x-axis showing the position and the y-axis showing p (or the frequency). However, I don't really understand how to make it.
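For a position (x) vs. frequency (y) plot, one option is to split the reference genome into fixed windows and compute, for each window, the fraction of genomes with at least one IS/Tn mapped there. A minimal sketch, assuming the IS/Tn positions have already been converted to reference coordinates; the function name `is_frequency_profile` and the 1000 bp window are illustrative choices, not from any tool.

```python
def is_frequency_profile(hits_per_genome, ref_length, window=1000):
    """Fraction of genomes with >=1 IS/Tn overlapping each reference window.

    hits_per_genome: dict genome_id -> list of (start, end), 1-based inclusive
    reference coordinates. Returns (window_start, window_end, fraction) tuples.
    """
    n_windows = (ref_length + window - 1) // window
    counts = [0] * n_windows
    for hits in hits_per_genome.values():
        covered = set()  # windows hit by this genome, so each genome counts once
        for start, end in hits:
            lo, hi = sorted((start, end))  # tolerate reversed coordinates
            first = (max(lo, 1) - 1) // window
            last = (min(hi, ref_length) - 1) // window
            covered.update(range(first, last + 1))
        for w in covered:
            counts[w] += 1
    n_genomes = len(hits_per_genome)
    profile = []
    for w, c in enumerate(counts):
        w_start = w * window + 1
        w_end = min((w + 1) * window, ref_length)
        profile.append((w_start, w_end, c / n_genomes if n_genomes else 0.0))
    return profile
```

Genomes with no IS/Tn should still be included with an empty list, otherwise the denominator is too small. The tuples can be written to a CSV file and plotted in R with ggplot2, e.g. window start on the x-axis and fraction on the y-axis using `geom_col()`.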
I have already finished aim 1. First, I BLASTed all the IS and Tn sequences against a reference genome using NCBI BLAST or similar tools, then retrieved the position of every IS and Tn relative to the reference genome, and made the graph with ggplot2 in R. In addition, to find which genes were affected by an IS or Tn, I listed every neighboring gene within 1000 bp upstream and downstream of each IS or Tn. However, I'm afraid my method is not 100% correct, so I would appreciate suggestions.
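The two steps above can be sketched as follows, assuming BLAST tabular output in the default `-outfmt 6` column order and a reference gene list of (name, start, end) tuples from the reference annotation. Two details are easy to get wrong: minus-strand hits are reported with sstart > send, so coordinates need normalizing, and a gene should count as a neighbor if it overlaps the flanked interval at all, including genes partly or entirely inside the IS. It may also be worth checking that BLASTing the IS sequence itself does not simply find IS copies already present in the reference rather than the insertion site in the query genome; BLASTing the sequence flanking each IS (e.g. a few hundred bp on either side) against the reference avoids that. The best-hit-per-query rule and the 90% identity cutoff below are arbitrary choices.

```python
import csv
import io

# Default column order of `blastn -outfmt 6`.
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(outfmt6_text, min_pident=90.0):
    """Keep the highest-bitscore hit per query: qseqid -> (sseqid, start, end)."""
    best = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(OUTFMT6, row))
        if float(rec["pident"]) < min_pident:
            continue
        s, e = int(rec["sstart"]), int(rec["send"])
        start, end = min(s, e), max(s, e)  # minus-strand hits have sstart > send
        score = float(rec["bitscore"])
        q = rec["qseqid"]
        if q not in best or score > best[q][0]:
            best[q] = (score, rec["sseqid"], start, end)
    return {q: (sseqid, start, end) for q, (_, sseqid, start, end) in best.items()}


def neighbour_genes(is_start, is_end, genes, flank=1000):
    """Names of genes overlapping [is_start - flank, is_end + flank] (1-based).

    genes: iterable of (gene_name, start, end) on the same sequence as the hit.
    Genes partly or fully inside the IS are included (likely disrupted).
    """
    lo, hi = is_start - flank, is_end + flank
    return [name for name, g_start, g_end in genes
            if g_start <= hi and g_end >= lo]
```

If the reference has more than one sequence (e.g. plasmids), the gene list should be restricted to the `sseqid` of each hit before calling `neighbour_genes`.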
I still don't know how to do aims 2 and 3. Doing them one genome at a time would be impractical, since I have more than 2000 genomes.
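For aims 2 and 3 at this scale, one possibility is to do the neighbor step inside each genome's own annotation instead of on the reference, then count in how many genomes each (IS family, gene) pair occurs. A minimal sketch, assuming each genome also has a gene annotation (e.g. a GFF from Prokka or Bakta; this is an assumption, not part of the ISEScan output) and that gene names are comparable across genomes. In practice, pan-genome cluster IDs (e.g. from Roary or Panaroo) are likely more reliable than raw gene names. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def neighbour_frequency(genomes, flank=1000):
    """Count, per (IS family, gene), in how many genomes the IS lies near the gene.

    genomes: dict genome_id -> (is_list, gene_list), where
      is_list   = [(contig, start, end, family), ...]
      gene_list = [(contig, start, end, gene_name), ...]
    Coordinates are 1-based inclusive, in each genome's own assembly.
    Returns (family, gene, n_genomes, fraction) rows, most frequent first.
    An extra family label "any IS/Tn" pools all IS types.
    """
    seen = defaultdict(set)  # (family, gene) -> ids of genomes where it occurs
    for genome_id, (is_list, gene_list) in genomes.items():
        genes_by_contig = defaultdict(list)
        for contig, g_start, g_end, name in gene_list:
            genes_by_contig[contig].append((g_start, g_end, name))
        for contig, i_start, i_end, family in is_list:
            lo, hi = i_start - flank, i_end + flank
            for g_start, g_end, name in genes_by_contig[contig]:
                if g_start <= hi and g_end >= lo:  # gene overlaps flanked IS
                    seen[(family, name)].add(genome_id)
                    seen[("any IS/Tn", name)].add(genome_id)
    n = len(genomes)
    rows = [(fam, gene, len(ids), len(ids) / n) for (fam, gene), ids in seen.items()]
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows
```

Because each genome is stored in a set, multiple IS copies near the same gene in one genome are counted once. A row such as ("IS605", "geneA", 1500, 0.75) would correspond to the "IS605 near gene A in 1500 out of 2000 WGS" case, and the "any IS/Tn" rows cover the case where the IS type is not specified.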
Could anyone please give me suggestions?
What tool(s) or step(s) would be suitable for this purpose?
Thank you.
Sincerely,
Ricky