Question: Detect Transgene Insertion Site using SAM format
kspata70 (Chicago) wrote, 4 months ago:

Hi All,

I am working on identifying the insertion site and copy number of a retroviral vector integrated in a human cell line. The sample was sequenced by WGS on a NextSeq, yielding around 900 million PE150 reads.

I concatenated hg38 and the vector genome and mapped the reads to this combined reference using BWA. I then ran the following command on the resulting SAM file to identify reads containing the transgene:

grep "vector" sample.sam | awk '$7 ~ /^chr/' > transgene_reads.sam

(This should extract reads that mapped to the vector and whose mates mapped to human chromosomes.)

This command generated an output SAM file which I believe contains the pairs where the first read of the pair is mapped to the vector and its mate is mapped to a human chromosome.
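One caveat with a plain grep is that it matches "vector" anywhere on the line (read names, optional tags, the mate column), not only in the reference-name column. A field-aware version of the same filter might look like the sketch below. It assumes the vector contig is literally named `vector` in the combined reference; the function name and that default are illustrative and not part of the original command.

```shell
# Sketch: keep alignments whose own reference (column 3, RNAME) is the vector
# contig and whose mate's reference (column 7, RNEXT) is a human chromosome.
# ASSUMPTION: the vector contig is named "vector"; pass another name as $2.
extract_vector_human_pairs() {
    sam="$1"
    contig="${2:-vector}"
    awk -F '\t' -v contig="$contig" '
        /^@/ { next }                          # skip SAM header lines
        $3 == contig && $7 ~ /^chr/ { print }  # vector read, human mate
    ' "$sam"
}

# Usage (hypothetical file names):
# extract_vector_human_pairs sample.sam vector > transgene_reads.sam
```

Pairs in which both reads fall on the vector have `=` in column 7 and are therefore skipped, which matches the stated intent of keeping only vector-to-human pairs.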

The problem is that transgene_reads.sam shows vector reads with mates mapping to all of the chromosomes, e.g. chr1, chr2, chr3, and so on.
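To check whether these mates are scattered or pile up at a few loci, the records in transgene_reads.sam can be tallied by mate chromosome and position bin. The following is only a rough sketch: the 1 kb bin size and the function name are arbitrary illustrative choices. A common heuristic is that a genuine integration site tends to appear as a bin supported by several pairs, while isolated single pairs spread across the genome are more likely background chimeras, but the threshold is a judgment call.

```shell
# Sketch: count vector-anchored pairs per mate chromosome and position bin.
# Input: SAM records; RNEXT is column 7 and PNEXT (mate position) is column 8.
# ASSUMPTION: 1000 bp bins by default; pass another size as $2.
count_mate_bins() {
    sam="$1"
    bin="${2:-1000}"
    awk -F '\t' -v bin="$bin" '
        /^@/ { next }                        # skip SAM header lines
        $7 ~ /^chr/ {
            start = int($8 / bin) * bin      # left edge of the bin
            n[$7 "\t" start]++
        }
        END { for (k in n) print n[k] "\t" k }
    ' "$sam" | sort -k1,1nr -k2,2 -k3,3n     # most-supported bins first
}

# Usage (hypothetical): count_mate_bins transgene_reads.sam 1000 | head
```

Each output line is a count, the mate chromosome, and the bin start, separated by tabs.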

I visualized the BAM file for chr7 in IGV, with the display set to show the chromosome of the mate. The coverage for the target in this region is 2X, i.e. 2 reads map to the target at chr7:50343667.

Here is the IGV screenshot: [image: chr7-target-IGV-screenshot]

The highlighted yellow region is the vector region.

  1. Is it common for a vector to integrate into multiple chromosomes at low frequency?
  2. How can I validate and identify the true insertion site using IGV and the SAM file?
  3. Should I remove supplementary and secondary alignments from the SAM file before proceeding to insertion site analysis?
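
For reference on the third question: secondary and supplementary alignments are marked by FLAG bits 0x100 (256) and 0x800 (2048) in column 2 of each SAM record, so they can be separated out without being thrown away. The sketch below shows one way to do this with plain awk (the function name is illustrative). Whether they should be excluded is a separate judgment, since supplementary alignments from BWA-MEM often represent split reads, which may span a vector-genome junction.

```shell
# Sketch: keep header lines and primary alignments only.
# FLAG (column 2) bit 0x100 = secondary, bit 0x800 = supplementary.
# Integer division is used for the bit tests so this runs in any POSIX awk.
primary_only() {
    awk -F '\t' '
        /^@/ { print; next }                    # pass header lines through
        {
            secondary     = int($2 / 256)  % 2  # 1 if bit 0x100 is set
            supplementary = int($2 / 2048) % 2  # 1 if bit 0x800 is set
            if (!secondary && !supplementary) print
        }
    ' "$1"
}

# Usage (hypothetical file names):
# primary_only sample.sam > sample.primary.sam
```

With samtools installed, `samtools view -h -F 0x900` applies the same exclusion directly to a SAM or BAM file.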

Thanks in advance !!
