I have a BAM file containing alignments of clones to a concatenated viral and host reference genome (combined.fa). My goal is to extract the reads that are mapped to a host chromosome but also have a secondary alignment to the viral genome. I already know the location of the insertion site; I want to identify these alignments and determine the orientation of the viral insertion.
I attempted to use the samjdk tool as suggested in this post. However, the output contained only the @SQ header lines and no reads.
I also thought of using the SAM flags to extract the secondary alignments with this command line, but then I got lost: even though I now have the integration site and the secondary alignments to the virus, I still do not know how to determine the orientation in which the virus is inserted into the host genome.
samtools view -h -f 256 C1.sorted.marked.bam | grep -e '^@' -e 'SA:Z:VIRUS' | samtools view -b - > C1_viral_secondary_alignments.bam
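A note on the flags, plus a sketch for the orientation question. As far as I know, bwa-mem and bwa-mem2 report a split (chimeric) read as one primary record plus one or more supplementary records (flag 2048, selected with `-f 2048`), and it is these records that carry the `SA:Z` tag; secondary records (flag 256) are alternative placements of the whole read, which bwa-mem typically only emits when run with `-a` (otherwise alternative hits go into the `XA:Z` tag). The `SA:Z` value is a `;`-separated list of `rname,pos,strand,CIGAR,mapQ,NM` entries, so `grep 'SA:Z:VIRUS'` only matches when the viral hit happens to be listed first. The Python sketch below assumes the viral contig in combined.fa is named `VIRUS` (taken from the grep above) and reads plain SAM text. For each host-placed record whose `SA:Z` list contains a viral hit, it compares the strand of the host part with the strand of the viral part: since both parts come from the same read, equal strands suggest the virus is joined in the same orientation as the host reference, and opposite strands suggest an inverted insertion.

```python
import sys
from typing import Iterable, Iterator, List, NamedTuple, Tuple

# Assumption: name of the viral sequence in combined.fa (taken from the grep in the question).
VIRAL_CONTIG = "VIRUS"


class Junction(NamedTuple):
    qname: str
    host_chrom: str
    host_pos: int      # 1-based leftmost position of the host-aligned part
    host_strand: str
    viral_pos: int     # 1-based position of the viral part, taken from the SA tag
    viral_strand: str
    orientation: str   # "same" or "inverted" relative to the host reference


def parse_sa(value: str) -> List[Tuple[str, int, str]]:
    """Split an SA:Z value 'rname,pos,strand,CIGAR,mapQ,NM;...' into (rname, pos, strand)."""
    hits = []
    for entry in value.rstrip(";").split(";"):
        rname, pos, strand, _cigar, _mapq, _nm = entry.split(",")
        hits.append((rname, int(pos), strand))
    return hits


def find_junctions(sam_lines: Iterable[str], viral: str = VIRAL_CONTIG) -> Iterator[Junction]:
    """Yield host-placed records whose SA tag lists a hit on the viral contig."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or rname == viral:
            continue  # keep only mapped records placed on a host contig
        host_strand = "-" if flag & 0x10 else "+"
        for tag in fields[11:]:  # optional tags start at column 12
            if not tag.startswith("SA:Z:"):
                continue
            for sa_rname, sa_pos, sa_strand in parse_sa(tag[len("SA:Z:"):]):
                if sa_rname != viral:
                    continue
                # Both parts come from the same read sequence, so equal strands mean
                # host and virus are joined in the same orientation.
                orientation = "same" if sa_strand == host_strand else "inverted"
                yield Junction(qname, rname, pos, host_strand, sa_pos, sa_strand, orientation)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view in.bam | python find_junctions.py /dev/stdin
    with open(sys.argv[1]) as sam:
        for junction in find_junctions(sam):
            print(*junction, sep="\t")
```

For example, `samtools view C1.sorted.marked.bam chr:start-end | python find_junctions.py /dev/stdin` (with `chr:start-end` standing in for a region around the known site; region queries need a BAM index) prints one tab-separated line per split read joining host and virus.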
I would appreciate guidance on the right way to extract the secondary alignments and to determine the orientation of the insertion. Thanks a lot.
Is it paired-end or single-end? How was the FASTQ mapped?
Hi Pierre, they are paired-end, and they were mapped with bwa-mem2.
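Since the reads are paired-end, pairs that straddle the junction (one mate on the host, the other on the virus) are a second source of orientation evidence, and they help even when no single read crosses the breakpoint. The sketch below assumes a standard forward/reverse (FR) Illumina library and, again, a viral contig named `VIRUS`; it reads SAM text and keeps primary, mapped records on a host contig whose mate (`RNEXT`) is on the virus. For an FR pair the fragment extends to the right of a forward read and to the left of a reverse read, which tells you on which side of the host read the virus lies. Because the two mates come from opposite strands of the same fragment, opposite strands on host and virus suggest the virus is in the same orientation as the host reference, while equal strands suggest an inverted insertion.

```python
import sys
from typing import Iterable, Iterator, NamedTuple

# Assumption: name of the viral sequence in combined.fa.
VIRAL_CONTIG = "VIRUS"

# SAM flag bits used below
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
REVERSE, MATE_REVERSE = 0x10, 0x20
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


class BridgingPair(NamedTuple):
    qname: str
    host_chrom: str
    host_pos: int
    host_strand: str
    viral_pos: int     # PNEXT: 1-based position of the mate on the virus
    viral_strand: str
    virus_side: str    # "downstream" or "upstream" of the host read, in host coordinates
    orientation: str   # "same" or "inverted" relative to the host reference


def find_bridging_pairs(sam_lines: Iterable[str], viral: str = VIRAL_CONTIG) -> Iterator[BridgingPair]:
    """Yield host-placed primary reads whose mate aligns to the viral contig (FR library assumed)."""
    skip = UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        rnext, pnext = f[6], int(f[7])
        if not flag & PAIRED or flag & skip:
            continue
        if rname == viral or rnext != viral:
            continue  # need this mate on the host and the other mate on the virus
        host_strand = "-" if flag & REVERSE else "+"
        viral_strand = "-" if flag & MATE_REVERSE else "+"
        # An FR fragment extends rightwards from a forward read, leftwards from a reverse one.
        virus_side = "downstream" if host_strand == "+" else "upstream"
        # Mates lie on opposite strands of one fragment, so opposite strands = same orientation.
        orientation = "same" if host_strand != viral_strand else "inverted"
        yield BridgingPair(qname, rname, pos, host_strand, pnext, viral_strand,
                           virus_side, orientation)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view in.bam | python find_bridging_pairs.py /dev/stdin
    with open(sys.argv[1]) as sam:
        for pair in find_bridging_pairs(sam):
            print(*pair, sep="\t")
```

Running this on the reads around the known insertion site and tallying the `orientation` and `virus_side` columns gives a quick consistency check: a clear majority in one direction is what you would hope to see for a single, clonal integration.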