Hello, I am struggling to processing and analyse bam files (from bwa alignment), to extracting the chimeric read alignment. I am aligning human cell line RNA-seq data (paired end) to virus, aimed to find the viral integration sites in the genome. For that, after reading a bit here from following links like: Confirming Viruses detected from NGS data
For find out the chimeric reads (partially mapping to one and partially to other genome),I have concatenated the two genomes (human + viral) and prepared the index using bwa index
bwa index -a bwtsw human_virus.fa
and then performed the mapping using bwa-mem and piping the output samtools of alignment to
bwa mem -R '@RG\tID:ID\tSM:SM\tPL:Illumina' index/human_virus.fa input_pair_1.fastq input_pair_2.fastq | samtools view -bS > input.bam
I have sorted and indexed the resulting input.bam (input_sorted.bam and input.bai)
Now for the post alignment processing, I am trying samtools view -f 0x04 (for unmapped reads) and flagstat to find other statistics.
My question question is how can I extract the numbers of reads that are partially mapped to either of the two genome, how many reads mapped to viral genome and which to human. Conceptually, I might not be understanding this problem, that is why I cab not implement proper samtool view options.
Plus I have approx 50 samples how could I evaluate how many reads have mapped to the viral genome in each sample.
I would really appreciate your inputs.