I have 3 BAM files that I want to compare against each other. For example, I have a reference file with 10,000 sequences and paired-end reads sequenced for 3 different samples:
1) Sample 1 is 100% identical to the reference, so we expect all reads to map to it.
2) Sample 2 is 80% similar to the reference, so 20% of the reference sequences won't have any reads.
3) Sample 3 is 60% similar to the reference, so 40% of the reference sequences won't have any reads.
Now my goal is to identify which reference sequences do not have any reads mapped in Samples 2 and 3, i.e. the 20% of reference sequences missing from Sample 2 and the 40% missing from Sample 3.
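For whole reference sequences with no reads at all, per-reference read counts may be enough, without computing coverage. The Python sketch below assumes the tab-separated output of `samtools idxstats` run on an indexed BAM (reference name, sequence length, mapped read segments, unmapped read segments, plus a final `*` line for unplaced reads); the function name `unmapped_references` and the example values are hypothetical.

```python
def unmapped_references(idxstats_text: str) -> list[str]:
    """Return the names of reference sequences with zero mapped reads.

    Assumes `samtools idxstats` layout: one tab-separated line per
    reference with name, length, mapped count, unmapped count.
    """
    names = []
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # '*' collects unplaced unmapped reads, not a reference
        if int(mapped) == 0:
            names.append(name)
    return names


# Made-up idxstats text for illustration only.
example = "ref1\t10000\t523\t0\nref2\t8000\t0\t0\nref3\t500\t0\t0\n*\t0\t0\t17\n"
print(unmapped_references(example))  # ['ref2', 'ref3']
```

Run on the idxstats output of sample2.bam and sample3.bam, this would give the candidate lists for the missing 20% and 40%. A reference with a few stray mapped reads would not be reported, so a small count threshold instead of exactly zero may be worth considering.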
Also, in some cases a reference sequence is approximately 10 kb long, and sample 1 reads map across the entire 10 kb, sample 2 reads map only to the first 5 kb, and sample 3 reads map only to the last 3 kb. So I also need to identify these partially covered regions for such reference sequences.
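The partial case can be framed as interval subtraction: take the regions covered in sample 1 for a reference and remove the regions covered in sample 2 (or 3); what remains is covered in sample 1 only. The Python sketch below uses 0-based, half-open coordinates as in BED files; the function names `merge` and `subtract` and the toy intervals for the 10 kb example are illustrative. On BED files, `bedtools subtract -a ... -b ...` performs the same kind of operation from the command line.

```python
def merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort half-open [start, end) intervals and merge overlapping or touching ones."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the parts of intervals in `a` not overlapped by any interval in `b`.

    Simple nested scan; fine for the handful of intervals per reference here.
    """
    result: list[tuple[int, int]] = []
    b = merge(b)
    for start, end in merge(a):
        cur = start  # leftmost position of `a` not yet accounted for
        for b_start, b_end in b:
            if b_end <= cur:
                continue  # this `b` interval lies entirely before cur
            if b_start >= end:
                break  # this and all later `b` intervals lie after the `a` interval
            if b_start > cur:
                result.append((cur, b_start))  # uncovered gap before this `b` interval
            cur = max(cur, b_end)
            if cur >= end:
                break
        if cur < end:
            result.append((cur, end))
    return result


# Toy coverage for one ~10 kb reference (hypothetical values).
sample1 = [(0, 10_000)]
sample2 = [(0, 5_000)]      # first 5 kb only
sample3 = [(7_000, 10_000)]  # last 3 kb only
print(subtract(sample1, sample2))  # [(5000, 10000)]
print(subtract(sample1, sample3))  # [(0, 7000)]
```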
I have the following commands, which might do something similar, but the output reports differences at every base rather than as contiguous regions:
genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph
unionBedGraphs -header -i sample1.bedgraph sample2.bedgraph -names sample1 sample2 -g reference.fai -empty > samples1and2.txt
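Because the unionBedGraphs output above already has one coverage column per sample, consecutive rows can be collapsed into contiguous regions that are covered in one sample but not the other. This Python sketch assumes the file was written with `-header` and `-empty` as in the command above, so the first line holds the column names (ending with the names given to `-names`) and zero-coverage stretches appear as rows with 0. The function `regions_missing_in` and its `min_len` filter are hypothetical.

```python
def regions_missing_in(union_text: str, present: str = "sample1",
                       absent: str = "sample2", min_len: int = 1):
    """Collapse unionBedGraphs rows into regions covered in `present` but not in `absent`.

    Returns (chrom, start, end) tuples, 0-based half-open, at least `min_len` long.
    """
    lines = [line for line in union_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("\t")
    i_present = header.index(present)
    i_absent = header.index(absent)
    regions = []
    for line in lines[1:]:
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if float(fields[i_present]) > 0 and float(fields[i_absent]) == 0:
            # Extend the previous region if this row continues it on the same reference.
            if regions and regions[-1][0] == chrom and regions[-1][2] == start:
                regions[-1][2] = end
            else:
                regions.append([chrom, start, end])
    return [(c, s, e) for c, s, e in regions if e - s >= min_len]


# Made-up rows in the assumed layout, for illustration only.
demo = (
    "chrom\tstart\tend\tsample1\tsample2\n"
    "ref1\t0\t100\t5\t3\n"
    "ref1\t100\t150\t4\t0\n"
    "ref1\t150\t300\t6\t0\n"
    "ref2\t0\t50\t0\t0\n"
    "ref2\t50\t60\t3\t0\n"
)
print(regions_missing_in(demo))  # [('ref1', 100, 300), ('ref2', 50, 60)]
```

For example, `regions_missing_in(open("samples1and2.txt").read(), min_len=100)` would list sample 1-only regions of at least 100 bp. If sample 1 covers a reference end to end, a reference with no sample 2 reads at all shows up as a single region spanning its full length, which can be checked against the lengths in reference.fai.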