Extracting Genomic Coverage Information Across Different Samples
1
0
Entering edit mode
10.1 years ago
empyrean999 ▴ 180

Hello,

I have 3 bam files that I wanted to compare against each other. For example I have reference file with 10,000 sequences. I have paired end reads sequenced for 3 different samples.

1) Sample 1 is 100% same as reference so we expect all reads to map to it 2) Sample 2 is 80% similar to reference so 20% of reference sequences wont have any reads 3) Sample 3 is 60% similar to reference and 40% of reference wont have any reads.

Now my goal is to identify what reference sequences doesnot have any reads mapped in Sample 2 and 3.I need to identify the 20% reference sequences from Sample 2 and 40% from Sample 3.

Also in some cases in a reference which is approx 10kb long, sample 1 maps to entire 10kb, sample 2 maps to first 5kb and sample 3 maps to last 3kb. so I need to identify the partial regions for those reference sequences as well.

I have the mapped sorted bam files for all these three samples. I am looking in to using bedtools but not sure what in bedtools will give the answer I needed.

I have the following commands which might do similar but it outputs differences at every base.

genomeCoverageBed -bg -ibam sample1.bam > sample1.bedgraph
genomeCoverageBed -bg -ibam sample2.bam > sample2.bedgraph
unionBedGraphs -header -i sample1.bedgraph sample2.bedgraph -names sample1 sample2 -g reference.fai -empty > samples1and2.txt
perl samtools sequencing bedtools • 2.2k views
ADD COMMENT
0
Entering edit mode

The title of your question is incorrect, you are not comparing two bam files. You are trying to find which genomic regions do not have read coverages in various samples. (btw having an incorrect title makes it less likely to get a good answer)

ADD REPLY
0
Entering edit mode

Any suggestions on my question?

ADD REPLY
0
Entering edit mode
10.1 years ago

I think the way to solve this is to create sets of covered regions for each sample, then perform set operations as needed.

You can use bedtools intersect -c to produce the number of counts a gene is covered by reads.

Then use sort and use the uniq utility to count how many times does each gene appear.

ADD COMMENT

Login before adding your answer.

Traffic: 2017 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6