Question: genomeCoverageBed and supplementary alignments
2.0 years ago, giovanni.bacci (0) wrote:

Hi all, I would like to know how genomeCoverageBed deals with secondary alignments in a SAM/BAM file. I have a set of SAM files generated with bowtie2 using the -a flag, and I would like to generate a coverage map with genomeCoverageBed. These are the steps I normally follow to obtain a BAM file for coverage estimation:

1 - Mapping reads with bowtie2
2 - Converting sam to bam and sorting output with samtools
3 - Removing duplicated reads with Picard (MarkDuplicates)
4 - Sorting output again with samtools
5 - Generating a coverage map with genomeCoverageBed
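
As a rough sketch, the five steps above could be written out as command lines, built here as Python argument lists so they can be inspected before anything is run. The file names (`index`, `reads.fq`, `sample`, `picard.jar`) are placeholders that do not come from the original post, and the options shown are only one common way to invoke each tool, assuming reasonably recent versions of samtools and Picard.

```python
import shlex


def build_pipeline(index="index", reads="reads.fq", prefix="sample",
                   picard_jar="picard.jar", report_all=True):
    """Return the five steps as argument lists. Nothing is executed here.

    All file names are hypothetical placeholders.
    """
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.dedup.bam"
    resorted_bam = f"{prefix}.dedup.sorted.bam"

    # Step 1: map reads; -a asks bowtie2 to report all alignments it finds.
    bowtie2 = ["bowtie2"] + (["-a"] if report_all else [])
    bowtie2 += ["-x", index, "-U", reads, "-S", sam]

    # Step 2: convert to BAM and coordinate-sort in one samtools call.
    sort_first = ["samtools", "sort", "-o", sorted_bam, sam]

    # Step 3: remove duplicates with Picard MarkDuplicates.
    mark_dups = ["java", "-jar", picard_jar, "MarkDuplicates",
                 f"I={sorted_bam}", f"O={dedup_bam}",
                 f"M={prefix}.dup_metrics.txt", "REMOVE_DUPLICATES=true"]

    # Step 4: sort again, as listed in the steps above.
    sort_second = ["samtools", "sort", "-o", resorted_bam, dedup_bam]

    # Step 5: coverage map in bedGraph format (written to stdout).
    coverage = ["genomeCoverageBed", "-ibam", resorted_bam, "-bg"]

    return [bowtie2, sort_first, mark_dups, sort_second, coverage]


# Print the commands for review instead of running them.
for number, command in enumerate(build_pipeline(), start=1):
    print(number, shlex.join(command))
```

Calling `build_pipeline(report_all=False)` gives the same commands without `-a`, which makes it easy to compare the two mapping modes side by side.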

Actually, I tried to map my reads both with and without the -a flag and I got slightly different results. Is it possible that bedtools included secondary alignments in the coverage map? If this is not the case, how does genomeCoverageBed deal with secondary alignments?

Thanks in advance,

Giovanni

modified 2.0 years ago by Devon Ryan (86k) • written 2.0 years ago by giovanni.bacci (0)

What's the point of step 3? Removing duplicates?

modified 2.0 years ago • written 2.0 years ago by Picasa (390)

Removal of PCR duplicates, presumably.

written 2.0 years ago by Devon Ryan (86k)

Sorry, my question wasn't clear.

Why is it useful to remove duplicates when estimating coverage?

written 2.0 years ago by Picasa (390)

There's no estimation here, it's empirical observation. PCR duplicates are a nuisance that can often be ignored. This isn't always the case (e.g., if your signal should actually be highly focal, in which case duplicates can't be meaningfully marked), but often is.

written 2.0 years ago by Devon Ryan (86k)

OK, I have a doubt now.

For contig coverage estimation (after assembly), should I also remove duplicates?

written 2.0 years ago by Picasa (390)

In my opinion, they should be removed before coverage estimation; otherwise your coverage would be overestimated.

modified 2.0 years ago • written 2.0 years ago by giovanni.bacci (0)

@Picasa: What fraction of reads has been marked as duplicates (in your case)? If the fraction is small, they could be a minor nuisance, but if a much larger fraction is marked as duplicates, you may need to find out whether there is a reason (e.g. ultra-low input for library prep) or whether it is simply a bad library prep.

modified 2.0 years ago • written 2.0 years ago by genomax (58k)

Usually, I consider duplication levels lower than 3-5% to be normal artifacts due to PCR amplification or optical duplicates. In this case, I have a duplication level lower than 1% so I'm quite confident with that.
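
For anyone who wants to check a duplication level like this directly, here is a minimal sketch that computes the fraction of duplicate-flagged records from SAM FLAG values. It assumes duplicates were marked rather than removed, and it counts only mapped primary records, so the number will not necessarily match the PERCENT_DUPLICATION value in Picard's metrics file. The flag values in the example are made up.

```python
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4          # 4
SECONDARY = 0x100       # 256
DUPLICATE = 0x400       # 1024: PCR or optical duplicate
SUPPLEMENTARY = 0x800   # 2048


def duplicate_fraction(flags):
    """Fraction of mapped primary records that carry the duplicate bit."""
    skip = UNMAPPED | SECONDARY | SUPPLEMENTARY
    primary = [flag for flag in flags if (flag & skip) == 0]
    if not primary:
        return 0.0
    duplicates = sum(1 for flag in primary if flag & DUPLICATE)
    return duplicates / len(primary)


# Hypothetical FLAG column from a duplicate-marked file.
example_flags = [0, 16, 1024, 0, 256, 4, 1040, 0, 16, 0]
print(f"duplicate fraction: {duplicate_fraction(example_flags):.2%}")
```

In practice the flags would come from the second column of each SAM record; the list above only stands in for that column.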

written 2.0 years ago by giovanni.bacci (0)

@giovanni: My comment was directed at @Picasa (clarified now). Your case sounds "a ok".

modified 2.0 years ago • written 2.0 years ago by genomax (58k)

Removal of PCR and optical duplicates

written 2.0 years ago by giovanni.bacci (0)

2.0 years ago, Devon Ryan (86k; Freiburg, Germany) wrote:

You can skip step 4; Picard's output is already sorted.

bedtools will include secondary and supplementary alignments in its coverage. If you want to exclude them, you'll either need to prefilter them or use a different tool. Personally, I'd just use bamCoverage from deepTools, but I'm a bit biased there.
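
To illustrate the prefiltering idea, here is a small sketch with made-up alignments. Secondary and supplementary records are identified by SAM FLAG bits 0x100 and 0x800; together these give 2304, so one possible prefilter is `samtools view -b -F 2304 in.bam` (with `in.bam` a placeholder name). The toy `depth` function below is only a stand-in to show the effect on per-base coverage and is not how bedtools computes it internally.

```python
SECONDARY = 0x100                      # 256
SUPPLEMENTARY = 0x800                  # 2048
EXCLUDE = SECONDARY | SUPPLEMENTARY    # 2304, the value passed to -F


def is_primary(flag, exclude=EXCLUDE):
    """True if none of the excluded FLAG bits are set."""
    return (flag & exclude) == 0


def depth(alignments, length, primary_only=False):
    """Per-base depth over [0, length).

    alignments: iterable of (start, end, flag), 0-based, end exclusive.
    """
    coverage = [0] * length
    for start, end, flag in alignments:
        if primary_only and not is_primary(flag):
            continue
        for position in range(max(start, 0), min(end, length)):
            coverage[position] += 1
    return coverage


# Hypothetical alignments on a 10 bp reference:
# two primary, one secondary (256), one supplementary (2048).
toy = [(0, 5, 0), (3, 8, 16), (2, 6, 256), (6, 10, 2048)]

print("all records :", depth(toy, 10))
print("primary only:", depth(toy, 10, primary_only=True))
```

The two printed depth vectors differ wherever a secondary or supplementary record overlaps, which is the kind of discrepancy that mapping with and without -a can produce.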

written 2.0 years ago by Devon Ryan (86k)