Question: How To Identify Copy Number From 454 Targeted Sequencing Data?
We have received 454 targeted sequencing data (low depth) and would like to identify copy number segments from this. The rep at 454 said that read depth in this case might not be the most indicative of deletions or amplifications.

I was wondering what the best way to go about this would be? Are there programs that will identify segments from such data? Or, is there a graphical way to look at certain regions? We have CGH data that identified some segments and we would like to validate them using our 454 data.

Any help would be greatly appreciated!

We also sequenced quite a few bacteria for which we had CGH array data. So far, in our hands, using variation in coverage as a pointer has worked very well. Doing this at single-base resolution might not be that accurate (and deletions are, of course, more troublesome to detect). For larger sections we simply used a sliding-window approach to identify regions with higher-than-normal coverage. We set the window size to the minimum size of the features we were looking for.

In our case they were usually repetitive sequences and transposons that had been aberrantly assembled by the Newbler assembler.
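To make the sliding-window idea concrete, here is a minimal Python sketch. It is not the exact script we used. It assumes you already have per-base read depth as a plain list of numbers (for example, exported from your mapping results), uses the median depth as the "normal" baseline, and flags windows whose mean depth is at least `ratio` times that baseline. The function names and the default `ratio=1.5` are illustrative assumptions and should be tuned to your data.

```python
from statistics import median


def window_means(depth, window, step):
    """Return (start, mean_depth) for each full window over per-base depth."""
    return [
        (start, sum(depth[start:start + window]) / window)
        for start in range(0, len(depth) - window + 1, step)
    ]


def high_coverage_regions(depth, window, step=None, ratio=1.5):
    """Merge overlapping windows whose mean depth is >= ratio * median depth.

    `ratio` is an arbitrary illustrative cutoff, not a recommended value.
    Returns a list of half-open (start, end) intervals.
    """
    if step is None:
        step = max(1, window // 2)  # assumption: half-overlapping windows
    baseline = median(depth)
    if baseline == 0:
        raise ValueError("median depth is 0; cannot normalise coverage")

    regions = []
    for start, mean in window_means(depth, window, step):
        if mean / baseline < ratio:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Window overlaps or touches the previous flagged region: extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Toy example: a 50 bp stretch at 3x the background depth.
depth = [10] * 100 + [30] * 50 + [10] * 100
print(high_coverage_regions(depth, window=20, step=10))  # [(90, 160)]
```

Note that the reported interval is wider than the true 100-150 region because whole windows are merged, so the resolution is limited by the window and step sizes. The same loop with a lower cutoff (mean depth well below the baseline) could be used as a first pass for candidate deletions, although with low-depth data those calls will be noisier.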

If your genome is small (less than a few Mb) and the average coverage is still above 10-15x, you might want to assemble the genome and compare it (or, most likely, its segments) to your reference genome using tools like MUMmer, which are ideally suited for this kind of job.

PS: some additional info on the project might help to answer your question more specifically.

PPS: Why do you think coverage numbers are not OK? I have seen many people use them for this, and quite a few new algorithms have been developed recently that allow accurate detection of, e.g., CNVs (usually using Illumina data, though).

