I have some contigs assembled with ABySS from Illumina paired-end reads that did not map to our reference genomic sequence, which was supposed to be the only thing in our sample. About half the reads did not map, and we sequenced to a high depth. I want to find out which of these contigs are actually real.
My thought is to map the reads back to the contigs with bowtie2 and determine from the mapping data which contigs are best supported. I already looked at how many reads mapped to each contig, but I realized that count alone doesn't tell me enough. I would like to determine support for a contig based on how many read pairs mapped concordantly and with the correct insert size. How can I do this procedurally? What should the formula look like for generating a quantitative measure of support?
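To make the question concrete, here is a rough Python sketch of the kind of measure I have in mind, not an established metric. It assumes the alignments are available as SAM text with `@SQ` header lines, that the aligner sets the "proper pair" flag (0x2) for concordant pairs, and that the insert-size window (`min_insert`, `max_insert`) is a placeholder to be replaced with values from the actual library. The function name `contig_support` and the two scores it reports (fraction of mapped reads that sit in concordant, in-range pairs, and concordant pairs per kb of contig) are my own hypothetical choices.

```python
from collections import defaultdict


def contig_support(sam_lines, min_insert=100, max_insert=600):
    """Summarise paired-end support per contig from SAM text lines.

    min_insert / max_insert are placeholder bounds on |TLEN|; set them
    from your own library's fragment-size distribution.
    """
    lengths = {}                 # contig name -> length from @SQ headers
    mapped = defaultdict(int)    # primary mapped reads per contig
    good = defaultdict(int)      # reads in concordant, in-range pairs

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue

        cols = line.split("\t")
        flag, contig, mate_ref = int(cols[1]), cols[2], cols[6]
        tlen = int(cols[8])

        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        mapped[contig] += 1

        # 0x2 is the aligner's "proper pair" flag; "=" means the mate
        # is on the same contig; |TLEN| must fall inside the window.
        in_range = min_insert <= abs(tlen) <= max_insert
        if flag & 0x2 and mate_ref == "=" and in_range:
            good[contig] += 1

    report = {}
    for contig in sorted(set(lengths) | set(mapped)):
        pairs = good[contig] // 2  # both mates of a good pair were counted
        n_mapped = mapped[contig]
        length = lengths.get(contig)
        report[contig] = {
            "mapped_reads": n_mapped,
            "concordant_pairs": pairs,
            "concordant_fraction": good[contig] / n_mapped if n_mapped else 0.0,
            "pairs_per_kb": pairs / (length / 1000) if length else None,
        }
    return report
```

It could be run as `contig_support(open("mapped.sam"))`, where `mapped.sam` is a hypothetical file name for the bowtie2 output. A contig with many mapped reads but a low concordant fraction, or with most mates landing on other contigs, would be suspect under this scheme; where to set the cutoffs is exactly what I'm unsure about.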
Open to other ideas, too.