I am working with an organism that has two plasmids and one chromosome. I would like to determine the relative copy number of my plasmids, respectively to my chromosome, which is in one copy. I have whole-genome sequencing data for this organism, so I thought of mapping my reads to the reference (3 sequences), determine the coverage of each sequence, and therefore get the values I am seeking. My code is:
bowtie2-build three-seqs.fasta three-seqs.fasta bowtie2 --threads 10 -x three-seqs.fasta -U reads.fq.gz | samtools sort --threads 10 - -o mapped.bam samtools depth mapped.bam >mapped.depth perl calc.coverage.in.bam.depth.mads-albertsen.pl -i mapped.depth # the last script is from: https://github.com/MadsAlbertsen/multi-metagenome/blob/master/misc.scripts/calc.coverage.in.bam.depth.pl
I get the following output from the perl script:
Name Average.coverage first 1198.907 second 286.216 chromosome 508.999
By dividing the coverage of each reference by the total coverage would give me the relative coverage, hence relative copy number:
Name Average.coverage % coverage first 1198.907 60.1 second 286.216 14.4 chromosome 508.999 25.5
first plasmid is roughly twice as abundant as my
chromosome, and my
second plasmid roughly 0.5x, which would mean it's only in every second cell.
Is this workflow correct? Or am I missing something here, e.g. do I need to take into account the sequence length?