Question

Determine relative copy numbers of plasmids

1

Entering edit mode

2.8 years ago

rororo ▴ 10

Hi!

I am working with an organism that has two plasmids and one chromosome. I would like to determine the relative copy number of my plasmids, respectively to my chromosome, which is in one copy. I have whole-genome sequencing data for this organism, so I thought of mapping my reads to the reference (3 sequences), determine the coverage of each sequence, and therefore get the values I am seeking. My code is:

bowtie2-build three-seqs.fasta three-seqs.fasta
bowtie2 --threads 10 -x three-seqs.fasta -U reads.fq.gz | samtools sort --threads 10 - -o mapped.bam
samtools depth mapped.bam >mapped.depth
perl calc.coverage.in.bam.depth.mads-albertsen.pl -i mapped.depth
# the last script is from: https://github.com/MadsAlbertsen/multi-metagenome/blob/master/misc.scripts/calc.coverage.in.bam.depth.pl

I get the following output from the perl script:

Name    Average.coverage
first   1198.907
second  286.216
chromosome  508.999

By dividing the coverage of each reference by the total coverage would give me the relative coverage, hence relative copy number:

Name    Average.coverage    % coverage
first   1198.907    60.1
second  286.216 14.4
chromosome  508.999 25.5

Therefore, my first plasmid is roughly twice as abundant as my chromosome, and my second plasmid roughly 0.5x, which would mean it's only in every second cell.

Is this workflow correct? Or am I missing something here, e.g. do I need to take into account the sequence length?

mapping bowtie2 samtools coverage • 671 views

ADD COMMENT • link 2.7 years ago by rororo ▴ 10

0

Entering edit mode

If there were no biases in sampling/lib prep/sequencing then that sounds about right. Sounds like your plasmids don't share much sequence with your chromosome.

ADD REPLY • link 2.8 years ago by GenoMax 141k

0

Entering edit mode

Thanks for the hint, I should have checked that! I ran a blast and calculated ANI, and there's only 3 or 4 chunks that are similar due to conserved genes, but they are just 2 kb in total.

However: I now filtered my mapped.bam by Q30, which removes 6 % of my reads. When I calculate coverages, I get roughly equal coverage for my chromosome and my plasmid first, the second plasmid however stays at a relative abundance of 0.5:

first   500.441 42
second  208.254 17
chromosome  493.706 41

Why could that be? Is it multimapping reads that are causing this shift?

ADD REPLY • link 2.7 years ago by rororo ▴ 10