Sample contamination and variant calling
6.6 years ago
User 4014 ▴ 40

Hi folks,

Sorry for the silly question. I am a newbie and would like to call variants in 8 fungal genomes, but I am not sure the samples were pure before they were sent for sequencing. Is there a method to check whether a dataset came from a pure culture?

I tried calling SNPs in Geneious, and it always shows two bases per position, as in the figure. Does this mean the samples were not pure (i.e., they contain more than one strain) and that I should not use the dataset?

Figure: https://ibb.co/m2rQ1v

Thanks in advance and looking forward to great suggestions!

snp genome next-gen • 1.5k views

How did you align the reads? Is there a reference genome available for each fungal species you're studying? My logic is that you could set very strict thresholds during the alignment step so that only reads from the species of interest are mapped.

For example (prior to alignment):

  • eliminate short reads (<70bp)
  • trim bases at read ends whose base quality is below Phred 30
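A minimal Python sketch of these two pre-alignment steps, assuming Phred+33 quality encoding and simple clipping of low-quality bases from both read ends. The function names are hypothetical, and this is not how BBDuk or any other trimmer implements quality trimming internally; it only illustrates what the thresholds mean.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(c) - offset for c in qual]


def trim_and_filter(seq: str, qual: str, min_q: int = 30, min_len: int = 70):
    """Clip low-quality bases from both read ends, then drop short reads.

    Returns the trimmed (seq, qual) pair, or None if the read is discarded.
    """
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    # Clip from the 5' end while base quality is below the threshold.
    while start < end and scores[start] < min_q:
        start += 1
    # Clip from the 3' end in the same way.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None  # too short after trimming
    return seq[start:end], qual[start:end]
```

For example, an 80 bp read with five low-quality bases at each end is clipped to 70 bp and kept, while a 60 bp read is dropped regardless of its quality.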

Alignment:

  • MAPQ>=60
  • keep uniquely mapping reads only
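The alignment-level filter can be sketched as a check on each SAM record: keep only primary, mapped alignments whose MAPQ (column 5) meets the cutoff. Note that the MAPQ scale depends on the aligner: Bowtie2's MAPQ tops out in the low 40s while BWA-MEM's maximum is 60, so a MAPQ>=60 cutoff would discard every Bowtie2 alignment. The helper names below are hypothetical, and treating "high MAPQ, not secondary or supplementary" as "uniquely mapped" is an approximation.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_alignment(sam_line: str, min_mapq: int = 60) -> bool:
    """Return True if a SAM alignment record is primary, mapped and has MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq: int = 60):
    """Yield header lines unchanged plus the alignment records that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, min_mapq):
            yield line
```

In practice `min_mapq` should be set relative to the aligner actually used (e.g., a value near the top of Bowtie2's range rather than 60).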

Thanks for your reply and sorry for being unclear. I meant contamination from another strain of the same species.

My understanding is that, if there is only one strain (genotype), variant calling should show only one nucleotide per position (either A or G, not a mixture of A and G as in the picture). That's why I am concerned about possible contamination from another strain of the same species: I did not isolate the fungi from a single spore or hyphal tip. Please correct me if I am wrong.


I understand that variant calling should show only 'one' type of nucleotides (either A or G, not a mixture of A and G as in the picture) if there is only one strain (genotype).

Do you mean the species you are analysing is haploid? If your organism is diploid, I don't see anything wrong.
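To make the haploid/diploid point concrete: with k alternate copies out of `ploidy`, the expected alternate-allele fraction at a site is k/ploidy, so a pure haploid sample should give fractions near 0 or 1, while a diploid heterozygote gives about 0.5. The sketch below (a hypothetical helper using a simple binomial model with an assumed 1% error rate) compares how well each ploidy explains a given pair of read counts.

```python
import math


def best_genotype_loglik(ref: int, alt: int, ploidy: int, err: float = 0.01):
    """Best binomial log-likelihood over genotypes with 0..ploidy alt copies.

    The expected alt fraction for k alt copies is k / ploidy, clamped away
    from 0 and 1 by an assumed sequencing-error rate `err`.
    Returns (k, log_likelihood) for the best-fitting genotype.
    """
    best = None
    for k in range(ploidy + 1):
        p = min(max(k / ploidy, err), 1 - err)
        ll = alt * math.log(p) + ref * math.log(1 - p)
        if best is None or ll > best[1]:
            best = (k, ll)
    return best
```

For example, 20 reference and 20 alternate reads fit a diploid heterozygote far better than either haploid genotype; in a haploid organism, such a site points to a mixture of genotypes or a mapping artefact rather than a single true genotype.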

I tried using Geneious to call SNPs and it always shows two bases/ location as in the figure.

With the current description of your methods, we have no idea what you did or how you called variants. Please read How To Ask Good Questions On Technical And Scientific Forums.


Thanks for your reply, and sorry for the confusion. Yes, the fungi I am working on are haploid. For this reason, I suspect the dataset I have was generated from two isolates of the same species instead of one. Sorry for the basic question, but how would you interpret this result if it comes from a haploid organism?

I trimmed the raw data to Q30 using BBDuk and aligned it to the reference genome using Bowtie2. Variants were called with FreeBayes using minimum alternate count = 3, minimum alternate fraction = 0.3, minimum probability = 0 and combine nearby variants = 3.
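The two FreeBayes thresholds mentioned here (minimum alternate count and minimum alternate fraction) can be illustrated as a per-site check on allele read counts. This is a simplified sketch of the idea, not FreeBayes's actual implementation; the function name and input format (a dict of allele to read count) are assumptions.

```python
def passing_alt_alleles(depths: dict[str, int], ref: str,
                        min_alt_count: int = 3,
                        min_alt_fraction: float = 0.3) -> list[str]:
    """Return the alternate alleles at one site that meet both thresholds."""
    total = sum(depths.values())
    if total == 0:
        return []
    return [
        allele
        for allele, n in depths.items()
        if allele != ref and n >= min_alt_count and n / total >= min_alt_fraction
    ]
```

With these settings, an alternate allele needs at least 3 supporting reads and at least 30% of the reads at the site, which is far more than random sequencing errors usually produce at a single position in a pure haploid sample.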


Variant callers should report multiple alleles at a position as a comma-separated list, along with the read depth for each allele. You can typically set the maximum number of allowed alleles on the command line when running the caller.
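As an illustration of how multiple alleles and their depths appear in a VCF record: ALT holds the comma-separated alternate alleles, and the per-sample AD field (when the caller emits it) holds comma-separated depths, reference first. A minimal parser, assuming a line with an AD entry in FORMAT; the function name is hypothetical, and missing values ('.') are not handled.

```python
def allele_depths(vcf_line: str, sample_index: int = 0) -> dict[str, int]:
    """Map each allele (REF, then the comma-separated ALTs) to its AD depth."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    fmt_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    values = dict(zip(fmt_keys, sample_values))
    depths = [int(x) for x in values["AD"].split(",")]
    return dict(zip([ref] + alts, depths))
```

For a record such as `chr1  1000  .  A  G,T  50  .  .  GT:AD  1:12,10,1`, this returns A: 12, G: 10, T: 1.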


Thanks Kevin. As I mentioned before, I didn't do single-spore or hyphal-tip isolation; I relied only on SSR genotyping, which suggested the cultures were pure, so I sent them for sequencing. I will play around with the parameters as you suggested.

Anyway, I can see in the alignment that there are multiple alleles (for example, both A and G, as in the figure) at a SNP position. Shouldn't there be only one allele (either A or G) in a haploid organism?


If the sequencer has sequenced correctly and not introduced any base errors AND the aligner has aligned properly AND the sample is from just one strain AND the DNA is not from meiotic cells AND the organism is haploid, then yes, there should be just one call at each position.

The problem you have looks like an issue with sample preparation; the bioinformatics can only do so much with the given data. I would still re-run with the QC thresholds set as high as possible (mainly during alignment) to see if anything changes. The fact that you already have minimum alternate fraction = 0.3, however, indicates to me that your multi-allelic calls are most likely genuine.
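If the multi-allelic calls are genuine and come from two haploid strains mixed in proportion p : (1 - p), then at sites where the strains differ the minor-allele fraction should hover around min(p, 1 - p). A rough check, sketched below under that two-strain assumption, is to take the median minor-allele fraction across variable sites: a consistent value across many sites suggests a mixture, whereas widely scattered fractions could instead point to mapping artefacts (e.g., reads from repetitive or paralogous regions). The function name and input format are hypothetical.

```python
from statistics import median


def minor_strain_fraction(site_depths):
    """Estimate the minor-strain proportion in a putative two-strain mixture.

    site_depths: iterable of (ref_count, alt_count) pairs at variable sites.
    Under the two-haploid-strain assumption, the minor-allele fraction at
    sites where the strains differ is expected to be about min(p, 1 - p).
    Returns None if no site has any reads.
    """
    fractions = []
    for ref, alt in site_depths:
        total = ref + alt
        if total > 0:
            fractions.append(min(ref, alt) / total)
    return median(fractions) if fractions else None
```

Using the median rather than the mean keeps a few badly mapped sites from dominating the estimate.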

Sorry I can't help any further!
