Question: GenomeScope analysis failed
maxnest wrote (10 months ago):

Dear colleagues,

I was provided with sequencing data from a single protist genome. To estimate the approximate genome size, I ran Jellyfish (-k 21 -m 35M) on the libraries trimmed with Trimmomatic. However, the GenomeScope analysis failed, and I got these results: http://genomescope.org/analysis.php?code=u2FlyNR00NjbjMAkpSUG. I would like answers to three questions:
1) Does it make sense to continue working with such data?
2) What could cause such results?
3) Can you recommend an effective way to remove bacterial contamination when the specific sources of contamination are unknown? Unfortunately, a direct comparison against the NCBI nr database is extremely time-consuming, on the order of weeks.

I will be grateful for any help or advice.


1) Depending on your goals, yes, it makes sense to continue with it.

2) You could try other k-mer sizes; depending on the sequencing depth and genome size, other values of k might work better.

3) You could try Kraken or similar software to filter the reads. Otherwise, I would assemble everything and filter the assembled contigs for contamination afterwards.
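Read filtering with Kraken can be sketched as follows. This is a minimal illustration, not part of Kraken itself: it assumes Kraken's standard per-read output (column 1 is `C` or `U` for classified/unclassified, column 2 is the read ID) and assumes the database does not contain the protist or close relatives, so every classified read is treated as a putative contaminant. That assumption fails if the database includes eukaryotes, and paired-end mates are not kept in sync here. The function names are hypothetical; Kraken 2 can also write unclassified reads directly with its `--unclassified-out` option.

```python
from itertools import islice
from typing import Iterable, Iterator, Set, Tuple

# (header, sequence, separator, quality) for one FASTQ record
Record = Tuple[str, str, str, str]


def classified_ids(kraken_lines: Iterable[str]) -> Set[str]:
    """IDs of reads that Kraken marked 'C' (classified) in its standard output."""
    ids: Set[str] = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "C":
            ids.add(fields[1])
    return ids


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield plain 4-line FASTQ records (no multi-line sequences)."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        yield chunk[0], chunk[1], chunk[2], chunk[3]


def drop_classified(fastq_lines: Iterable[str], kraken_lines: Iterable[str]) -> Iterator[Record]:
    """Keep only reads that Kraken left unclassified (assumed non-contaminant)."""
    flagged = classified_ids(kraken_lines)
    for record in read_fastq(fastq_lines):
        read_id = record[0][1:].split()[0]  # '@id description' -> 'id'
        if read_id not in flagged:
            yield record


if __name__ == "__main__":
    # Toy input: r1 is classified (taxid 562), r2 is unclassified.
    fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2 sample\n", "GGCC\n", "+\n", "IIII\n"]
    kraken = ["C\tr1\t562\t4\t562:4\n", "U\tr2\t0\t4\t0:4\n"]
    for header, seq, _, _ in drop_classified(fastq, kraken):
        print(header, seq)
```

Streaming the FASTQ through a generator keeps memory use bounded by the set of flagged IDs rather than by the read file.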

(reply by lieven.sterck, 10 months ago)

Dear colleagues, thank you very much for your answers. Kraken and BlobTools are exactly what I was looking for.

(reply by maxnest, 10 months ago)
h.mon (Brazil) wrote (10 months ago):

If you are estimating genome size, I assume you want to use this data to assemble a genome. From what you have shown alone it is not possible to answer for sure, but I tend to believe the data is worth using. Did you perform other quality checks on the data?

I think poor sequencing quality and insufficient coverage could cause this. A genome with many repeats at different levels of similarity might also cause it, but that is just a wild guess.
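One way to check the coverage explanation is to look at the raw k-mer histogram yourself. The sketch below is a naive peak-based estimate, not GenomeScope's model (GenomeScope fits a mixture model that also accounts for heterozygosity and errors). It assumes the two-column `depth count` text written by `jellyfish histo`, a haploid genome, and a single main peak. If no valley separates the error tail from a peak, coverage may be too low for this kind of analysis. It also converts the k-mer peak depth to base coverage with C = C_k · L / (L − k + 1), which shows why the choice of k interacts with sequencing depth: a larger k lowers k-mer coverage. The function names and the toy histogram are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_histo(text: str) -> List[Tuple[int, int]]:
    """Parse the two-column 'depth count' output of `jellyfish histo`."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()[:2]
            pairs.append((int(depth), int(count)))
    return pairs


def find_peak(histo: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return (valley, peak) depths, or None if no valley follows the error tail."""
    counts = dict(histo)
    depths = sorted(counts)
    valley = None
    # Walk down the error tail to the first local minimum.
    for i in range(1, len(depths) - 1):
        d = depths[i]
        if counts[d] <= counts[depths[i - 1]] and counts[d] < counts[depths[i + 1]]:
            valley = d
            break
    if valley is None:
        return None
    solid = [d for d in depths if d >= valley]
    peak = max(solid, key=lambda d: counts[d])
    return valley, peak


def genome_size(histo: List[Tuple[int, int]], valley: int, peak: int) -> float:
    """Total k-mers at or above the valley divided by the peak depth."""
    total = sum(d * c for d, c in histo if d >= valley)
    return total / peak


def base_coverage(kmer_depth: float, read_length: int, k: int) -> float:
    """Convert k-mer coverage to base coverage: C = C_k * L / (L - k + 1)."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return kmer_depth * read_length / (read_length - k + 1)


if __name__ == "__main__":
    toy = "1 1000\n2 300\n3 100\n4 80\n5 150\n6 300\n7 400\n8 300\n9 150\n10 60\n"
    histo = parse_histo(toy)
    result = find_peak(histo)
    if result is None:
        print("no separate peak: coverage may be too low")
    else:
        valley, peak = result
        print("peak:", peak, "genome size:", round(genome_size(histo, valley, peak)))
        print("base coverage (150 bp reads, k=21):", round(base_coverage(peak, 150, 21), 1))
```

With real data, the same histogram computed at a few different k values can show whether the peak becomes clearer or disappears.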

I agree with both of lieven.sterck's suggestions: Kraken could be used to filter the reads prior to assembly, but I think filtering after assembly is better. I like BlobTools for post-assembly filtering.
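BlobTools-style post-assembly filtering places each contig by GC content, read coverage, and a taxonomic assignment taken from similarity searches (for example BLAST hits), so contaminant contigs stand out as separate clusters. The sketch below only imitates that idea in plain Python; it is not BlobTools' interface or file format. It assumes you already have per-contig coverage (e.g. from mapping reads back to the assembly) and a hypothetical `taxon_of` mapping from contig name to superkingdom. Contigs without an assignment are kept, because a protist may have few close relatives in the reference database.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Contig:
    name: str
    seq: str
    coverage: float  # mean read depth, e.g. from mapping reads to the assembly


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def blob_table(contigs: List[Contig], taxon_of: Dict[str, str]) -> List[Tuple[str, float, float, str]]:
    """(name, GC, coverage, taxon) rows: the three dimensions of a blob plot."""
    return [
        (c.name, round(gc_content(c.seq), 3), c.coverage, taxon_of.get(c.name, "no-hit"))
        for c in contigs
    ]


def split_by_taxon(
    contigs: List[Contig],
    taxon_of: Dict[str, str],
    drop_taxa: FrozenSet[str] = frozenset({"Bacteria"}),
) -> Tuple[List[Contig], List[Contig]]:
    """Split contigs into (kept, removed); unassigned contigs are kept."""
    kept: List[Contig] = []
    removed: List[Contig] = []
    for contig in contigs:
        target = removed if taxon_of.get(contig.name) in drop_taxa else kept
        target.append(contig)
    return kept, removed


if __name__ == "__main__":
    contigs = [
        Contig("ctg1", "GGCCAT", 45.0),
        Contig("ctg2", "ATATGC", 6.0),
        Contig("ctg3", "ACGTNN", 40.0),
    ]
    taxon_of = {"ctg1": "Eukaryota", "ctg2": "Bacteria"}  # hypothetical assignments
    for row in blob_table(contigs, taxon_of):
        print(*row)
    kept, removed = split_by_taxon(contigs, taxon_of)
    print("kept:", [c.name for c in kept], "removed:", [c.name for c in removed])
```

In practice the GC and coverage columns matter as much as the labels: a cluster of contigs with unusual GC and coverage deserves a closer look even if some of its members have no hit.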

Powered by Biostar version 2.3.0