How to detect and remove bacterial sequences in an assembled eukaryotic genome
4.0 years ago
Seq225 ▴ 110

I have sequenced a eukaryotic genome using 10x Genomics technology and assembled it with their Supernova software package. Now I want to check whether any of the assembled contigs are bacterial.

I know that one way is to BLAST the contigs against the NCBI bacterial RefSeq database (ftp://ftp.ncbi.nih.gov/refseq/release/bacteria) and remove the contigs for which more than a certain percentage of the sequence matches. Is there any other way? Any software package or pipeline?
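To make the "certain percentage" idea concrete, here is a minimal sketch in Python. It assumes the BLAST results were written in the default tabular format (-outfmt 6), where columns 7 and 8 are the query start and end, and that contig lengths are already available as a dictionary. The function names and the 0.5 cutoff are hypothetical, chosen only for illustration; a sensible cutoff depends on the data.

```python
from collections import defaultdict


def covered_fraction(hits, contig_lengths):
    """Return {contig: fraction of its bases covered by at least one hit}.

    `hits` is an iterable of BLAST tabular lines (-outfmt 6, default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore). `contig_lengths` maps contig name to length in bp.
    """
    intervals = defaultdict(list)
    for line in hits:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        intervals[fields[0]].append((min(qstart, qend), max(qstart, qend)))

    fractions = {}
    for contig, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent hit: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        fractions[contig] = covered / contig_lengths[contig]
    return fractions


def flag_bacterial(hits, contig_lengths, min_fraction=0.5):
    """List contigs whose covered fraction reaches min_fraction (assumed cutoff)."""
    fractions = covered_fraction(hits, contig_lengths)
    return sorted(c for c, f in fractions.items() if f >= min_fraction)


# Made-up example: two overlapping hits on contig1, one short hit on contig2.
example_hits = [
    "contig1\tNC_1\t98.0\t600\t12\t0\t1\t600\t100\t699\t0.0\t1000",
    "contig1\tNC_1\t97.0\t301\t9\t0\t500\t800\t900\t1200\t0.0\t500",
    "contig2\tNC_2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150",
]
example_lengths = {"contig1": 1000, "contig2": 5000}
print(flag_bacterial(example_hits, example_lengths))
```

Overlapping hits are merged before counting, so a region matched by several bacterial references is not counted twice. Contigs with no hits simply do not appear in the result.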

Thanks!

Assembly sequencing SNP genome gene • 1.9k views

Can you comment on the size of the assembly and the length and number of sequences? You could do a naive blast search as stated.

You could try the sketch tools from the BBMap suite.

If you suspected that your data had contamination, it would have been much better to identify it in the reads before assembly, using kraken2 or centrifuge.


Thanks. My sample is not contaminated. I just want to get the endosymbiont sequences. The genome is 600 Mb, sequenced with 260 million 150 bp reads. Thanks

4.0 years ago
N15 ▴ 160

sourmash is perfect for detecting contamination in an assembly:

https://angus.readthedocs.io/en/2019/sourmash.html#

If you do have contamination, I would use bbmap to remove those reads and reassemble without them.

bbmap.sh in1=R1.fq.gz in2=R2.fq.gz ref=contam.fa outu1=R1.clean.fq.gz outu2=R2.clean.fq.gz

Here in1 (R1.fq.gz) is your forward-read FASTQ file, in2 (R2.fq.gz) is the reverse, contam.fa is the contaminant's genome, outu1 is the "cleaned" forward FASTQ file containing only the reads that did not map to the contaminant, and outu2 is the cleaned reverse file.

https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/


Thanks. I am not thinking about contamination. I want to get the endosymbiont genome/contigs from my assembly. Any suggestions? Thanks!


Maybe look at the GC content of the contigs or genome bins and separate by that? Also check out anvi'o, which may have some useful options:

http://merenlab.org/2019/10/17/export-locus/
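As a rough illustration of the GC-content idea, here is a minimal Python sketch that computes the GC fraction of each contig in a FASTA file so the values can be plotted or thresholded. It is not how anvi'o does it; the function names are hypothetical, and any cutoff would have to come from looking at the GC distribution of your own assembly. GC content alone is rarely conclusive, so it is best combined with other evidence such as read coverage or sequence similarity.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the contig name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """GC fraction over unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_per_contig(lines):
    """Return {contig name: GC fraction} for every record in the FASTA input."""
    return {name: gc_fraction(seq) for name, seq in read_fasta(lines)}


# Made-up example; for a real assembly, pass an open file handle instead:
#   with open("assembly.fasta") as handle:   # hypothetical file name
#       gc = gc_per_contig(handle)
example = [">c1 some description", "ATATATAT", "GC", ">c2", "GGCCGGCCNN"]
for contig, gc in gc_per_contig(example).items():
    print(contig, round(gc, 3))
```

Contigs whose GC fraction sits in a separate peak from the bulk of the assembly are candidates for a closer look, for example with a BLAST search.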
