Mapping Illumina sequencing reads to a bacterial reference genome
3.4 years ago
satwa

I'm new to bioinformatics, and I have FASTQ files from whole-genome sequencing of a bacterial genome. When I run a de novo assembly, I get hundreds of contigs. How can I get the whole genome assembled? If I want to map these contigs to a reference genome, how do I choose the closest genome, and which tools can I use for mapping? How can I identify plasmid sequences?

Tags: assembly, alignment, genome

Do you want to map the reads to the reference genome, or generate a de novo assembly from the raw sequencing data?


I assembled the reads de novo with SPAdes, but I got thousands of contigs, which is why I decided to map them to a reference genome.


If you are not sure about the organism, try classifying the reads with Kraken2.
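If you run Kraken2 with `--report`, a short script can pull the most abundant species out of the report. This is a minimal sketch that assumes the standard six-column, tab-separated report (percentage, clade reads, taxon reads, rank code, NCBI taxid, indented name); the function name `top_species` is hypothetical and the report text below is made up for illustration.

```python
def top_species(report_text: str, n: int = 3) -> list[tuple[str, float]]:
    """Return up to n species-level taxa (rank code "S"), highest clade percentage first."""
    hits = []
    for line in report_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 6:
            continue  # skip blank lines and rows that are not in the assumed 6-column format
        pct, rank, name = float(cols[0]), cols[3].strip(), cols[5].strip()
        if rank == "S":  # sub-species codes such as "S1" are deliberately ignored here
            hits.append((name, pct))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:n]


# Toy report text, invented for illustration only.
report = (
    " 85.00\t850\t10\tS\t562\t        Escherichia coli\n"
    " 10.00\t100\t100\tS\t1280\t        Staphylococcus aureus\n"
    "  5.00\t50\t50\tU\t0\tunclassified\n"
)
print(top_species(report))
```

The top hit is a reasonable first candidate when looking for a closely related reference genome.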

3.3 years ago
appiahv

Hello, you can do reference-guided scaffolding with RagTag (https://github.com/malonge/RagTag). RagTag uses minimap2 or Nucmer under the hood to align your de novo contigs (not the raw reads) to a reference genome, then orders and orients them and joins them with gaps to produce scaffolds.
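The core idea of reference-guided scaffolding can be illustrated with a toy: given where each contig aligns on the reference and on which strand, sort the contigs by reference position, reverse-complement those on the minus strand, and join them with a run of Ns. This is a simplified sketch, not RagTag's implementation (RagTag also filters and weighs alignments); the `Placement` type, the `scaffold` function, and the default gap of 100 Ns are assumptions for illustration, and contigs without a placement are simply left out.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Placement:
    contig: str     # contig name
    ref_start: int  # start of the contig's alignment on the reference (hypothetical input)
    strand: str     # "+" or "-"


def scaffold(contigs: dict[str, str], placements: list[Placement], gap: int = 100) -> str:
    """Order placed contigs by reference position, orient them, and join with `gap` Ns."""
    pieces = []
    for p in sorted(placements, key=lambda p: p.ref_start):
        seq = contigs[p.contig]
        pieces.append(revcomp(seq) if p.strand == "-" else seq)
    return ("N" * gap).join(pieces)


contigs = {"c1": "AAAC", "c2": "GGGT", "c3": "TTAG"}
placements = [Placement("c2", 500, "+"), Placement("c1", 10, "-"), Placement("c3", 900, "+")]
print(scaffold(contigs, placements, gap=3))
```

Because gap sizes between contigs are unknown from the alignment alone, a fixed-length N run is used as a placeholder.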

If you want to identify plasmids, obtain the sequence of the plasmid you are interested in and specify it as the reference sequence when running RagTag.
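As a quick complementary check (a different technique from RagTag scaffolding), you can screen contigs against a plasmid reference by k-mer containment: the fraction of a contig's k-mers that also occur in the plasmid on either strand. This is a minimal sketch; the function names, k=21 as a default, and whatever cutoff you treat as "plasmid-like" are assumptions, and the sequences below are made up.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(contig: str, plasmid: str, k: int = 21) -> float:
    """Fraction of the contig's k-mers present in the plasmid reference (either strand)."""
    contig, plasmid = contig.upper(), plasmid.upper()
    ref = kmers(plasmid, k) | kmers(revcomp(plasmid), k)
    query = kmers(contig, k)
    if not query:
        return 0.0
    return len(query & ref) / len(query)


plasmid = "ACGTACGGTCAGTTCAGGCATTAGC"  # toy sequence, not a real plasmid
print(containment(plasmid[5:20], plasmid, k=5))           # contig fully inside the plasmid
print(containment("TTTTTTTTTTTT", plasmid, k=5))          # unrelated contig
```

A circular plasmid's start/end junction is not handled here; k-mers spanning it would be missed.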

Once you have your scaffolds, you can compare them with other sequences using BRIG (BLAST Ring Image Generator), which produces an image of the comparison. I made a video on how to use BRIG here: https://youtu.be/pobQgE4z-5Q

3.3 years ago
juanjo75es

If your contigs are large enough (longer than 3,000 bp), take a subsequence of a contig (maybe 10,000 bp) and run a BLAST search on the NCBI website. You will get a list of hits with links to the complete sequences.
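Picking BLAST queries from an assembly can be scripted. This minimal sketch reads FASTA text, keeps contigs of at least 3,000 bp (longest first), and cuts at most 10,000 bp from each, following the numbers above. Taking the window from the middle of the contig is an arbitrary choice, and the function names are hypothetical; the FASTA is assumed to be well-formed.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def blast_queries(fasta_text: str, min_len: int = 3000, window: int = 10000) -> list[tuple[str, str]]:
    """Contigs >= min_len bp, longest first, each cut to at most `window` bp from its middle."""
    queries = []
    by_length = sorted(read_fasta(fasta_text).items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in by_length:
        if len(seq) < min_len:
            continue
        start = max(0, (len(seq) - window) // 2)
        queries.append((name, seq[start:start + window]))
    return queries


def to_fasta(queries: list[tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text, e.g. for pasting into the BLAST web form."""
    return "".join(f">{name}\n{seq}\n" for name, seq in queries)
```

For example, `print(to_fasta(blast_queries(open("contigs.fasta").read())[:1]))` would print a query from the longest qualifying contig in a SPAdes `contigs.fasta`.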

If your contigs are not large enough, there are three likely explanations:

  • Your reads are unsuitable for assembly. Sometimes whoever designs the sequencing run chooses a setup in which the reads do not overlap, for whatever reason (perhaps they only intended to map the reads, not assemble them). In that case there is little you can do, but you can still try software such as Kraken2, or simply BLAST some reads, to find a reference.

  • Your reads contain adapter sequences. In that case, trim them with a tool such as Trimmomatic or Trim Galore, then try assembling again.

  • Your assembler just performs poorly on that data. Sometimes there is nothing especially wrong with your data, but a particular assembler handles it badly and fails miserably. This happens more often than you might think. In that case, look for an assembler that does not have that weakness with your data. (cough... I don't want to advertise too obviously, but look at my profile to find software that will never leave you helpless, cough...)
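To check the adapter explanation before re-running an assembly, you can estimate how many reads contain adapter sequence. This minimal sketch assumes plain 4-line FASTQ records and looks for AGATCGGAAGAGC, the first 13 bp of the standard Illumina adapter (the sequence Trim Galore searches for when auto-detecting Illumina adapters). It uses exact matching only; Trimmomatic and Trim Galore also tolerate mismatches and partial adapters at the read end, so use them for real trimming. Function names are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # first 13 bp of the standard Illumina adapter


def fastq_records(text: str):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def adapter_fraction(text: str, adapter: str = ADAPTER) -> float:
    """Fraction of reads whose sequence contains an exact copy of the adapter."""
    seqs = [seq for _, seq, _ in fastq_records(text)]
    if not seqs:
        return 0.0
    return sum(adapter in seq for seq in seqs) / len(seqs)


def trim_exact(text: str, adapter: str = ADAPTER) -> str:
    """Cut each read (and its qualities) at the first exact adapter occurrence."""
    out = []
    for header, seq, qual in fastq_records(text):
        cut = seq.find(adapter)
        if cut != -1:
            seq, qual = seq[:cut], qual[:cut]
        out.append(f"{header}\n{seq}\n+\n{qual}\n")
    return "".join(out)


fq = (
    "@r1\nACGTACGTAGATCGGAAGAGCACAC\n+\n" + "I" * 25 + "\n"
    "@r2\nACGTACGTACGTACGT\n+\n" + "I" * 16 + "\n"
)
print(adapter_fraction(fq))
print(trim_exact(fq))
```

A noticeable fraction of reads containing the adapter suggests trimming before assembling again.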

3.3 years ago
MSRS

Hi, you can find some answers here. Thank you.
