Question: Reference-based assembly
deepti1rao wrote (5 months ago):

I'm trying to do a reference-aided assembly of the genome of a new rice variety. I mapped my Illumina reads to the reference and replaced the uncovered bases with Ns. I then used this masked genome FASTA file as the reference and mapped my reads to it again. My plan is to call variants from this second mapping and substitute them into the masked FASTA file to generate my assembly. 96-97% of my reads map to the reference. Is this a good strategy? I have doubts, because I suspect that the Ns in the masked genome may cause mapping errors.
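The masking step described above (replacing uncovered bases with N using a BED file of uncovered loci) can be sketched in Python. This is a minimal illustration, assuming a single chromosome held in memory as a string and BED records in the usual 0-based, end-exclusive convention; `parse_bed` and `mask_intervals` are hypothetical helper names, not part of any tool. For a whole genome, an established tool such as `bedtools maskfasta` does the same job.

```python
from typing import Iterable, List, Tuple


def parse_bed(lines: Iterable[str], chrom: str) -> List[Tuple[int, int]]:
    """Collect (start, end) intervals for one chromosome from BED lines."""
    intervals = []
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom:
            intervals.append((int(fields[1]), int(fields[2])))
    return intervals


def mask_intervals(seq: str, intervals: List[Tuple[int, int]]) -> str:
    """Replace every base inside the intervals (0-based, end-exclusive) with 'N'."""
    bases = list(seq)
    for start, end in intervals:
        # Clamp to the sequence so a slightly out-of-range interval cannot fail.
        for i in range(max(0, start), min(len(bases), end)):
            bases[i] = "N"
    return "".join(bases)


# Hypothetical toy data: one 10-bp "chromosome" and two BED records.
reference = "ACGTACGTAC"
bed_lines = ["chr1\t2\t5\n", "chr2\t0\t3\n"]
masked = mask_intervals(reference, parse_bed(bed_lines, "chr1"))
print(masked)  # ACNNNCGTAC
```

Only the `chr1` record is applied here, so bases at 0-based positions 2, 3 and 4 become N.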

Alternatively, should I extract variants from the BAM files I obtained by mapping reads to the original reference genome, and substitute them into the masked genome only where the masked genome does not have an N at that position? If so, how should I go about this? I have already made a BED file of the uncovered loci.
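The alternative in the question (take variants called against the original reference and write them into the masked genome only where it has no N) can be sketched as below. This is a hypothetical helper, assuming the variants have already been called from the original BAM and reduced to (POS, REF, ALT) tuples with 1-based VCF positions. It handles only single-base substitutions, because insertions and deletions shift all downstream coordinates. For real data, `bcftools consensus`, which can also take a BED file of regions to mask, is likely a safer choice.

```python
from typing import Iterable, Tuple


def apply_snvs(masked: str, variants: Iterable[Tuple[int, str, str]]) -> Tuple[str, int, int]:
    """Write SNVs into a masked sequence, leaving N (uncovered) positions untouched.

    variants: (POS, REF, ALT) with a 1-based POS, as in a VCF record.
    Returns the new sequence, the number applied and the number skipped.
    """
    bases = list(masked)
    applied = skipped = 0
    for pos, ref, alt in variants:
        i = pos - 1  # convert 1-based VCF position to 0-based index
        if not 0 <= i < len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        if len(ref) != 1 or len(alt) != 1:
            skipped += 1  # indels shift coordinates; not handled in this sketch
            continue
        if bases[i].upper() == "N":
            skipped += 1  # uncovered in this variety: keep the N
            continue
        if bases[i].upper() != ref.upper():
            raise ValueError(f"REF {ref} does not match base {bases[i]} at {pos}")
        bases[i] = alt
        applied += 1
    return "".join(bases), applied, skipped


# Hypothetical calls against the original reference (not real data).
calls = [(1, "A", "G"), (4, "T", "C"), (7, "G", "A"), (9, "A", "AT")]
consensus, n_applied, n_skipped = apply_snvs("ACNNNCGTAC", calls)
print(consensus, n_applied, n_skipped)  # GCNNNCATAC 2 2
```

The call at position 4 is skipped because that base is masked, and the insertion at position 9 is skipped because this sketch does not handle indels. Supporting indels would require applying variants from the end of the sequence backwards (or tracking a coordinate offset), which is one more reason to prefer an established tool.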

h.mon (Brazil) wrote (5 months ago):

I have 96-97% of reads mapping to my reference.

It seems the reference assembly is already good enough and quite similar to your newly sequenced strain, so I wonder why you need to perform a reference-based assembly at all. Whatever strategy you take, the reference will be of higher quality than your assembly, and by performing a reference-based assembly you may introduce artifacts into the reference-derived sequence.

ropolocan (Canada) wrote (4 months ago):

Instead of mapping reads to a reference genome, I would suggest a different approach: start with a de novo assembly, and then scaffold it with the help of a reference genome. For example, ragout uses reference genomes for scaffolding, and it can use multiple references at once. Using several references can help account for structural variation among different genomes.

jean.elbers wrote (4 months ago):

You might consider reference-guided de novo assembly (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1911-6 ). Before trying it, though, read the paper and compare the improvement for each assembler in de novo versus reference-guided de novo mode. There is a convenient script for each assembler on Bitbucket (https://bitbucket.org/HeidiLischer/refguideddenovoassembly_pipelines ), but the scripts cannot start or stop at specific steps and do not accept gzipped FASTQ files, so if you have limited disk space you would need to modify them. The scripts also do not delete temporary files, which is again a problem if storage is limited. Finally, if you only have short-insert libraries and no mate-pair libraries, I don't think this approach will be a substantial improvement over de novo assembly.

Powered by Biostar version 2.3.0