Question: Reference-based assembly
deepti1rao wrote, 6 weeks ago:

I'm trying to do a reference-aided assembly of the genome of a new rice variety. I mapped my Illumina reads to the reference and replaced the uncovered bases with Ns. I then used this masked genome FASTA file as the reference and mapped my reads to it again. My plan is to call variants from this second mapping and substitute them into the masked FASTA file to generate my assembly. 96-97% of my reads map to the reference. Is this a good strategy? I have some doubts, because I think the Ns in the masked genome may cause mapping errors.
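
For illustration, the masking step described above can be sketched as follows. This is a minimal sketch, not the poster's actual procedure: the function names `mask_uncovered` and `n_fraction` are hypothetical, the sequence is toy data, and the uncovered intervals are assumed to be 0-based and half-open as in a BED file.

```python
def mask_uncovered(sequence, uncovered):
    """Return `sequence` with every base inside the given intervals replaced by N.

    `uncovered` holds (start, end) pairs, assumed 0-based and half-open as in BED.
    """
    bases = list(sequence)
    for start, end in uncovered:
        # Clamp to the sequence bounds so a malformed interval cannot index past the end.
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = "N"
    return "".join(bases)


def n_fraction(sequence):
    """Fraction of the sequence that is masked with N."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


# Toy example (hypothetical data): bases at indices 2-3 and 8-9 had no coverage.
reference = "ACGTACGTAC"
masked = mask_uncovered(reference, [(2, 4), (8, 10)])
print(masked)              # ACNNACGTNN
print(n_fraction(masked))  # 0.4
```

Checking the masked fraction per chromosome is one quick way to see how much of the reference the second round of mapping can no longer use.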

Alternatively, should I extract variants from the BAM files that I got by mapping reads to the original (unmasked) reference genome, and substitute them into the masked genome only where the masked genome does not have an N at that position? If so, how should I go about this? I have made a BED file of the uncovered loci.
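
The alternative described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a recommended pipeline: `apply_snps` is a hypothetical helper, the data are toy values, positions are assumed to be 1-based as in VCF, and only single-base substitutions are handled (indels would shift coordinates and need more care). The bcftools consensus command has a mask option that may cover this case directly; its documentation is the place to check.

```python
def apply_snps(masked_seq, snps):
    """Substitute SNP alleles into `masked_seq`, leaving masked (N) positions alone.

    `snps` holds (pos, ref, alt) tuples with 1-based positions (as in VCF),
    called against the original, unmasked reference.
    Returns the new sequence and the list of positions skipped because of masking.
    """
    bases = list(masked_seq)
    skipped = []
    for pos, ref, alt in snps:
        i = pos - 1  # VCF positions are 1-based; Python indices are 0-based
        current = bases[i].upper()
        if current == "N":
            skipped.append(pos)  # uncovered locus: keep the N
        elif current != ref.upper():
            # The variant does not match this sequence; refuse to guess.
            raise ValueError(f"REF mismatch at {pos}: found {current}, expected {ref}")
        else:
            bases[i] = alt
    return "".join(bases), skipped


# Toy example (hypothetical data): the SNP at position 3 falls in a masked region.
masked = "ACNNACGTNN"
snps = [(2, "C", "T"), (3, "G", "A"), (6, "C", "G")]
consensus, skipped = apply_snps(masked, snps)
print(consensus)  # ATNNAGGTNN
print(skipped)    # [3]
```

Raising on a REF mismatch is a deliberate choice in this sketch: it catches the case where the variants and the FASTA come from different reference versions.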

reference assembly • 180 views
h.mon (Brazil) wrote, 5 weeks ago:

"I have 96-97% of reads mapping to my reference."

It seems the reference assembly is already good and quite similar to your newly sequenced strain, so I wonder why you need to perform a reference-based assembly at all. Regardless of the strategy you take, the reference will be of higher quality than your assembly, and by performing a reference-based assembly you may introduce artifacts into your reference.

ropolocan (Canada) wrote, 15 days ago:

There are other approaches I would suggest instead of mapping reads to a reference genome. You can start with a de novo assembly and then scaffold it with the help of a reference genome. For example, Ragout uses reference genomes for scaffolding, and it can use multiple references, which can account for structural variation among the different genomes.

jean.elbers wrote, 14 days ago:

You might consider reference-guided de novo assembly (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1911-6). Before trying it, though, you should read the paper and look at the improvement for each assembler in de novo versus reference-guided de novo mode. There is a convenient script for each assembler on Bitbucket (https://bitbucket.org/HeidiLischer/refguideddenovoassembly_pipelines), but the scripts do not support starting and stopping at specific steps and do not accept gzipped FASTQs (so if you have limited hard drive space, you would need to modify them). The scripts also do not delete temporary files (again a problem if you have limited storage). Finally, if you only have short-insert libraries and no mate-pair libraries, then I don't think this approach will be a substantial improvement over de novo.
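
On the gzipped FASTQ point, one general way to save disk space is to decompress on the fly instead of writing uncompressed copies. The sketch below only illustrates that idea with Python's standard `gzip` module; it is not part of the pipeline scripts mentioned above, `open_fastq` and `read_fastq` are hypothetical helper names, and the reads are toy data. It assumes plain four-line FASTQ records.

```python
import gzip
import io


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle):
    """Yield (name, sequence, quality) from a text handle of four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends the stream
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@' from the name


# Toy example (hypothetical reads), compressed in memory so nothing touches the disk.
data = gzip.compress(b"@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
with gzip.open(io.BytesIO(data), "rt") as handle:
    for name, seq, qual in read_fastq(handle):
        print(name, len(seq))
```

With real files, `open_fastq("reads.fastq.gz")` (a hypothetical file name) would return a handle that `read_fastq` can consume in the same way.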
