Question: Making a custom reference genome
Asked 4.1 years ago by biogirl (European Union):

Hi all,

For certain reasons, I need to make a custom reference genome to align my Illumina sequence reads against and call SNPs. I made the custom reference by using SPAdes to create a de novo assembly from my sequence reads, and then indexed it for the programs I'm going to use for alignment and subsequent SNP calling (BWA, GATK and Samtools).

I only want to call SNPs, so I'm not interested in gene annotation. Basically, what I want to know is: am I missing anything crucial in this approach to making a custom reference sequence?

Thanks

Tags: snp, sequence, genome
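
For reference, indexing a reference FASTA for the three programs named in the question usually means three steps: the BWA index, a samtools FASTA index (.fai), and a sequence dictionary (.dict), the last two of which GATK requires. The sketch below only builds the command lines; it assumes bwa, samtools and the GATK4 "gatk" wrapper are on PATH, "contigs.fasta" is a hypothetical name for the SPAdes output, and run_all is a hypothetical helper that executes the commands in order.

```python
import shlex
import subprocess


def index_commands(reference: str) -> list[list[str]]:
    """Build the indexing commands for a reference FASTA such as SPAdes contigs."""
    return [
        # BWA index (.amb, .ann, .bwt, .pac, .sa files next to the FASTA)
        ["bwa", "index", reference],
        # FASTA index (.fai), used by samtools/bcftools and required by GATK
        ["samtools", "faidx", reference],
        # Sequence dictionary (.dict), required by GATK
        ["gatk", "CreateSequenceDictionary", "-R", reference],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; check=True raises on the first failure."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_all(...) to actually execute them.
    for cmd in index_commands("contigs.fasta"):
        print(shlex.join(cmd))
```

If the reference FASTA is ever edited (for example, contigs removed), all three indexes need to be rebuilt, since GATK checks that the .fai and .dict match the FASTA.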

When you map reads back to your assembly, what do the alignment stats look like?

(reply by geek_y, 4.1 years ago)

Well, according to samtools flagstat, the alignment stats look all right: 96% of reads mapped, 94% properly paired, and a low percentage (0.9%) of singletons.

(reply by biogirl, 4.1 years ago)
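
When comparing several samples, the three percentages quoted in that reply can be pulled out of samtools flagstat text output programmatically. This is a minimal sketch assuming the usual flagstat layout, where each line starts with "QC-passed + QC-failed" counts and the first percentage in parentheses refers to QC-passed reads; the example text in the main block is made up for illustration and is not the poster's data.

```python
import re

# Matches lines such as "1900 + 0 mapped (95.00% : N/A)" and captures the
# category name and the QC-passed percentage (the first number after "(").
FLAGSTAT_LINE = re.compile(
    r"^\d+ \+ \d+ (mapped|properly paired|singletons) \(([\d.]+)%"
)


def flagstat_percentages(text: str) -> dict[str, float]:
    """Return the mapped / properly paired / singleton percentages from flagstat output."""
    result: dict[str, float] = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


if __name__ == "__main__":
    # Made-up excerpt for illustration only.
    example = """2000 + 0 in total (QC-passed reads + QC-failed reads)
1900 + 0 mapped (95.00% : N/A)
2000 + 0 paired in sequencing
1800 + 0 properly paired (90.00% : N/A)
20 + 0 singletons (1.00% : N/A)"""
    print(flagstat_percentages(example))
    # {'mapped': 95.0, 'properly paired': 90.0, 'singletons': 1.0}
```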

What about multi-mapped reads?

(reply by geek_y, 4.1 years ago)

Not really sure; how do I check?

(reply by biogirl, 4.1 years ago)
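
One way to check for multi-mapped reads in BWA output: BWA-MEM typically assigns a mapping quality (MAPQ) of 0 to reads that align equally well to more than one place, and may list the alternative hits in an XA:Z tag. The sketch below counts, among primary mapped reads, how many have MAPQ 0 or carry an XA tag. It reads plain SAM lines (for example the output of "samtools view sample.bam", where sample.bam is a hypothetical file name), so it needs no extra libraries; treating MAPQ 0 and XA as the multi-mapping signal is an assumption about BWA-MEM's behaviour, not a universal SAM rule.

```python
from typing import Iterable


def multimapping_summary(sam_lines: Iterable[str]) -> dict[str, int]:
    """Count primary mapped reads, and those that look multi-mapped (BWA-MEM style)."""
    primary_mapped = mapq0 = with_xa = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        flag, mapq = int(fields[1]), int(fields[4])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904:
            continue
        primary_mapped += 1
        if mapq == 0:
            mapq0 += 1
        # Optional tags start at column 12; XA:Z lists alternative hits.
        if any(tag.startswith("XA:Z:") for tag in fields[11:]):
            with_xa += 1
    return {"primary_mapped": primary_mapped, "mapq0": mapq0, "with_xa": with_xa}
```

A quicker check without Python is to compare "samtools view -c -F 0x904 sample.bam" (primary mapped reads) with "samtools view -c -F 0x904 -q 1 sample.bam" (the same reads with MAPQ of at least 1); the difference is the number of primary mapped reads with MAPQ 0.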
Answer posted 4.1 years ago by h.mon (Brazil):

I think there is not enough detail here to give a good answer. You did not explain how you assembled your genome, or whether and how you evaluated the assembly for completeness, correctness, or contamination. What kind of organism is it, and how many samples do you have? Are you assembling the genome from only one strain, or from several? Assembly quality depends on many factors, and if you are not aware of them you may end up calling SNPs on assembly artifacts.

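
On the point about evaluating the assembly: a first, very rough look is the number of contigs, the total length and the N50 (add contig lengths from longest to shortest; N50 is the length of the contig at which the running total first reaches half of the assembly). These numbers describe contiguity only, not completeness or correctness, for which dedicated tools such as QUAST or BUSCO are normally used. A minimal pure-Python sketch, assuming a FASTA file such as the SPAdes contigs:

```python
from typing import Iterable


def contig_lengths(fasta_lines: Iterable[str]) -> list[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: list[int] = []
    current = None  # length of the sequence being read; None before the first header
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Length of the contig at which the cumulative length (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_summary(lengths: list[int]) -> dict[str, int]:
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Typical use would be opening the assembly FASTA and passing the file handle to contig_lengths, then printing assembly_summary of the result.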

I think 'biogirl' isn't bothered about assembly stats or anything like that, and just wants to make representative sequences for a bunch of sequences, then align the reads back and call SNPs.

(reply by geek_y, 4.1 years ago)
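
The workflow described in that reply (build a representative sequence, align the reads back, call SNPs) might look like the following with BWA, samtools and bcftools. This sketch only assembles the shell commands; the file names, sample name and thread count are hypothetical, it assumes the reference is already indexed, and bcftools is used here as one possible caller (GATK HaplotypeCaller would be another). The read group added with bwa's -R option is not strictly needed by bcftools, but GATK requires one.

```python
import shlex


def align_and_call_commands(ref: str, read1: str, read2: str,
                            sample: str, threads: int = 4) -> list[str]:
    """Return shell command strings: map -> sort -> index -> call variants."""
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    # Read group with literal "\t" separators; bwa converts them to tabs itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # Map paired-end reads and sort the output straight into a BAM file.
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(read1)} {q(read2)}"
        f" | samtools sort -o {q(bam)} -",
        # Index the sorted BAM for random access.
        f"samtools index {q(bam)}",
        # Pile up reads on the reference; call with the multiallelic caller (-m),
        # report variant sites only (-v), write compressed VCF (-Oz).
        f"bcftools mpileup -f {q(ref)} {q(bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in align_and_call_commands("contigs.fasta", "R1.fastq.gz",
                                       "R2.fastq.gz", "strain1"):
        print(cmd)
```

Because the first and last commands contain pipes, they would need to be run through a shell (for example subprocess.run(cmd, shell=True)). If the organism is haploid, as a single bacterial strain would be, bcftools call's --ploidy 1 option is worth considering.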

Geek_y is correct: I'm just interested in a representative sequence. But to give more detail, in case it helps: I assembled my genome with SPAdes, using paired-end reads from one strain. I appreciate that paired-end reads alone are not the best data for de novo assembly, but they're all I have at the moment.

(reply by biogirl, 4.1 years ago)
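
Since the assembly comes from SPAdes, one common clean-up before using it as a mapping reference is to drop very short or very low-coverage contigs, which are more likely to be assembly artifacts or contamination. The sketch below relies on the SPAdes header convention ">NODE_<n>_length_<bp>_cov_<coverage>"; the cut-offs (500 bp and coverage 2.0) are hypothetical placeholders to tune for each dataset, and contigs whose headers do not follow that pattern are dropped.

```python
import re
from typing import Iterable, Iterator

# SPAdes contig headers look like ">NODE_12_length_5321_cov_18.441200".
SPADES_HEADER = re.compile(r"^>NODE_\d+_length_(\d+)_cov_([\d.]+)")


def filter_spades_contigs(fasta_lines: Iterable[str], min_length: int = 500,
                          min_cov: float = 2.0) -> Iterator[str]:
    """Yield the FASTA lines of contigs that pass the (hypothetical) cut-offs."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            match = SPADES_HEADER.match(line)
            keep = (match is not None
                    and int(match.group(1)) >= min_length
                    and float(match.group(2)) >= min_cov)
        if keep:
            yield line  # header and sequence lines of a kept contig
```

The filtered lines can be written straight to a new FASTA file with writelines; the new file then has to be indexed again before mapping.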