Hi everyone, I am trying to build a De Bruijn graph from short reads. A small fraction of my reads (only 0.01%) have length < 10. I am worried that these reads, even though they are a very small percentage, will cause problems for graph building. Will they?
The stats I am getting from graph building are:
bank
    bank_uri            : SRR2847385_interleaved.fasta,SRR2847386_interleaved.fasta
    bank_size           : 117637467610
    bank_total_nt       : 89977612926
sequences
    seq_number          : 455322542
    seq_size_min        : 1
    seq_size_max        : 250
    seq_size_mean       : 197.6
    seq_size_deviation  : 55.4
kmers
    kmers_nb_valid      : 80415655424
    kmers_nb_invalid    : 3772037
stats
histogram
    cutoff              : 23
    nb_ge_cutoff        : 332423627
    first_peak          : 91
kmers
    solidity_kind       : sum
    thresholds          : 3 3
    kmers_nb_distinct   : 931511370
    kmers_nb_solid      : 452752530
    kmers_nb_weak       : 478758840
    kmers_percent_weak  : 51.4
As you can see, the vast majority of the k-mers are valid (kmers_nb_invalid is tiny compared with kmers_nb_valid). Do you think the graph builder simply ignores reads below a certain length?
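To make the question concrete, here is a minimal Python sketch of how I understand k-mer extraction in general. It is not the graph builder's actual code. The function names are hypothetical and k = 5 is only an illustrative value. Under this assumption, a read of length L yields L - k + 1 k-mers, so any read shorter than k yields none and would drop out on its own.

```python
def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    # range() is empty when len(read) < k, so short reads yield nothing.
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Return (reads_used, reads_skipped, total_kmers) for a list of reads."""
    used = skipped = total = 0
    for read in reads:
        n = max(0, len(read) - k + 1)  # number of k-mers this read contributes
        if n == 0:
            skipped += 1  # read is shorter than k
        else:
            used += 1
        total += n
    return used, skipped, total


# Toy data, not taken from my actual reads.
reads = ["ACGTACGTAC", "ACG", "A", "ACGTA"]
print(count_kmers(reads, k=5))  # (2, 2, 7)
```

If the real tool works the same way, my very short reads would contribute no k-mers at all, but I would like confirmation of that.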
Thanks in advance. Faraz.