Question: Generating consensus sequences without reference genome
gravatar for thenameisizaak
4.2 years ago by
thenameisizaak0 wrote:

Hello All,

I am currently writing some cancer genomics software for crunching on NGS data. I have extracted groups of reads where each read from each group shares a 30bp exact string (sequence) match, with every other read within that group. I also know the position of this match for each read. As such, with these two pieces of information, I already posses the information to align all the reads. I now want to build a consensus sequence, aligned at this position. Does anyone know of software (that I can call from a C++ program, either from inside the program, or externally via command line call) that will allow me to do this? I get the feeling that SamTools is commonly used, namely the program mpileup. However, from their example

samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcf2fq > cns.fq

It looks like the tool requires me to provide a reference genome to align the reads ref.fa. However, the aln.bam file should already contain the alignment information. So, I'm a little confused as to why ref.fa is needed. Does anyone know if this software can be used without a reference? I.e just provide the reads, align against themselves and output the consensus? It looks like I may have to generate a .bam from the reads to do this if I wish to use mpileup, but seeming as I already know the alignment, I may have to generate this artificially.

If anyone could shed some light about the best software to use, when trying to build consensus seq. with data where alignment information is readily acquirable that would be of great help!

Best, Izaak

samtools alignment conensus • 2.2k views
ADD COMMENTlink modified 4.2 years ago by guillaume.rbt800 • written 4.2 years ago by thenameisizaak0

Samtools by design is a reference-guided consensus tool. What you are looking for is a de novo assembler. Try spades, velvet, abyss, sga or fermi-lite. As a side note, you are really talking about two distinct types of alignments: read-to-reference alignment and read-to-read alignments. Make sure you understand their differences.

ADD REPLYlink written 4.2 years ago by lh332k
gravatar for guillaume.rbt
4.2 years ago by
guillaume.rbt800 wrote:

HI Izaak,

I had the same problematic.

As you did I first tried to get the consensus from the alignments. Unfortunately I haven't managed to find a tool that does this kind of thing. The problem is that sometimes it's difficult to tell which base is the alternate consensus (for example for heterozygous position).

The solution I found is to compute local assembly on subsample of reads in the regions I was interested in. For this I used minia assembler, which is easy to use. (~/soft/minia-2.0.3-Linux/bin/minia -in ../sample_86012.fq -kmer-size 31 -abundance-min 3 -out minia_assembly_k31_m3)

Hope this help

ADD COMMENTlink written 4.2 years ago by guillaume.rbt800
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 983 users visited in the last hour