Question: How to generate contigs from bam alignment
1
gravatar for goubert.clement
5.2 years ago by
France
goubert.clement10 wrote:

Hi all,

I am currently trying to perform a new assembly using paired and mate paired reads. I first mapped my paired-end on a reference genome. Now I would like to extract from the mapping only the covered portions and get contigs from my reads (not a consensus with the reference genome including N between the uncovered portions). I would like then to use the paired-end reads and the mate paired to scaffold my assembly.

I there a proper way to do that ? I think maybe mpileup could help, but until now I only managed to produced consensus.

Extra question, which software would you recommend to perform the scaffolding ? Does any one exist allowing different insert size ?

Thank you very much !!!

Clément

ADD COMMENTlink modified 5.2 years ago by Manu Prestat4.0k • written 5.2 years ago by goubert.clement10
2
gravatar for thackl
5.2 years ago by
thackl2.8k
MIT
thackl2.8k wrote:

Here is an approach that might do what you want, I've tested it with some random data - In the first panel are mappings of some simulated reads with some SNPs, below are the mappings of the two generated contigs.

To generate the consensus contigs I used two modules from proovread, a correction software I have written. You will need to clone/install it from github and you will need a recent version of samtools for it to run.

# generate consensus (ignoring reference)
bam2cns --bam reads-to-ref.bam --ref ref.fa --prefix cns.fq

# trim no-coverage regions and convert to fasta
SeqFilter --in cns.fq --out cns.trimmed.fa \
 --fasta --min-length 100 \
 --phred-offset 33 --trim-win 1,1
ADD COMMENTlink modified 5.2 years ago • written 5.2 years ago by thackl2.8k

Thanks a lot !! I will try your method as soon as possible and let you know how it wored ! It seems to do excactly what I want ! 

ADD REPLYlink modified 5.2 years ago • written 5.2 years ago by goubert.clement10

I'm almost done, however, I had this error at the second step :

Can't locate Verbose.pm in @INC (you may need to install the Verbose module) [...]
BEGIN failed--compilation aborted at ~/proovread/bin/SeqFilter line 13.

Do I need the Verbose module, if yes, where could I find it ? I am not familiar with the those tools.

Many thanks !

ADD REPLYlink written 5.2 years ago by goubert.clement10
1

Verbose.pm comes with proovread, use

export PERL5LIB=/path/to/proovread/lib:$PERL5LIB

to make it visible to SeqFilter

ADD REPLYlink written 5.2 years ago by thackl2.8k

Thanks! It is running!

ADD REPLYlink written 5.2 years ago by goubert.clement10

How would this approach deal with indels?

ADD REPLYlink written 12 months ago by kfletcher10

They should look like in the reads, not like in the reference.

ADD REPLYlink written 12 months ago by thackl2.8k
1
gravatar for thackl
5.2 years ago by
thackl2.8k
MIT
thackl2.8k wrote:

If your goal is to reassembly mapped reads, you can use something like

samtools view -bf <mapped+other criteria=""> your.bam | 
 samtools bam2fq /dev/fd/0 > mapped_reads.fq

and than simply use an assembler to reassemble the fastq reads or provide the reads to a scaffolder

ADD COMMENTlink modified 5.2 years ago • written 5.2 years ago by thackl2.8k

This command only return the reads in alignment right in fq ? Because my goal is actually to use the mapping info to avoid a full de novo assembly. I would like to know if I simply can have the linear consensus from my reads regarding their position in the alignment, and if there is gaps, have different contig instead of NNNN. Thanks !

ADD REPLYlink written 5.2 years ago by goubert.clement10

From your description above, I thought you did not want to produce a consensus., For consensus have a look at  Ngs Question ~ Consensus for example. There also other threads regarding the matter. Just search for bam+consensus.

ADD REPLYlink written 5.2 years ago by thackl2.8k

Well, thanks for your reply. I had a look around, however, I think I still don't find what I really want to do... :S. 

I do not want the consensus to be regarding my reads + the ref genome, but to be only a "fasta projection" of my reads, ordered thanks to the reference genome. I have for examples some areas of the ref chromosome that are not covered, so I expect to have multiple contigs of the chromosome, instead of one chromosome but with NNN filling the gaps... I may be not clear,... I apology !! Thanks again !

ADD REPLYlink written 5.2 years ago by goubert.clement10

so what you have is kind of this:

pile1          pile2
-- --          -- --
 -- --       -- -- --       reads
-- -- --    -- --   --
___________________________ ref

and what you want is:

pile1
-- --
 -- --
-- -- --
________ < pile1 consensus

  pile2
  -- --
 -- -- --       reads
-- --   --
_______________ < pile2 consensus

??

ADD REPLYlink written 5.2 years ago by thackl2.8k

Exactly !!!

ADD REPLYlink written 5.2 years ago by goubert.clement10

Hi thackl, is possible with SeqFilter do this?

pile1 pile2 -- -- -- -- -- -- -- -- -- reads -- -- -- -- -- -- ___________________________ ref

pile1 pile2 -- -- -- -- -- -- -- -- -- reads -- -- -- -- -- -- _____N______NNNNNNNNNNN ref

Thanks!!

EDIT: I got it! thanks

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by jolobito0
0
gravatar for goubert.clement
5.2 years ago by
France
goubert.clement10 wrote:

After a first try, it seems to do exactly what I need ! Thank you very much for your help and support thackl !

ADD COMMENTlink written 5.2 years ago by goubert.clement10

You are welcome!

ADD REPLYlink written 5.2 years ago by thackl2.8k

I would also like to try this with proovread but installation fails at the "make install" step and the instructions are confusing a bit. Am I supposed to type exactly: "make install PREFIX=/Users/GBLock/src/proovread" ?

ADD REPLYlink written 12 months ago by DNAngel70

You can use PREFIX=/path/to/where/you/want/proovread/to/be/installed make install if you want to move proovread to a certain location. You don't need to do that, though. It will also just run without make install from the directory you downloaded it to.

ADD REPLYlink written 12 months ago by thackl2.8k
0
gravatar for Manu Prestat
5.2 years ago by
Manu Prestat4.0k
Lyon, France
Manu Prestat4.0k wrote:

What about a mapper / assembler which uses a reference sequence ? e.g. MAQ...

or one of the many "reference assisted" assemblers? 

ADD COMMENTlink written 5.2 years ago by Manu Prestat4.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 992 users visited in the last hour