How to generate contigs from bam alignment
4
1
Entering edit mode
5.8 years ago

Hi all,

I am currently trying to perform a new assembly using paired and mate paired reads. I first mapped my paired-end on a reference genome. Now I would like to extract from the mapping only the covered portions and get contigs from my reads (not a consensus with the reference genome including N between the uncovered portions). I would like then to use the paired-end reads and the mate paired to scaffold my assembly.

I there a proper way to do that ? I think maybe mpileup could help, but until now I only managed to produced consensus.

Extra question, which software would you recommend to perform the scaffolding ? Does any one exist allowing different insert size ?

Thank you very much !!!

Clément

Assembly .bam contigs genome guided • 5.7k views
ADD COMMENT
2
Entering edit mode
5.8 years ago
thackl ★ 2.8k

Here is an approach that might do what you want, I've tested it with some random data - In the first panel are mappings of some simulated reads with some SNPs, below are the mappings of the two generated contigs.

To generate the consensus contigs I used two modules from proovread, a correction software I have written. You will need to clone/install it from github and you will need a recent version of samtools for it to run.

# generate consensus (ignoring reference)
bam2cns --bam reads-to-ref.bam --ref ref.fa --prefix cns.fq

# trim no-coverage regions and convert to fasta
SeqFilter --in cns.fq --out cns.trimmed.fa \
 --fasta --min-length 100 \
 --phred-offset 33 --trim-win 1,1
ADD COMMENT
0
Entering edit mode

Thanks a lot !! I will try your method as soon as possible and let you know how it wored ! It seems to do excactly what I want ! 

ADD REPLY
0
Entering edit mode

I'm almost done, however, I had this error at the second step :

Can't locate Verbose.pm in @INC (you may need to install the Verbose module) [...]
BEGIN failed--compilation aborted at ~/proovread/bin/SeqFilter line 13.

Do I need the Verbose module, if yes, where could I find it ? I am not familiar with the those tools.

Many thanks !

ADD REPLY
1
Entering edit mode

Verbose.pm comes with proovread, use

export PERL5LIB=/path/to/proovread/lib:$PERL5LIB

to make it visible to SeqFilter

ADD REPLY
0
Entering edit mode

Thanks! It is running!

ADD REPLY
0
Entering edit mode

How would this approach deal with indels?

ADD REPLY
0
Entering edit mode

They should look like in the reads, not like in the reference.

ADD REPLY
1
Entering edit mode
5.8 years ago
thackl ★ 2.8k

If your goal is to reassembly mapped reads, you can use something like

samtools view -bf <mapped+other criteria=""> your.bam | 
 samtools bam2fq /dev/fd/0 > mapped_reads.fq

and than simply use an assembler to reassemble the fastq reads or provide the reads to a scaffolder

ADD COMMENT
0
Entering edit mode

This command only return the reads in alignment right in fq ? Because my goal is actually to use the mapping info to avoid a full de novo assembly. I would like to know if I simply can have the linear consensus from my reads regarding their position in the alignment, and if there is gaps, have different contig instead of NNNN. Thanks !

ADD REPLY
0
Entering edit mode

From your description above, I thought you did not want to produce a consensus., For consensus have a look at  Ngs Question ~ Consensus for example. There also other threads regarding the matter. Just search for bam+consensus.

ADD REPLY
0
Entering edit mode

Well, thanks for your reply. I had a look around, however, I think I still don't find what I really want to do... :S. 

I do not want the consensus to be regarding my reads + the ref genome, but to be only a "fasta projection" of my reads, ordered thanks to the reference genome. I have for examples some areas of the ref chromosome that are not covered, so I expect to have multiple contigs of the chromosome, instead of one chromosome but with NNN filling the gaps... I may be not clear,... I apology !! Thanks again !

ADD REPLY
0
Entering edit mode

so what you have is kind of this:

pile1          pile2
-- --          -- --
 -- --       -- -- --       reads
-- -- --    -- --   --
___________________________ ref

and what you want is:

pile1
-- --
 -- --
-- -- --
________ < pile1 consensus

  pile2
  -- --
 -- -- --       reads
-- --   --
_______________ < pile2 consensus

??

ADD REPLY
0
Entering edit mode

Exactly !!!

ADD REPLY
0
Entering edit mode

Hi thackl, is possible with SeqFilter do this?

pile1 pile2 -- -- -- -- -- -- -- -- -- reads -- -- -- -- -- -- ___________________________ ref

pile1 pile2 -- -- -- -- -- -- -- -- -- reads -- -- -- -- -- -- _____N______NNNNNNNNNNN ref

Thanks!!

EDIT: I got it! thanks

ADD REPLY
0
Entering edit mode
5.8 years ago

After a first try, it seems to do exactly what I need ! Thank you very much for your help and support thackl !

ADD COMMENT
0
Entering edit mode

You are welcome!

ADD REPLY
0
Entering edit mode

I would also like to try this with proovread but installation fails at the "make install" step and the instructions are confusing a bit. Am I supposed to type exactly: "make install PREFIX=/Users/GBLock/src/proovread" ?

ADD REPLY
0
Entering edit mode

You can use PREFIX=/path/to/where/you/want/proovread/to/be/installed make install if you want to move proovread to a certain location. You don't need to do that, though. It will also just run without make install from the directory you downloaded it to.

ADD REPLY
0
Entering edit mode
5.8 years ago

What about a mapper / assembler which uses a reference sequence ? e.g. MAQ...

or one of the many "reference assisted" assemblers? 

ADD COMMENT

Login before adding your answer.

Traffic: 2481 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6