How to generate contigs from bam alignment
4
1
Entering edit mode
7.4 years ago

Hi all,

I am currently trying to perform a new assembly using paired and mate paired reads. I first mapped my paired-end on a reference genome. Now I would like to extract from the mapping only the covered portions and get contigs from my reads (not a consensus with the reference genome including N between the uncovered portions). I would like then to use the paired-end reads and the mate paired to scaffold my assembly.

I there a proper way to do that ? I think maybe mpileup could help, but until now I only managed to produced consensus.

Extra question, which software would you recommend to perform the scaffolding ? Does any one exist allowing different insert size ?

Thank you very much !!!

Clément

Assembly .bam contigs genome guided • 7.2k views
2
Entering edit mode
7.4 years ago
thackl ★ 3.0k

Here is an approach that might do what you want, I've tested it with some random data - In the first panel are mappings of some simulated reads with some SNPs, below are the mappings of the two generated contigs.

To generate the consensus contigs I used two modules from proovread, a correction software I have written. You will need to clone/install it from github and you will need a recent version of samtools for it to run.

# generate consensus (ignoring reference)
bam2cns --bam reads-to-ref.bam --ref ref.fa --prefix cns.fq

# trim no-coverage regions and convert to fasta
SeqFilter --in cns.fq --out cns.trimmed.fa \
--fasta --min-length 100 \
--phred-offset 33 --trim-win 1,1

0
Entering edit mode

Thanks a lot! I will try your method as soon as possible and let you know how it worked! It seems to do exactly what I want!

0
Entering edit mode

I'm almost done, however, I had this error at the second step:

Can't locate Verbose.pm in @INC (you may need to install the Verbose module) [...]
BEGIN failed--compilation aborted at ~/proovread/bin/SeqFilter line 13.


Do I need the Verbose module, if yes, where could I find it ? I am not familiar with the those tools.

Many thanks!

1
Entering edit mode

export PERL5LIB=/path/to/proovread/lib:\$PERL5LIB

to make it visible to SeqFilter

0
Entering edit mode

Thanks! It is running!

0
Entering edit mode

How would this approach deal with indels?

0
Entering edit mode

They should look like in the reads, not like in the reference.

1
Entering edit mode
7.4 years ago
thackl ★ 3.0k

If your goal is to reassembly mapped reads, you can use something like

samtools view -bf <mapped+other criteria> your.bam |


and than simply use an assembler to reassemble the fastq reads or provide the reads to a scaffolder

0
Entering edit mode

This command only return the reads in alignment right in fq? Because my goal is actually to use the mapping info to avoid a full de novo assembly. I would like to know if I simply can have the linear consensus from my reads regarding their position in the alignment, and if there is gaps, have different contig instead of NNNN. Thanks!

0
Entering edit mode

From your description above, I thought you did not want to produce a consensus., For consensus have a look at Ngs Question ~ Consensus for example. There also other threads regarding the matter. Just search for bam+consensus.

0
Entering edit mode

Well, thanks for your reply. I had a look around, however, I think I still don't find what I really want to do... :S.

I do not want the consensus to be regarding my reads + the ref genome, but to be only a "fasta projection" of my reads, ordered thanks to the reference genome. I have for examples some areas of the ref chromosome that are not covered, so I expect to have multiple contigs of the chromosome, instead of one chromosome but with NNN filling the gaps... I may be not clear,... I apology !! Thanks again!

0
Entering edit mode

So what you have is kind of this:

pile1          pile2
-- --          -- --
-- --       -- -- --       reads
-- -- --    -- --   --
___________________________ ref


and what you want is:

pile1
-- --
-- --
-- -- --
________ < pile1 consensus

pile2
-- --
-- --   --
_______________ < pile2 consensus


?

0
Entering edit mode

Exactly !!!

0
Entering edit mode

Hi thackl, is possible with SeqFilter do this?

pile1          pile2
-- --          -- --
-- --       -- -- --       reads
-- -- --    -- --   --
___________________________ ref

pile1          pile2
-- --          -- --
-- --       -- -- --       reads
-- -- --    -- --   --
_____N______NNNNNNNNNNN ref


Thanks!

EDIT: I got it! thanks

0
Entering edit mode
7.4 years ago

After a first try, it seems to do exactly what I need ! Thank you very much for your help and support thackl !

0
Entering edit mode

You are welcome!

0
Entering edit mode

I would also like to try this with proovread but installation fails at the "make install" step and the instructions are confusing a bit. Am I supposed to type exactly: "make install PREFIX=/Users/GBLock/src/proovread" ?

0
Entering edit mode

You can use PREFIX=/path/to/where/you/want/proovread/to/be/installed make install if you want to move proovread to a certain location. You don't need to do that, though. It will also just run without make install from the directory you downloaded it to.

0
Entering edit mode
7.4 years ago

What about a mapper / assembler which uses a reference sequence? e.g. MAQ...

or one of the many "reference assisted" assemblers?