Question: Aligning BAC to an assembled sequence
3.4 years ago, User000 (260) wrote:

Hello,

I have several BAC sequences, each of which contains 2-7 contigs. I need to align these BACs to a pseudomolecule and to scaffolds. Basically, I need to identify the region where each BAC maps on the pseudomolecule. In the end I need statistics such as alignment identity, alignment length, coverage, and start and end positions. I can get all of this information from BLAST (megablast). However, since megablast is a local aligner, it returns many HSPs. Do you have an idea how I can calculate these statistics and find the region on a pseudomolecule where each of my BACs aligns? If so, how? I could use bwa as an alternative, but I don't know how to extract this information from a .bam file.

blast alignment • 1.6k views
modified 3.4 years ago by Darked89 (4.2k) • written 3.4 years ago by User000 (260)

Did you try BLAT instead of BLAST?

modified 3.4 years ago • written 3.4 years ago by Pierre Lindenbaum (116k)

I don't know if it solves my problem. The output is more or less similar to what I get with megablast. Do you think taking the best hit solves the problem?

written 3.4 years ago by User000 (260)

At UCSC, the BAC end sequences are "placed on the assembled sequence using BLAT".

written 3.4 years ago by Pierre Lindenbaum (116k)

Yes, but they use only the ends, not the full sequence, and they have a scoring system to filter hits.

written 3.4 years ago by Jean-Karim Heriche (18k)

3.4 years ago, Darked89 (4.2k), Barcelona, Spain, wrote:

You may try to use LAST: http://last.cbrc.jp/

First create a database from your assembly, then query it with your BAC FASTA file. You will get MAF output that you can parse.
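
A minimal sketch of parsing such MAF output in Python, to get the per-alignment statistics asked for in the question (identity, alignment length, start, end). It assumes pairwise MAF blocks in which the first "s" line is the assembly sequence (on the + strand) and the second is the BAC contig; the names `Hit` and `parse_maf` are made up for this illustration and are not part of LAST.

```python
from itertools import chain
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    score: float
    target: str
    t_start: int     # 0-based start on the assembly sequence
    t_end: int       # exclusive end
    query: str
    q_start: int     # 0-based, converted to forward-strand BAC coordinates
    q_end: int       # exclusive end
    strand: str      # strand of the query row
    aln_len: int     # alignment columns, gaps included
    identity: float  # identical columns / aln_len


def _make_hit(score: float, t: list, q: list) -> Hit:
    # MAF "s" line fields: s name start size strand srcSize text
    t_start, t_size, t_text = int(t[2]), int(t[3]), t[6]
    q_start, q_size, q_src, q_text = int(q[2]), int(q[3]), int(q[5]), q[6]
    if q[4] == "-":
        # Minus-strand starts count from the end of the reverse complement.
        q_start = q_src - (q_start + q_size)
    matches = sum(a != "-" and a.upper() == b.upper()
                  for a, b in zip(t_text, q_text))
    return Hit(score, t[1], t_start, t_start + t_size, q[1],
               q_start, q_start + q_size, q[4],
               len(t_text), matches / len(t_text))


def parse_maf(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per alignment block; comment lines are ignored."""
    score, rows = 0.0, []
    for line in chain(lines, [""]):  # extra blank line flushes the last block
        fields = line.split()
        if not fields or fields[0] == "a":
            if len(rows) >= 2:
                yield _make_hit(score, rows[0], rows[1])
            rows = []
            if fields:
                score = next((float(f[6:]) for f in fields[1:]
                              if f.startswith("score=")), 0.0)
        elif fields[0] == "s":
            rows.append(fields)
```

Passing an open MAF file to `parse_maf` gives one record per alignment, which can then be filtered or sorted by score. Identity here is computed over all alignment columns, gaps included; other definitions are possible.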

Caveat: repetitive contigs may be dropped from the output. But that may be the correct behavior. Imagine that your BAC insert is in reality 100 kb long and maps to a scaffold/chromosome of the assembly starting from position 1, and that the last 10 kb (90-100 kb in both BAC and genomic coordinates) is full of repeats. On top of that, the next 50 kb of your assembly scaffold is also repetitive and full of stretches of Ns denoting gaps. You do not want to conclude that your BAC is 150 kb long just because something from the BAC maps to these parts of the scaffold.

written 3.4 years ago by Darked89 (4.2k)

Thank you very much. I have tried it; like BLAT, it gives me a lot of hits. How can I decide which region to choose? The best hit?

modified 3.4 years ago • written 3.4 years ago by User000 (260)
There are two LAST utilities: maf-swap and maf-cull. What you can do is swap the top sequence (scaffolds from the database) with the contig sequences from your BAC(s), then cull all the hits except the top one. If you are into assembly checking or scaffold building, also look at LAST as a short-read mapper (mapping reads to the assembly), and at converting MAF output to SAM/BAM, viewable in e.g. IGV. I will provide the link for this a bit later.
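
For the "which region to choose" question, a rough Python sketch of one way to summarize many local hits for a single BAC contig: keep the assembly sequence with the highest total score, then report the span on it and the fraction of the contig covered. This is not what maf-swap or maf-cull do internally; the function names, the tuple layout, and the `min_score` cutoff are assumptions made for this illustration.

```python
from collections import defaultdict


def union_length(intervals):
    """Total length covered by half-open (start, end) intervals, overlaps counted once."""
    total, cur_end = 0, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            total += end - start
            cur_end = end
        elif end > cur_end:
            total += end - cur_end
            cur_end = end
    return total


def summarize(hits, query_len, min_score=0):
    """Summarize hits of ONE query sequence.

    hits: iterable of (target, t_start, t_end, q_start, q_end, score),
    with 0-based half-open coordinates.
    """
    by_target = defaultdict(list)
    for h in hits:
        if h[5] >= min_score:          # drop weak hits (hypothetical cutoff)
            by_target[h[0]].append(h)
    if not by_target:
        return None
    # Pick the assembly sequence whose hits have the highest summed score.
    target, kept = max(by_target.items(),
                       key=lambda kv: sum(h[5] for h in kv[1]))
    return {
        "target": target,
        "start": min(h[1] for h in kept),
        "end": max(h[2] for h in kept),
        "query_coverage": union_length((h[3], h[4]) for h in kept) / query_len,
        "n_hits": len(kept),
    }
```

Note that taking the minimum start and maximum end can overextend the region when low-scoring repeat hits land far from the main alignment, which is the situation described in the caveat above; raising `min_score` or checking the span against the BAC length is one way to guard against that.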
written 3.4 years ago by Darked89 (4.2k)

Here we go (Shameless self-plug mode on):

http://openwetware.org/wiki/Wikiomics:WinterSchool_day2#last

One can do this MAF to BAM conversion not just with mapped short reads but also with contigs/BAC sequences or other genomes, and verify some tricky spots by eye if necessary.

written 3.4 years ago by Darked89 (4.2k)