Question

Selective assembly of Genome

8.6 years ago

kshitijtayal ▴ 40

I am interested in assembling part of a genome, not the whole genome. I have the read file for the whole genome. Is there any way to extract the subset of reads that come from the region I am interested in, so that I can run an assembly on just that subset?

I am new to this field. Can you point me to literature where people have done this?

Tags: Assembly, genome, alignment, sequence

Ram · Answer 1 · 2015-09-18

8.6 years ago

Phil S. ▴ 700

If you have a reference genome, you can map your reads to it (the reference can also be from a close relative of your organism). Then you can filter the reads by mapping position to keep those that cover your region of interest.
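A minimal Python sketch of the position-filtering step, assuming the alignments are available as plain SAM text lines (columns QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...). The function names `ref_span` and `reads_in_region` are hypothetical; a read is kept if the reference interval implied by its POS and CIGAR overlaps the region.

```python
import re

# CIGAR operations that consume reference bases (per the SAM spec: M, D, N, =, X).
REF_CONSUMING = set("MDN=X")


def ref_span(pos: int, cigar: str) -> tuple[int, int]:
    """Return the 1-based, inclusive reference interval covered by an alignment."""
    ref_len = sum(
        int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in REF_CONSUMING
    )
    return pos, pos + ref_len - 1


def reads_in_region(sam_lines, chrom: str, start: int, end: int) -> set[str]:
    """Names of mapped reads whose alignment overlaps chrom:start-end (1-based, inclusive)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue  # unmapped, on another sequence, or no alignment
        aln_start, aln_end = ref_span(pos, cigar)
        if aln_start <= end and aln_end >= start:  # the two intervals overlap
            names.add(qname)
    return names
```

On real data the same selection is usually done with samtools on a coordinate-sorted, indexed BAM, e.g. `samtools view -b sorted.bam chr1:140-520 > region.bam` followed by `samtools fastq region.bam > region.fq`. Mates that map outside the region are not pulled in this way, which is worth checking before assembly.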

HTH

I also thought of that, but then we would be unable to capture the diversity in the unknown region of the genome. If we filter the reads by their mapping position within a region of the reference genome and then assemble that subset, we will just get back the reference region itself. How can we avoid this?

Written 8.6 years ago by kshitijtayal

You could lower the mapping stringency and also include multi-mapped reads that cover this region.
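One way to make "lower stringency" concrete, sketched under assumptions: each alignment is reduced to a hypothetical `Alignment` record holding the SAM FLAG, the NM edit distance and the aligned length, and a read is kept if any of its alignments (primary or secondary) has divergence NM / aligned length at or below a cutoff. Secondary alignments only appear if the aligner was asked to report them (for example `bwa mem -a`).

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical pre-parsed alignment (values taken from SAM columns and tags)."""
    name: str
    flag: int         # SAM FLAG: 0x4 = unmapped, 0x100 = secondary alignment
    nm: int           # edit distance to the reference (SAM NM:i tag)
    aligned_len: int  # number of read bases in the alignment


def relaxed_filter(alignments, max_divergence: float, keep_secondary: bool = True) -> set[str]:
    """Keep reads with at least one alignment at or below the divergence cutoff."""
    kept = set()
    for aln in alignments:
        if aln.flag & 0x4 or aln.aligned_len == 0:
            continue  # unmapped
        if aln.flag & 0x100 and not keep_secondary:
            continue  # secondary (multi-mapping) hit, excluded on request
        if aln.nm / aln.aligned_len <= max_divergence:
            kept.add(aln.name)
    return kept


alns = [
    Alignment("r1", 0, 1, 100),    # 1% divergent, primary
    Alignment("r2", 0, 8, 100),    # 8% divergent, primary
    Alignment("r3", 256, 2, 100),  # 2% divergent, secondary
    Alignment("r4", 4, 0, 0),      # unmapped
]
print(sorted(relaxed_filter(alns, 0.05)))  # ['r1', 'r3']
```

Raising `max_divergence` is the knob that corresponds to lowering stringency; `keep_secondary` controls whether multi-mapped hits count.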

Written 8.6 years ago by Phil S.

I can lower the mapping stringency, but is there any way to quantify that? How much lower? The lower I set it, the more reads I will get.

Written 8.6 years ago by kshitijtayal

Maybe this gives you an idea; however, I guess it is, unfortunately, trial and error. Sorry about that.
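Since choosing the cutoff is largely trial and error, one way to make the trials systematic is to record each read's best divergence from the region and count how many reads survive a series of cutoffs. This is a hypothetical sketch (`sweep` and the input dictionary are illustrative, not from any tool). A range where the count stops growing, followed by a sudden jump, may hint that reads from elsewhere in the genome are starting to be recruited, but that is a heuristic, not a rule.

```python
def sweep(best_divergence: dict[str, float], cutoffs: list[float]) -> list[tuple[float, int, int]]:
    """For each cutoff (ascending), return (cutoff, reads kept, reads added since previous cutoff)."""
    rows, previous = [], 0
    for cutoff in sorted(cutoffs):
        kept = sum(1 for d in best_divergence.values() if d <= cutoff)
        rows.append((cutoff, kept, kept - previous))
        previous = kept
    return rows


# Hypothetical best divergence (NM / aligned length) per read.
best = {"a": 0.0, "b": 0.01, "c": 0.02, "d": 0.09, "e": 0.20}
for cutoff, kept, added in sweep(best, [0.0, 0.02, 0.05, 0.10]):
    print(f"cutoff={cutoff:.2f} kept={kept} added={added}")
```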

Written 8.6 years ago by Phil S.

Rather than mapping, you can also use k-mer matching, which can be more sensitive depending on the parameters. For example:

bbduk.sh in=reads.fq ref=region.fa outm=matching.fq k=27 hdist=1

This will capture all the reads that share a 27-mer with the region, allowing one mismatch.

Written 8.6 years ago by Brian Bushnell

Thanks for the reply. Can you elaborate on bbduk.sh?

Written 8.6 years ago by kshitijtayal

I've described it here. You can run the script with no parameters (or edit it) to get a list of parameters and their meanings.

Essentially, in this mode, it will retain every read that has a 27-mer match to the reference. You can also use the flag mkf=0.5, for example, which stands for "min kmer fraction", to require reads to share at least 50% of their kmers with the reference.
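The `mkf` idea can be sketched the same way: compute the fraction of a read's k-mers found in the reference k-mer set, and keep the read only if that fraction reaches the cutoff. In this hypothetical sketch k-mers are stored in canonical form (the lexicographically smaller of a k-mer and its reverse complement) so reads from either strand are counted. Exactly how BBDuk counts the fraction may differ, so its documentation is the authority on that.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_fraction(read: str, ref_kmers: set[str], k: int) -> float:
    """Fraction of the read's k-mers that occur in the reference k-mer set."""
    read_kmers = [canonical(read[i:i + k]) for i in range(len(read) - k + 1)]
    if not read_kmers:
        return 0.0  # read shorter than k
    return sum(km in ref_kmers for km in read_kmers) / len(read_kmers)


def filter_by_fraction(reads: dict[str, str], ref: str, k: int, min_fraction: float) -> list[str]:
    """Keep reads whose k-mer hit fraction is at least min_fraction."""
    ref_kmers = {canonical(ref[i:i + k]) for i in range(len(ref) - k + 1)}
    return [name for name, seq in reads.items()
            if kmer_fraction(seq, ref_kmers, k) >= min_fraction]


ref = "ACGTACGTTAGCCATG"
reads = {"a": "ACGTACGTTA", "b": "TTAGCCCCCC", "c": "TAACGTACGT"}
print(filter_by_fraction(reads, ref, 5, 0.5))  # ['a', 'c']
print(filter_by_fraction(reads, ref, 5, 0.3))  # ['a', 'b', 'c']
```

Read `b` shares only 2 of its 6 k-mers with the reference, so it passes a 0.3 cutoff but not 0.5; `c` is the reverse complement of `a` and scores 1.0 because of the canonical k-mers.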

Written 8.6 years ago by Brian Bushnell

Ram · Answer 2 · 2015-09-18

8.6 years ago

5heikki 11k

Based on the literature, lots of groups have done this kind of thing with PRICE.
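PRICE stands for Paired-Read Iterative Contig Extension. The general idea of growing a read set outward from a known seed, which also bears on the concern above about recovering sequence that differs from the reference, can be sketched as iterative k-mer recruitment. This is a simplified illustration, not PRICE's algorithm: it ignores read pairing, reverse complements and the assembly itself, and `iterative_recruit` is a hypothetical name.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All k-mers of seq (forward strand only, to keep the sketch short)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_recruit(seed: str, reads: dict[str, str], k: int, max_rounds: int = 10) -> set[str]:
    """Recruit reads sharing a k-mer with the bait, add their k-mers to the bait, and repeat."""
    bait = kmer_set(seed, k)
    recruited: set[str] = set()
    for _ in range(max_rounds):
        new = {name for name, seq in reads.items()
               if name not in recruited and kmer_set(seq, k) & bait}
        if not new:
            break  # nothing new: the read set has stopped growing
        recruited |= new
        for name in new:
            bait |= kmer_set(reads[name], k)
    return recruited


seed = "AAACCCGGG"
reads = {"r1": "CCGGGTTTA", "r2": "TTTAGCAT", "r3": "GCATCCAA", "r4": "TGTGTGTG"}
print(sorted(iterative_recruit(seed, reads, k=4)))  # ['r1', 'r2', 'r3']
```

Only `r1` overlaps the seed directly; `r2` and `r3` are pulled in on later rounds through the reads recruited before them, which is how sequence beyond the seed can be reached. On real data repeats can make recruitment run away, which is why the sketch caps the number of rounds.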

Thanks. Can you point me to the titles of, or links to, papers that discuss this?

Written 8.6 years ago by kshitijtayal

You could start with the publications that cite the paper. There are also example cases in the paper itself.

Written 8.6 years ago by 5heikki

I was going through the PRICE paper. It's more about assembling a particular gene from metagenomic sequence data than about assembling part of a genome.

Written 8.6 years ago by kshitijtayal