Selective assembly of Genome
8.6 years ago
kshitijtayal ▴ 40

I am interested in assembling part of a genome, not the whole genome. I have the read file for the whole genome. Is there a way to extract the subset of reads belonging to the region I am interested in, so that I can then assemble just that subset?

I am a newbie in this field. Can you point me to literature where people have done this?

Assembly genome alignment sequence • 2.1k views
8.6 years ago
Phil S. ▴ 700

If you have a reference genome, you can map your reads to it (the reference can also be a close relative of your organism). Then you can select the reads by mapping position to obtain those that cover your region of interest...

HTH
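
To make the "filter by mapping position" step concrete, here is a minimal sketch in Python. It assumes the reads have already been mapped and the alignments are available as SAM text lines; the function names and the toy records are made up for illustration. In practice a tool such as `samtools view` with a region argument on a sorted, indexed BAM does this job, so treat this only as an illustration of the logic.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers on the reference."""
    # Only M, D, N, = and X consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos, pos + length - 1

def reads_in_region(sam_lines, chrom, start, end):
    """Yield names of mapped reads overlapping chrom:start-end (1-based, inclusive)."""
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or rname != chrom or cigar == "*":   # unmapped or other contig
            continue
        aln_start, aln_end = reference_span(pos, cigar)
        if aln_start <= end and aln_end >= start:
            yield name

# Toy SAM records (hypothetical data).
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t95\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t0\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]
selected = list(reads_in_region(sam, "chr1", 100, 200))
print(selected)
```

The selected read names can then be used to pull the corresponding records out of the original read file before assembly.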

I thought of the same approach. But then we would be unable to capture the diversity in the unknown genome region. If we filter the reads by mapping position for a region of the reference genome and then assemble that subset of reads, we will simply get back the reference region itself. How can this be avoided?

You could lower the mapping stringency and also include multi-mapped reads that cover this region.

I can lower the mapping stringency, but is there any way to quantify that? How much lower should it be? The lower I set it, the more reads I will get.

Maybe this gives you an idea; however, I guess it is, unfortunately, trial and error... sorry about that.

You can also, rather than mapping, use kmer-matching. This can be more sensitive, depending on the parameters. For example:

bbduk.sh in=reads.fq ref=region.fa outm=matching.fq k=27 hdist=1

This will capture all the reads that share a 27-mer with the region, allowing one mismatch.
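
As a rough illustration of what k-mer matching with one allowed mismatch means, here is a small Python sketch. It is not how BBDuk is implemented; the function names are invented, the toy region and k=8 are chosen only to keep the example readable, and checking both strands is an assumption of this sketch.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def neighbours(kmer, alphabet="ACGT"):
    """All sequences within Hamming distance 1 of kmer, including kmer itself."""
    out = {kmer}
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                out.add(kmer[:i] + alt + kmer[i + 1:])
    return out

def build_index(region, k):
    """Index every region k-mer, on both strands, plus its 1-mismatch variants."""
    index = set()
    for strand in (region, revcomp(region)):
        for km in kmers(strand, k):
            index |= neighbours(km)
    return index

def read_matches(read, index, k):
    """True if any k-mer of the read is in the index."""
    return any(read[i:i + k] in index for i in range(len(read) - k + 1))

# Toy data (hypothetical): a short "region" and three reads.
REGION = "ACGTTGCAAGGCTTAACCGGT"
K = 8
index = build_index(REGION, K)

one_mismatch = "TGCTAGGCTT"      # REGION[4:14] with a single substitution
reverse_read = "AAGCCTTGCA"      # reverse complement of REGION[4:14]
unrelated = "GGGGGGGGGGGG"

for read in (one_mismatch, reverse_read, unrelated):
    print(read, read_matches(read, index, K))
```

Precomputing the mismatch variants on the reference side keeps the per-read check to simple set lookups, at the cost of a larger index.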

Thanks for the reply. Can you elaborate on bbduk.sh?

I've described it here. You can run the script with no parameters (or edit it) to get a list of parameters and their meanings.

Essentially, in this mode, it will retain every read that has a 27-mer match to the reference. You can also use the flag mkf=0.5, for example, which stands for "min kmer fraction", to require reads to share at least 50% of their kmers with the reference.
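
The "min kmer fraction" idea can be sketched in a few lines of Python. This is only an illustration of the criterion, not BBDuk's code: the function name, the toy region, k=8, exact matching, and forward-strand-only comparison are all assumptions made to keep the example small.

```python
def kmer_fraction(read, ref_kmers, k):
    """Fraction of the read's k-mers that also occur in the reference k-mer set."""
    total = len(read) - k + 1
    if total <= 0:
        return 0.0
    hits = sum(read[i:i + k] in ref_kmers for i in range(total))
    return hits / total

# Toy data (hypothetical): a short "region" and two reads.
REGION = "ACGTTGCAAGGCTTAACCGGT"
K = 8
REF_KMERS = {REGION[i:i + K] for i in range(len(REGION) - K + 1)}

full_read = "ACGTTGCAAGGC"          # entirely from the region
partial_read = "ACGTTGCAAGTTTTTT"   # first 10 bases from the region, rest unrelated

MIN_FRACTION = 0.5                  # analogous to requiring 50% shared k-mers
for read in (full_read, partial_read):
    frac = kmer_fraction(read, REF_KMERS, K)
    print(read, round(frac, 3), "keep" if frac >= MIN_FRACTION else "drop")
```

Here the second read shares only 3 of its 9 k-mers with the region, so a 0.5 threshold drops it even though it has exact k-mer hits.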

8.6 years ago
5heikki 11k

Based on literature, lots of groups have done such stuff with PRICE.

Thanks. Can you point me to the titles of, or links to, papers that discuss this?

You could start with the publications that cite the paper. There are also example cases in the paper itself.

I was going through the PRICE paper. It is more about assembling a particular gene from metagenomic sequence data than about assembling a part of a genome.
