Selective assembly of Genome
2
0
7.4 years ago
kshitijtayal ▴ 40

I am interested in assembling a part of the genome, not the whole genome. I have the read file for the whole genome. Is there any way to get the subset of reads belonging to the part of the genome I am interested in, and then run an assembly on that subset?

I am a newbie in this field. Can you point me to literature where people have done this?

Assembly genome alignment sequence • 1.6k views
1
7.4 years ago
Phil S. ▴ 700

If you have a reference genome, you can map your reads to it (the reference can also be a close relative of your organism). Then you can filter the reads by mapping position and keep those that cover your region of interest...
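To make the position filter concrete, here is a minimal Python sketch, assuming the reads were already mapped and the alignments are available as SAM text lines. The function names `reference_span` and `reads_in_region` are made up for illustration. It only uses the standard SAM columns (name, flag, reference name, 1-based position, CIGAR) and is not a substitute for a proper tool.

```python
import re

# CIGAR operations as defined in the SAM format
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Number of reference bases consumed by an alignment's CIGAR string."""
    # Only M, D, N, = and X advance along the reference
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")

def reads_in_region(sam_lines, chrom, start, end):
    """Return names of mapped reads overlapping chrom:start-end (1-based, inclusive)."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or rname != chrom or cigar == "*":
            continue                      # unmapped, or mapped elsewhere
        aln_end = pos + reference_span(cigar) - 1
        if pos <= end and aln_end >= start:
            names.append(name)
    return names
```

In practice, `samtools view` with a region argument on a sorted, indexed BAM file does the same job. For paired-end data you would probably also want to keep the mate of every selected read, even if the mate maps elsewhere or not at all.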

HTH

0

I thought of the same approach. But then we would be unable to capture the diversity in the unknown genome region. If we filter the reads by mapping position for a region of the reference genome and then assemble that subset of reads, we will just get back the reference region itself. How can we avoid this?

0

You could lower the mapping stringency and also include multi-mapped reads that cover this region.

0

I can lower the mapping stringency, but is there any way to quantify that? How much lower? The more I lower it, the more reads I will get.

0

Maybe this gives you an idea; however, I guess it is, unfortunately, trial and error... sorry about that.

0

You can also, rather than mapping, use kmer-matching. This can be more sensitive, depending on the parameters. For example:

bbduk.sh in=reads.fq ref=region.fa outm=matching.fq k=27 hdist=1


This will capture all the reads that share a 27-mer with the region, allowing one mismatch.
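For intuition, the idea of k-mer matching can be sketched in a few lines of Python. This is not BBDuk's implementation: it assumes exact k-mer matches only (so nothing like `hdist=1`), holds everything in memory, and the function names are invented for the example.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (other characters pass through unchanged)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(ref, k):
    """k-mers of the reference region on both strands."""
    return kmers(ref, k) | kmers(revcomp(ref), k)

def matching_reads(reads, ref, k=27):
    """Names of reads sharing at least one exact k-mer with ref.

    reads is an iterable of (name, sequence) pairs.
    """
    index = build_index(ref, k)
    return [name for name, seq in reads if kmers(seq, k) & index]
```

A read is kept as soon as a single k-mer hits, so reads that only partly overlap the region, or that carry variants outside the matching k-mer, are still captured.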

0

Thanks for the reply. Can you elaborate on bbduk.sh?

0

I've described it here. You can run the script with no parameters (or edit it) to get a list of parameters and their meanings.

Essentially, in this mode, it will retain every read that has a 27-mer match to the reference. You can also use the flag mkf=0.5, for example, which stands for "min kmer fraction", to require reads to share at least 50% of their kmers with the reference.
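A rough Python sketch of what a "min kmer fraction" filter means, again assuming exact k-mer matches and using made-up function names (this illustrates the idea and is not how BBDuk is written internally):

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def kmer_list(seq, k):
    """All k-mers of seq in order, duplicates included."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_fraction(read, ref_kmers, k):
    """Fraction of the read's k-mers that are present in ref_kmers."""
    ks = kmer_list(read, k)
    if not ks:                      # read shorter than k
        return 0.0
    return sum(km in ref_kmers for km in ks) / len(ks)

def filter_by_fraction(reads, ref, k=27, min_fraction=0.5):
    """Names of reads whose matching k-mer fraction is at least min_fraction.

    reads is an iterable of (name, sequence) pairs.
    """
    ref_kmers = set(kmer_list(ref, k)) | set(kmer_list(revcomp(ref), k))
    return [name for name, seq in reads
            if kmer_fraction(seq, ref_kmers, k) >= min_fraction]
```

Raising the fraction makes the filter stricter: reads that only touch the edge of the region, or that are quite divergent from the reference, are dropped first.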

1
7.4 years ago
5heikki 11k

Based on literature, lots of groups have done such stuff with PRICE.

0

0

You could start with the publications that cite the paper. There are also example cases in the paper itself.

0

I was going through the PRICE paper. It's more about assembling a particular gene from metagenomic sequence data than about assembling a part of a genome.