Question: Any Method/Tool That Can Report The Number Of Overlapping Sequences Used To Build Each Contig?
bambus0725 wrote (5.8 years ago):


I work with metatranscriptomics data (sequenced with Illumina technology). I did a de novo transcriptome assembly with the SOAPdenovo-Trans assembler and am now looking for a tool that can tell me the total number of overlapping reads that went into each contig, so that I can judge how good or bad the coverage is.

Any suggestions would be helpful.

Thank you in advance.

Alex Reynolds (Seattle, WA USA) wrote (5.8 years ago):

If you have your contigs and reads in sorted BED format (e.g., myContigs.bed and myReads.bed) and you know your overlap criteria, then you could use BEDOPS bedmap to answer that question:

$ bedmap --echo --count <overlap-options> myContigs.bed myReads.bed > myContigsWithCountOfOverlappingReads.bed

If you leave out <overlap-options>, the default overlap between a read and a contig is one base. Otherwise, you can specify the number of bases of overlap with --bp-ovr, or require a fraction of the contig or read length with --fraction-ref or --fraction-map, respectively. Other overlap options are also available; they are discussed more fully in the BEDOPS documentation.
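To make the overlap rule concrete, here is a minimal Python sketch of what a per-contig count with a base-pair threshold amounts to. It is not BEDOPS code: the function names (`parse_bed`, `overlap_bp`, `count_reads_per_contig`) are hypothetical, `min_bp` plays the role of `--bp-ovr`, and the mean-depth value is an extra assumption added to speak to the coverage question. It compares every contig against every read for clarity, whereas bedmap relies on sorted input to sweep through both files in one pass.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end): BED coordinates are 0-based and end-exclusive
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Keep the first three columns of each BED line; skip blank/header lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlap_bp(a: Interval, b: Interval) -> int:
    """Bases shared by two half-open intervals; 0 if on different chromosomes or disjoint."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def count_reads_per_contig(
    contigs: List[Interval], reads: List[Interval], min_bp: int = 1
) -> List[Tuple[Interval, int, float]]:
    """For each contig, count reads sharing at least min_bp bases (cf. --bp-ovr).

    Also returns a rough mean depth: overlapping read bases divided by contig
    length, using only the reads that passed the min_bp threshold.
    Brute force (every contig vs. every read), kept simple on purpose.
    """
    if min_bp < 1:
        raise ValueError("min_bp must be at least 1 base")
    results = []
    for contig in contigs:
        kept = [bp for bp in (overlap_bp(contig, r) for r in reads) if bp >= min_bp]
        length = contig[2] - contig[1]
        depth = sum(kept) / length if length > 0 else 0.0
        results.append((contig, len(kept), depth))
    return results
```

Usage: read both files with `parse_bed(open("myContigs.bed"))` and `parse_bed(open("myReads.bed"))`, then call `count_reads_per_contig(contigs, reads, min_bp=10)` to mimic a 10-base `--bp-ovr` requirement.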


Thank you for the comment, Alex.

The problem is that my data are in FASTA format. Is there a way to convert a FASTA file into a format that BEDOPS accepts, such as BAM/SAM? Would that work?

(Reply written 5.8 years ago by bambus0725.)

It will depend on your FASTA file and whether it already contains coordinate and chromosome information (in the header, for instance). If not, you'll need to align your sequences to a reference genome to produce BAM, SAM, or PSL alignments, and then convert those into BED with a conversion script (such as bam2bed, sam2bed, or psl2bed in BEDOPS).
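For readers who do have a reference, the BED conversion step is mostly coordinate arithmetic. Below is a hypothetical minimal SAM-to-BED converter in Python, not BEDOPS's own `sam2bed` script: SAM `POS` is 1-based while BED starts are 0-based, and the end coordinate is found by summing the CIGAR operations that consume the reference (`M`, `D`, `N`, `=`, `X`). The six output columns (chrom, start, end, read name, MAPQ as score, strand) are an assumed minimal layout.

```python
import re
from typing import Optional

# One CIGAR operation: a length followed by an op code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Ops that advance along the reference (I, S, H, P do not)
REF_CONSUMING = set("MDN=X")


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def sam_line_to_bed(line: str) -> Optional[str]:
    """Convert one SAM record to a 6-column BED line; None for headers/unmapped reads."""
    if line.startswith("@"):
        return None  # header line
    qname, flag, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
    flag_bits = int(flag)
    if flag_bits & 0x4 or rname == "*" or cigar == "*":
        return None  # 0x4 = segment unmapped
    start = int(pos) - 1  # 1-based SAM -> 0-based BED
    end = start + reference_length(cigar)  # BED end is exclusive
    strand = "-" if flag_bits & 0x10 else "+"  # 0x10 = reverse complemented
    return "\t".join([rname, str(start), str(end), qname, mapq, strand])
```

The resulting lines would still need sorting (for example with BEDOPS sort-bed) before being passed to bedmap.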

(Reply written 5.8 years ago by Alex Reynolds.)

No, actually the header line contains only the sequence ID, and for most of the organisms there is no reference genome yet, so I guess I can't do that. That is exactly why I did a de novo transcriptome assembly in the first place.

(Reply written 5.8 years ago by bambus0725.)