Question: Extending ends of sequences with the help of reads?
3
gravatar for manekineko
2.3 years ago by
manekineko90
Bulgaria
manekineko90 wrote:

Hi,

I have some hundreds of sequences and library of reads, How is possible to extend these sequences with the reads from the both end of the sequences? But I do not want the software to merge the sequences as they are separate things, just to extend them with the help of the reads as much as possible?

assembly • 1.6k views
ADD COMMENTlink modified 2.3 years ago by Brian Bushnell14k • written 2.3 years ago by manekineko90

What sort of format would you like the output to be? I assume you're using SAM or BAM as input.

ADD REPLYlink written 2.3 years ago by Devon Ryan70k

I have the input sequences want to extend FASTA, reads in FASTA.

(I also have the sequences GFF and the reads mapped in BAM if needed).

 

I would like to have the extended sequences fasta and/or GFF.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by manekineko90

Ah, now I understand what you're trying to do. Stated somewhat more clearly, you have incomplete contigs (presumably from another attempt at assembly) and want to extend them given an alignment to them. Perhaps you also want to do some scaffolding but that's unclear. If that's the case, you might mention that and one of the more assembly-experienced folks here can chime in.

ADD REPLYlink written 2.3 years ago by Devon Ryan70k

A bit more different -  trying to assemble an mRNA transcripts as I have dome part of each one (somewhare in the middle of it) and have a lots of reads

ADD REPLYlink written 2.3 years ago by manekineko90
9
gravatar for Brian Bushnell
2.3 years ago by
Walnut Creek, USA
Brian Bushnell14k wrote:

I recently developed an assembler, Tadpole, for this problem.  It's part of the BBMap package.  Usage:

tadpole.sh in=sequences.fa extra=reads.fq out=extended.fa extendleft=200 extendright=200 ibb=f mode=extend

It will extend the sequences using kmer counts.  You can set the length to extend as long as you want, but that's just an upper limit; it will stop earlier if it hits a branch.

(edit: modified command line to include "mode=extend" which I had forgotten)

(edit2: modified for updated syntax in v37.33+)

ADD COMMENTlink modified 3 months ago • written 2.3 years ago by Brian Bushnell14k

Hi, seems what I need, is it gona working if my reads are not fastq but fasta? And if each sequence can be extended differently depending on reads, how to figure out the extendleft and extendright values?

ADD REPLYlink written 2.3 years ago by manekineko90
1

It accepts fasta and fastq.  And you can just set the extendleft and extendright numbers to something high like 2000 if you want.  They will only be extended until a branch is encountered in the DeBruijn graph, which depends on the organism.  So, if you set them to 10, almost all the sequences will be extended by 10bp.  If you set it to 1 million, none of them will be extended to one million - they'll all stop somewhere before that, since the extension will only continue as long as there is a single unambiguous best path (according to the thresholds you set).  Therefore, just set them to a number X such that you don't want anything to extend more than X.
 

ADD REPLYlink written 2.3 years ago by Brian Bushnell14k

Hi all, I need to clarify this. The syntax has changed a little for extending existing contigs using new reads... I'll update it later today.

ADD REPLYlink written 3 months ago by Brian Bushnell14k
3
gravatar for Pierre Lindenbaum
2.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum98k wrote:

Interesting challenge , I just wrote a tool ( https://github.com/lindenb/jvarkit/wiki/ExtendReferenceWithReads ) that takes as input

  • an indexed fasta REFERENCE sequence
  • some indexed BAM

 

the (clipped) overlapping reads are used to extend the REF sequence in 5', 3' and in the contigs containing 'N'.

$  java   -jar dist/extendrefwithreads.jar \
     -R human_g1k_v37.fasta -f 0.3 \
     f1.bam f2.bam f3.bam 2> /dev/null |\
  cat -n | grep -E '(>|[atgc])' 

     1  >1
   167  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNncgattaccctaacgctcac
   168  cctaaccctcnccctntnccnncnncccnncttcttccgaTAACCCTAACCCTAACCCTA
  3791  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNatt
  3792  tatgcNctttntgctgtGATTCATGGCTGAAATCGTGTTTGACCAGCTATGTGTGTCTCT
  8691  NNNNNNNNNNNNNNNNNNNNNNNNctagGATCCTTGAAGCGCCCCCAAGGGCATCTTCTC
 64089  TGGTGAGGGAAATTAGAACCACGACAATTTGGGAACTTAGCTTCTGCCctgctccNNNNN
 66589  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNgagtAGCTGAGACTAC

 

 

 

 

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Pierre Lindenbaum98k
1
gravatar for h.mon
2.3 years ago by
h.mon8.7k
Brazil
h.mon8.7k wrote:

You may also thy Mapsembler2.

ADD COMMENTlink written 2.3 years ago by h.mon8.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 821 users visited in the last hour