Question: Extract sequence from fasta using primer
13 months ago by
sacha820
France
sacha820 wrote:

I have a FASTA database containing 16S RNA sequences. I would like to extract the sequences between two primers, like an in silico PCR, but for every 16S RNA in my database. Which software can do this? I'm sure there is one.

pcr primer extract fasta • 875 views
ADD COMMENTlink modified 13 months ago • written 13 months ago by sacha820

I would love to have something like this:

  seq_extract -forward ACTGAGA -reverse TCGAGAGA  database.fasta > extract.fasta

in C++ of course :D
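To make the requested behaviour concrete, here is a minimal sketch in Python rather than C++. It assumes exact matching only, and it assumes the reverse primer is given as it reads on the same strand as the forward primer. The names `extract_between` and `seq_extract` are hypothetical, chosen to mirror the command above; they are not an existing tool.

```python
from typing import Dict, Optional


def extract_between(seq: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region between the first forward-primer hit and the next
    reverse-primer hit. Exact matches only; the primers themselves are
    excluded. Returns None if either primer is missing."""
    seq = seq.upper()
    start = seq.find(forward.upper())
    if start == -1:
        return None
    start += len(forward)  # skip past the forward primer
    end = seq.find(reverse.upper(), start)
    if end == -1:
        return None
    return seq[start:end]


def seq_extract(records: Dict[str, str], forward: str, reverse: str) -> Dict[str, str]:
    """Apply extract_between to every record and keep only the hits."""
    hits = {}
    for name, seq in records.items():
        region = extract_between(seq, forward, reverse)
        if region is not None:
            hits[name] = region
    return hits


# Toy records (made up for illustration), using the primers from the command above.
records = {"rec1": "GGGACTGAGAAAACCCCTCGAGAGATTT", "rec2": "GGGGCCCC"}
for name, region in seq_extract(records, "ACTGAGA", "TCGAGAGA").items():
    print(f">{name}\n{region}")
```

A real primer pair would usually need mismatch tolerance and both orientations, which this sketch deliberately leaves out.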

ADD REPLYlink modified 13 months ago • written 13 months ago by sacha820

Why not use bbduk.sh from BBMap? You can provide the sequences as literal=left_primer,right_primer. Not C++, but probably one of the best Java programs there is.

ADD REPLYlink written 13 months ago by genomax37k

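How about grep -B 1 'ACTGAGA.*TCGAGAGA' database.fasta > extract.fasta? (This assumes you have linearized the FASTA "database". Note that it returns the whole matching records, not just the region between the primers.)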
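"Linearized" here means each record is one header line followed by one sequence line, so that a pattern can never be split across a line break. A minimal sketch of that step, assuming plain FASTA with `>` headers (the function name `linearize_fasta` is hypothetical):

```python
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each header line followed by its sequence joined onto one line.
    Any lines before the first header are ignored."""
    seq_parts = []
    seen_header = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if seen_header:
                yield "".join(seq_parts)  # flush the previous record
            yield line
            seq_parts = []
            seen_header = True
        elif line:
            seq_parts.append(line)
    if seen_header:
        yield "".join(seq_parts)  # flush the last record


# In this toy record the two primers sit on different wrapped lines, so a
# line-based search would miss the pair until the record is linearized.
wrapped = [">rec1", "GGGACTGAGAAAAC", "CCCTCGAGAGATTT", ">rec2", "GGGG", "CCCC"]
for out_line in linearize_fasta(wrapped):
    print(out_line)
```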

ADD REPLYlink modified 13 months ago • written 13 months ago by John12k
13 months ago by
sacha820
France
sacha820 wrote:

The following method works pretty well:

cutadapt --discard-untrimmed -g "$FORWARD" "$INPUT" 2> /dev/null | cutadapt --discard-untrimmed -a "$REVERSE" - 2> /dev/null > "$OUTPUT"

FORWARD and REVERSE are the primer sequences.

ADD COMMENTlink modified 13 months ago • written 13 months ago by sacha820
13 months ago by
Walnut Creek, USA
Brian Bushnell14k wrote:

To quote myself:

I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.

So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. We used these programs to synthetically cut V4 out of full-length 16S sequences (PacBio amplicons).

These are both in the BBMap package and tolerant of indels, as required due to PacBio's error profile. If you want to only use exact matches, you might need a different approach.

ADD COMMENTlink modified 13 months ago • written 13 months ago by Brian Bushnell14k

Is this more efficient than using bbduk?

ADD REPLYlink written 13 months ago by genomax37k

Computationally, it's much less efficient than BBDuk, for a small edit distance. But it works regardless of the orientation of the sequence. With BBDuk you'd essentially need to do a left-trim with one adapter, then a right-trim with the other adapter... which would only work for the sequences oriented as you expect. I'm not sure if 16S repositories all have the same orientation.
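The orientation issue can be illustrated with a small sketch: check each sequence and, failing that, its reverse complement for the forward primer before trimming. This assumes exact matches and is not how BBDuk or msa.sh work internally; `reverse_complement` and `orient` are hypothetical helper names.

```python
from typing import Optional

# Only A/C/G/T are complemented; any other symbol (e.g. N) is left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercased DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orient(seq: str, forward: str) -> Optional[str]:
    """Return seq in the orientation that contains the forward primer
    (exact match), or None if neither strand contains it."""
    seq = seq.upper()
    forward = forward.upper()
    if forward in seq:
        return seq
    flipped = reverse_complement(seq)
    if forward in flipped:
        return flipped
    return None


# A record stored on the opposite strand is flipped back before trimming.
stored = reverse_complement("GGGACTGAGAAAAC")
print(orient(stored, "ACTGAGA"))
```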

ADD REPLYlink written 13 months ago by Brian Bushnell14k
13 months ago by
shenwei3563.4k
China
shenwei3563.4k wrote:

UCSC In-Silico PCR, available in both standalone and online versions:

searches a sequence database with a pair of PCR primers, using an indexing strategy for fast performance.

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

ADD COMMENTlink written 13 months ago by shenwei3563.4k

It appears that the software is not readily available for download (you have to contact Jim Kent for it). It is free for academics (provided @sacha qualifies).

ADD REPLYlink written 13 months ago by genomax37k