Question

Extract sequence from fasta using primer

0

Entering edit mode

7.8 years ago

sacha ★ 2.4k

I Have a fasta database containing 16 RNA. I Would like to extract sequences between 2 primers. Like a in sillico PCR but for all 16S RNA in my database. Which software, i m sure there is one ?

pcr fasta primer extract • 9.1k views

ADD COMMENT • link updated 5.7 years ago by m.schmitt • 0 • written 7.8 years ago by sacha ★ 2.4k

0

Entering edit mode

I would love to have something like that :

  seq_extract -forward ACTGAGA -reverse TCGAGAGA  database.fasta > extract.fasta

in C++ of course :D

ADD REPLY • link 7.8 years ago by sacha ★ 2.4k

0

Entering edit mode

Why not use bbduk.sh from BBMap? You can provide the sequence as literal=left_primer,right_primer. Not C++ but probably of of the best java program there is.

ADD REPLY • link 7.8 years ago by GenoMax 144k

0

Entering edit mode

grep -B 1 ACTGAGA.*TCGAGAGA database.fasta > extract.fasta? (assuming you have linearized the fasta "database")

ADD REPLY • link 7.8 years ago by John 13k

2

Entering edit mode

7.8 years ago

Brian Bushnell 20k

To quote myself:

I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.

So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. We used these programs to synthetically cut V4 out of full-length 16S sequences (PacBio amplicons).

These are both in the BBMap package and tolerant of indels, as required due to PacBio's error profile. If you want to only use exact matches, you might need a different approach.

ADD COMMENT • link 7.8 years ago by Brian Bushnell 20k

0

Entering edit mode

Is this more efficient than using bbduk?

ADD REPLY • link 7.8 years ago by GenoMax 144k

0

Entering edit mode

Computationally, it's much less efficient than BBDuk, for a small edit distance. But it works regardless of the orientation of the sequence. With BBDuk you'd essentially need to do a left-trim with one adapter, then a right-trim with the other adapter... which would only work for the sequences oriented as you expect. I'm not sure if 16S repositories all have the same orientation.

ADD REPLY • link 7.8 years ago by Brian Bushnell 20k

0

Entering edit mode

Dear Brian I received some paired end reads containing 9 nucleotides in 5’ of Read1 as stater and 6 nucleotides in 3’ side of read2 as stopper random primers. So I need to remove 9 nucleotides from 5’ of Read1 and 6 nucleotides from 3’ side of read2. How can I remove these random primers by using BBMap? Is there any command for this kind of primer trimming in BBMap packages?

ADD REPLY • link 6.7 years ago by rahmati.razieh83 ▴ 30

0

Entering edit mode

7.8 years ago

shenwei356 8.6k

UCSC in-silico PCR with both standlane and online version

searches a sequence database with a pair of PCR primers, using an indexing strategy for fast performance.

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

ADD COMMENT • link 7.8 years ago by shenwei356 8.6k

0

Entering edit mode

It appears that the software is not readily available for download (you have to contact Jim Kent for it) It is free for academics (provided @sacha qualifies).

ADD REPLY • link 7.8 years ago by GenoMax 144k

0

Entering edit mode

5.7 years ago

m.schmitt • 0

I found this solution very helpfull to extract multiple amplicon sequences from 1 input fasta file:

The format of files is explained very nicely here: https://github.com/egonozer/in_silico_pcr

perl extractAmpliconSeq.pl -s $INPUT.fasta -p $primers.txt >log 2> $output.fasta

ADD COMMENT • link 5.7 years ago by m.schmitt • 0

score 2 · Accepted Answer · 2016-10-24

2

Entering edit mode

7.8 years ago

sacha ★ 2.4k

The following method works pretty well :

cutadapt --discard-untrimmed -g $FORWARD $INPUT 2> /dev/null | cutadapt --discard-untrimmed -a $REVERSE - 2> /dev/null > $OUTPUT

FORWARD and REVERSE are sequence primers .

ADD COMMENT • link 7.8 years ago by sacha ★ 2.4k