Extract sequence from fasta using primer
4
0
Entering edit mode
6.1 years ago
sacha ★ 2.4k

I Have a fasta database containing 16 RNA. I Would like to extract sequences between 2 primers. Like a in sillico PCR but for all 16S RNA in my database. Which software, i m sure there is one ?

pcr fasta primer extract • 7.1k views
0
Entering edit mode

I would love to have something like that :

  seq_extract -forward ACTGAGA -reverse TCGAGAGA  database.fasta > extract.fasta


in C++ of course :D

0
Entering edit mode

Why not use bbduk.sh from BBMap? You can provide the sequence as literal=left_primer,right_primer. Not C++ but probably of of the best java program there is.

0
Entering edit mode

grep -B 1 ACTGAGA.*TCGAGAGA database.fasta > extract.fasta? (assuming you have linearized the fasta "database")

2
Entering edit mode
6.1 years ago
sacha ★ 2.4k

The following method works pretty well :

cutadapt --discard-untrimmed -g $FORWARD$INPUT 2> /dev/null | cutadapt --discard-untrimmed -a $REVERSE - 2> /dev/null >$OUTPUT


FORWARD and REVERSE are sequence primers .

2
Entering edit mode
6.1 years ago

To quote myself:

I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.

So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. We used these programs to synthetically cut V4 out of full-length 16S sequences (PacBio amplicons).

These are both in the BBMap package and tolerant of indels, as required due to PacBio's error profile. If you want to only use exact matches, you might need a different approach.

0
Entering edit mode

Is this more efficient than using bbduk?

0
Entering edit mode

Computationally, it's much less efficient than BBDuk, for a small edit distance. But it works regardless of the orientation of the sequence. With BBDuk you'd essentially need to do a left-trim with one adapter, then a right-trim with the other adapter... which would only work for the sequences oriented as you expect. I'm not sure if 16S repositories all have the same orientation.

0
Entering edit mode

Dear Brian I received some paired end reads containing 9 nucleotides in 5’ of Read1 as stater and 6 nucleotides in 3’ side of read2 as stopper random primers. So I need to remove 9 nucleotides from 5’ of Read1 and 6 nucleotides from 3’ side of read2. How can I remove these random primers by using BBMap? Is there any command for this kind of primer trimming in BBMap packages?

0
Entering edit mode
6.1 years ago

UCSC in-silico PCR with both standlane and online version

searches a sequence database with a pair of PCR primers, using an indexing strategy for fast performance.

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

0
Entering edit mode

It appears that the software is not readily available for download (you have to contact Jim Kent for it) It is free for academics (provided @sacha qualifies).

0
Entering edit mode
4.1 years ago
m.schmitt • 0

I found this solution very helpfull to extract multiple amplicon sequences from 1 input fasta file:

The format of files is explained very nicely here: https://github.com/egonozer/in_silico_pcr

perl extractAmpliconSeq.pl -s $INPUT.fasta -p$primers.txt >log 2> \$output.fasta