Question

align spike in database

6.8 years ago

Björn ▴ 110

Hi, I have FASTA sequences from a sample that includes spike-in controls. How can I align those spike-ins against the database using Bowtie2 and filter them to produce FASTQ files of the mapped reads? Any references or scripts to perform the task would be appreciated. Thanks.

RNA-Seq Spike-in Bowtie2 fastqc • 2.7k views

updated 6.8 years ago by swbarnes2 14k • written 6.8 years ago by Björn ▴ 110

score 0 · Answer 1 · 2017-07-20

6.8 years ago

Brian Bushnell 20k

Depending on how long and specific your spike-ins are, I recommend using BBMap's Seal, which can both remove and quantify them using kmer-matching. For example:

seal.sh in=reads.fq ref=spikeins.fa pattern=spikein_%.fq outu=clean.fq stats=stats.txt k=31

6.8 years ago by Brian Bushnell 20k

score 0 · Answer 2 · 2017-07-20

6.8 years ago

swbarnes2 14k

Make a new reference file that includes the spike-in sequences. Reindex that genome with Bowtie, then realign.
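The steps above could be sketched roughly as follows. This is only an illustration under assumptions: genome.fa, spikeins.fa, and the output names are hypothetical, spikeins.fa is assumed to carry ">" header lines, and samtools is assumed to be installed for pulling the spike-in reads back out as FASTQ. Default Bowtie2 settings are used; very short references may need tuning.

```shell
# Hypothetical file names: genome.fa (existing reference) and spikeins.fa
# (spike-in sequences with ">" FASTA headers). B12_015.fastq is from the thread.

# Concatenate the existing reference and the spike-ins into one FASTA.
# Assumes both inputs end with a newline.
make_combined_fasta() {
    cat "$1" "$2" > "$3"
}

# List the spike-in sequence names (header text up to the first whitespace).
spikein_names() {
    grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//'
}

# Index, align, then extract the reads that mapped to spike-in sequences.
align_and_extract() {
    genome=$1; spikes=$2; reads=$3
    make_combined_fasta "$genome" "$spikes" combined.fa
    bowtie2-build combined.fa combined
    bowtie2 -x combined -U "$reads" -S aligned.sam
    samtools sort -o aligned.sorted.bam aligned.sam
    samtools index aligned.sorted.bam
    # Word splitting of the name list is intentional: one region per spike-in.
    samtools view -b aligned.sorted.bam $(spikein_names "$spikes") |
        samtools fastq - > spikein_reads.fastq
}

# Run only when the tools and input files are actually present.
if command -v bowtie2 >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
   && [ -f genome.fa ] && [ -f spikeins.fa ] && [ -f B12_015.fastq ]; then
    align_and_extract genome.fa spikeins.fa B12_015.fastq
fi
```

Restricting the final step to the spike-in names keeps reads that aligned to the rest of the reference out of spikein_reads.fastq.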

6.8 years ago by swbarnes2 14k

Hi swbarnes2, I prepared a new *.fa file with all the spike-in sequences, as shown below:

>UniSP100
TCCCAAATGTAGACAAAGCA
>UniSP101
TGAAGCTGCCAGCATGATCTA
>UniSP102
CAGCCAAGGATGACTTGCCGG

Would you be kind enough to send me a Bowtie2 script, given that my test file is B12_015.fastq? Thank you very much.

Reply • 6.8 years ago by Björn ▴ 110

With such short sequences, I think kmer-matching will probably work better than alignment... are those the full length of the spike-ins?

Reply • 6.8 years ago by Brian Bushnell 20k

Yes, they are the full length of the spike-ins. Although 12 spike-ins were used, I gave only 3 of them as examples, which should be enough to work out the command line for removal.

Reply • 6.7 years ago by Björn ▴ 110

In that case, if you decide to use Seal, change the flag "k=31" to "k=20", or whatever the length of the shortest spike-in is. And you may want to allow a substitution with the "hdist" flag, e.g.

seal.sh in=reads.fq ref=spikeins.fa pattern=spikein_%.fq outu=clean.fq stats=stats.txt k=20 hdist=1
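To find the length of the shortest spike-in rather than counting by hand, a small awk helper along these lines could be used. It is a sketch only: the function name shortest_seq_len is made up for illustration, and it assumes spikeins.fa is a FASTA file with ">" header lines.

```shell
# Print the length of the shortest sequence in a FASTA file.
# Handles sequences wrapped over several lines; prints 0 for an empty file.
shortest_seq_len() {
    awk '
        /^>/ {
            # A new header closes the previous record, if there was one.
            if (len > 0 && (min == 0 || len < min)) min = len
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")   # ignore stray whitespace and CR characters
            len += length($0)
        }
        END {
            if (len > 0 && (min == 0 || len < min)) min = len
            print min + 0
        }
    ' "$1"
}

# Hypothetical usage: report a candidate k for Seal if the file exists.
if [ -f spikeins.fa ]; then
    echo "shortest spike-in length: $(shortest_seq_len spikeins.fa)"
fi
```

For the three sequences quoted earlier in the thread (20, 21, and 21 bases), this reports 20, which matches the k=20 suggested above.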

Reply • 6.7 years ago by Brian Bushnell 20k