Align spike-in sequences to a database
4.4 years ago
Björn

Hi, I have FASTA sequences from a sample that includes spike-in controls. How can I align the spike-ins against the database using bowtie2 and filter them to create FASTQ files of the mapped reads? Any references or scripts for this task would be appreciated. Thanks

RNA-Seq Spike-in Bowtie2 fastqc
4.4 years ago

Depending on how long and specific your spike-ins are, I recommend BBMap's Seal, which can both remove and quantify them using k-mer matching. For example:

seal.sh in=reads.fq ref=spikeins.fa pattern=spikein_%.fq outu=clean.fq stats=stats.txt k=31

4.4 years ago

Make a new reference file that includes the spike-in sequences, rebuild the Bowtie2 index for that combined genome, and realign.
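A minimal sketch of those three steps, assuming the genome is already in a FASTA file. The function names, file names (genome.fa, spikeins.fa) and index prefix are placeholders, not from the thread. It concatenates the two FASTA files, builds the index with `bowtie2-build`, and realigns single-end reads with `bowtie2`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Merge the genome and spike-in FASTA files into one reference and index it.
# "awk 1" re-emits every line with a newline, so a genome file that lacks a
# trailing newline cannot glue its last line onto the first spike-in header.
# Usage: build_combined_index GENOME_FA SPIKEIN_FA INDEX_PREFIX
build_combined_index() {
  local genome_fa=$1 spikein_fa=$2 prefix=$3
  awk 1 "$genome_fa" "$spikein_fa" > "${prefix}.fa"
  bowtie2-build "${prefix}.fa" "$prefix"
}

# Realign single-end reads against the combined index, writing SAM.
# Usage: realign INDEX_PREFIX READS_FQ OUT_SAM [THREADS]
realign() {
  local prefix=$1 reads=$2 out_sam=$3 threads=${4:-4}
  bowtie2 -p "$threads" -x "$prefix" -U "$reads" -S "$out_sam"
}
```

For example, `build_combined_index genome.fa spikeins.fa genome_plus_spikeins` followed by `realign genome_plus_spikeins B12_015.fastq B12_015.sam 8`. Reads hitting the spike-ins can then be picked out by their reference names in the SAM file.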


Hi swbarnes2, I prepared a new *.fa file with all the spike-in sequences, as shown below:

>UniSP100
TCCCAAATGTAGACAAAGCA
>UniSP101
TGAAGCTGCCAGCATGATCTA
>UniSP102
CAGCCAAGGATGACTTGCCGG

Would you be kind enough to share the Bowtie2 commands, given that my test file is B12_015.fastq? Thank you very much.
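One way to do this for the spike-ins alone is sketched below. It assumes the spike-in FASTA is saved as spikeins.fa (a placeholder name), builds an index from it alone, and relies on bowtie2's `--al` and `--un` options, which write unpaired reads that aligned at least once, or failed to align, to separate files in the same format as the input. `--no-unal` keeps unaligned reads out of the SAM file, and the alignment summary (printed on stderr) is saved to a log. The function names are hypothetical. Because the spike-ins are only about 20 nt long, this assumes adapter-trimmed reads; much longer reads may fail to align in the default end-to-end mode, in which case `--local` (and perhaps a shorter seed length via `-L`) may be worth trying.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a Bowtie2 index from the spike-in sequences only.
# Usage: index_spikeins SPIKEIN_FA INDEX_PREFIX
index_spikeins() {
  bowtie2-build "$1" "$2"
}

# Align one FASTQ file to the spike-in index and split the reads:
#   SAMPLE.spikein.fastq  reads that aligned at least once (--al)
#   SAMPLE.clean.fastq    reads that did not align (--un)
#   SAMPLE.spikein.sam    aligned records only (--no-unal)
#   SAMPLE.bowtie2.log    alignment summary from stderr
# Usage: split_spikein_reads INDEX_PREFIX READS_FQ [THREADS]
split_spikein_reads() {
  local prefix=$1 reads=$2 threads=${3:-4}
  local sample
  sample=$(basename "$reads")
  # Strip a .fastq or .fq extension to get the sample name.
  sample=${sample%.fastq}
  sample=${sample%.fq}
  bowtie2 -p "$threads" -x "$prefix" -U "$reads" \
    --no-unal \
    --al "${sample}.spikein.fastq" \
    --un "${sample}.clean.fastq" \
    -S "${sample}.spikein.sam" 2> "${sample}.bowtie2.log"
}
```

With the file from the question: `index_spikeins spikeins.fa spikeins`, then `split_spikein_reads spikeins B12_015.fastq`, which should leave the spike-in reads in B12_015.spikein.fastq and the rest in B12_015.clean.fastq.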


With such short sequences, I think k-mer matching will probably work better than alignment. Are those the full-length spike-ins?
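To show what k-mer matching means here, below is a deliberately simplified sketch in awk. It is not how Seal is implemented: it collects every k-mer of the spike-in FASTA (plus reverse complements) into a set, then keeps a FASTQ read if any of its k-mers is in that set. It assumes exact matches, 4-line FASTQ records and A/C/G/T/N sequences; the function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified illustration of k-mer matching (NOT how Seal is implemented).
# Prints the FASTQ records in READS_FQ that share at least one exact k-mer,
# forward or reverse complement, with any sequence in REF_FA.
# Usage: kmer_match K REF_FA READS_FQ
kmer_match() {
  awk -v k="$1" '
    function revcomp(s,    i, r) {
      r = ""
      for (i = length(s); i >= 1; i--) r = r comp[substr(s, i, 1)]
      return r
    }
    # Add every k-mer of s, and its reverse complement, to the lookup set.
    function add_kmers(s,    i, km) {
      for (i = 1; i + k - 1 <= length(s); i++) {
        km = substr(s, i, k)
        kmers[km] = 1
        kmers[revcomp(km)] = 1
      }
    }
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N" }
    # First file (reference FASTA): join the sequence lines of each record.
    NR == FNR {
      sub(/\r$/, "")
      if (/^>/) { add_kmers(seq); seq = "" } else seq = seq toupper($0)
      next
    }
    # First line of the reads file: flush the last reference record.
    FNR == 1 { add_kmers(seq); seq = "" }
    # Reads (FASTQ): buffer each 4-line record, then scan its sequence line.
    {
      rec[FNR % 4] = $0
      if (FNR % 4 != 0) next
      s = toupper(rec[2]); hit = 0
      for (i = 1; i + k - 1 <= length(s); i++) {
        key = substr(s, i, k)
        if (key in kmers) { hit = 1; break }
      }
      if (hit) print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }' "$2" "$3"
}
```

For example, `kmer_match 20 spikeins.fa B12_015.fastq > spikein_reads.fq`. Seal adds what this sketch lacks, such as mismatch tolerance (`hdist`), per-reference output and statistics.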


Yes, those are the full-length spike-in sequences. Although 12 spike-ins were used, I listed only 3 as examples, which should be enough to work out the command line for removing them.


In that case, if you decide to use Seal, change the flag "k=31" to "k=20", or to whatever the length of the shortest spike-in is. You may also want to allow one substitution with the "hdist" flag, e.g.:

seal.sh in=reads.fq ref=spikeins.fa pattern=spikein_%.fq outu=clean.fq stats=stats.txt k=20 hdist=1
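Since k should not exceed the shortest spike-in, a small awk helper like the one below (the function name is hypothetical) can find that length across all 12 sequences. It sums sequence lines per record, so multi-line FASTA works, and strips Windows line endings. For the three sequences listed in the thread it should report 20, matching the suggested k=20.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the length of the shortest sequence in a FASTA file.
# Sequence lines are summed per record, so multi-line FASTA works too.
# Usage: shortest_seq_len SPIKEIN_FA
shortest_seq_len() {
  awk '
    { sub(/\r$/, "") }
    /^>/ {
      if (inrec && (min == "" || len < min)) min = len
      inrec = 1; len = 0
      next
    }
    { len += length($0) }
    END {
      if (inrec && (min == "" || len < min)) min = len
      print min
    }' "$1"
}
```

The result can be passed straight to Seal, e.g. `k=$(shortest_seq_len spikeins.fa)` followed by `seal.sh in=B12_015.fastq ref=spikeins.fa pattern=spikein_%.fq outu=clean.fq stats=stats.txt k="$k" hdist=1`.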