Question: Aligning short sequences to fastq
3.1 years ago by
BPors40
BPors40 wrote:

Hi,

I am trying to search for a set of sequences (around 400), each 23 bp long, in several fastq files, allowing at most 1-2 mismatches. I am not sure whether treating the fastq as a genome (transcriptome) would be a good approach. I tried fastq -> fasta -> building a blast database -> running blastn, but it did not run because my query contains more than one sequence.

Example excerpt from my query.file:

ATTTTTCTGAAAAACCCCCTACGA

AACAGGAAGTCAAAAAAAGCCAA

AGGATTTTTTTTTTTCTGGGGACA

The output I am aiming for is, for each sequence in my query.file, whether it has an exact match (or a match with 1-2 mismatches) in the fastq file, and if possible where in the fastq file.

I would appreciate your suggestions! Thank you!

rna-seq sequences aligning short • 1.5k views
modified 3.1 years ago • written 3.1 years ago by BPors40
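
To make the requested output concrete, here is a brute-force sketch in shell and awk. It is an illustration only, not the method of any tool mentioned in this thread. It slides each query along every read and prints the query, the read name, the 1-based position, and the mismatch count. The function name find_with_mismatches is hypothetical, and the sketch assumes 4-line FASTQ records, one query per line, and forward-strand matches only.

```shell
# Hypothetical helper: brute-force search with a mismatch limit.
# $1 = query file (one sequence per line), $2 = FASTQ file, $3 = max mismatches
find_with_mismatches() {
    awk -v max="$3" '
        # Count mismatches between q and the window of s starting at pos.
        # Returns early once the count exceeds max.
        function mismatches(q, s, pos,    i, n) {
            n = 0
            for (i = 1; i <= length(q); i++)
                if (substr(q, i, 1) != substr(s, pos + i - 1, 1) && ++n > max)
                    return n
            return n
        }
        # First file: collect the queries, skipping blank lines.
        NR == FNR { if (NF) queries[++nq] = toupper($1); next }
        # Second file: FASTQ header line, keep the read name without "@".
        FNR % 4 == 1 { name = substr($0, 2) }
        # Second file: sequence line, try every query at every offset.
        FNR % 4 == 2 {
            seq = toupper($0)
            for (k = 1; k <= nq; k++) {
                q = queries[k]
                for (pos = 1; pos + length(q) - 1 <= length(seq); pos++) {
                    n = mismatches(q, seq, pos)
                    if (n <= max)
                        printf("%s\t%s\t%d\t%d\n", q, name, pos, n)
                }
            }
        }' "$1" "$2"
}

# Usage: find_with_mismatches query.file file.fastq 2 > hits.tsv
```

This is far too slow for 400 queries against full-size fastq files, so it is mainly useful for checking the output of a dedicated tool on a small subset.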

You could use bowtie instead of blast. Make a fasta from the fastq, build a bowtie index from it, then align the query against that index. Bowtie has an option (-n) that controls how many mismatches are allowed in the seed. Since the default seed length (28 bp) is longer than your 23 bp queries, the whole query falls within the seed, so setting the maximum seed mismatches to 1 or 2 should be sufficient for your goal.

modified 3.1 years ago • written 3.1 years ago by ATpoint40k
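
The first step of this suggestion, making a fasta from the fastq, can be sketched with awk. This assumes strict 4-line FASTQ records with no wrapped sequence lines. The name fastq_to_fasta is a hypothetical helper, and the bowtie indexing and alignment steps are not shown.

```shell
# Hypothetical helper: convert FASTQ to FASTA.
# Assumes each record is exactly 4 lines: header, sequence, "+", qualities.
# Line 1 of each record: "@name" becomes ">name".
# Line 2 of each record: the sequence is printed unchanged.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Usage: fastq_to_fasta file.fastq > file.fa
```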

Thank you for your answer. I would like to try it, but I have these query sequences in plain text format only, so I cannot convert them to fastq. I think Bowtie requires reads in fastq format.

written 3.1 years ago by BPors40

No, several formats are accepted:

-q    query input files are FASTQ .fq/.fastq (default)
-f    query input files are (multi-)FASTA .fa/.mfa
-r    query input files are raw one-sequence-per-line

modified 3.1 years ago • written 3.1 years ago by ATpoint40k

Thank you! I eventually used BBDuk, but I will give bowtie a try soon with these options (-r).

written 3.1 years ago by BPors40

I was not aware that there is a function for this in BB. This BB stuff is really a jack-of-all-trades.

written 3.1 years ago by ATpoint40k

Hi,

Maybe you can try aligning your 23 bp sequences with bwa aln, using your fastq files as the reference after converting them to fasta?

Best

written 3.1 years ago by Titus910

Thank you for your suggestion. Would this work if my reads are in text format?

written 3.1 years ago by BPors40
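
If a tool expects FASTA rather than plain text, a one-sequence-per-line file can be converted with a small awk sketch. The function name raw_to_fasta and the query_N header names are made up for illustration. Blank lines in the input are skipped.

```shell
# Hypothetical helper: turn a one-sequence-per-line text file into FASTA.
# Each non-blank line becomes one record with a generated header
# (query_1, query_2, ...); these header names are an arbitrary choice.
raw_to_fasta() {
    awk 'NF { printf(">query_%d\n%s\n", ++n, $1) }' "$1"
}

# Usage: raw_to_fasta query.file > 23mers.fa
```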
3.1 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

You can grab the fastq sequences containing these 23-mers with BBDuk like this:

bbduk.sh in=file.fastq outm=matched.fastq ref=23mers.fa k=23 hdist=2

"hdist=2" allows 2 mismatches; you can alternatively set that to 1 or 0. This does not tell you where the match is, but you can find that out like this:

bbduk.sh in=matched.fastq out=masked.fastq ref=23mers.fa k=23 hdist=2 kmask=lc

That will convert the matched regions to lowercase.

written 3.1 years ago by Brian Bushnell17k
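
Since kmask=lc marks the matched regions in lowercase, their positions can be read back from masked.fastq. Below is a minimal awk sketch, assuming 4-line FASTQ records and reads that were entirely uppercase before masking. The name lowercase_runs is a hypothetical helper. It prints the read name plus the 1-based start and end of each lowercase run.

```shell
# Hypothetical helper: report lowercase (masked) runs in a FASTQ file.
# Output columns: read name, start, end (1-based, inclusive).
# Assumes 4-line records and reads that were uppercase before masking.
lowercase_runs() {
    awk 'NR % 4 == 1 { name = substr($0, 2) }
         NR % 4 == 2 {
             seq = $0
             offset = 0          # bases already consumed from the read
             while (match(seq, /[acgtn]+/)) {
                 printf("%s\t%d\t%d\n", name,
                        offset + RSTART, offset + RSTART + RLENGTH - 1)
                 offset += RSTART + RLENGTH - 1
                 seq = substr(seq, RSTART + RLENGTH)
             }
         }' "$1"
}

# Usage: lowercase_runs masked.fastq > match_positions.tsv
```

Overlapping or adjacent matches merge into a single run, and the output does not say which 23-mer produced the match.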

Thank you! That worked well for me!

written 3.1 years ago by BPors40