Question

Aligning short sequences to fastq

2

6.6 years ago

BPors ▴ 60

Hi,

I am trying to search for a set of sequences (around 400), each 23 bp long, in several fastq files, allowing at most 1-2 mismatches. I am not sure whether treating the fastq as a genome (transcriptome) reference would be a good approach. I tried converting fastq -> fasta -> building a blast database -> running blastn, but it did not run because my query contains more than one sequence.

Example from my query.file:

ATTTTTCTGAAAAACCCCCTACGA

AACAGGAAGTCAAAAAAAGCCAA

AGGATTTTTTTTTTTCTGGGGACA

The output I am aiming for is, for the sequences in my query.file, which of them have a 100% match (or a match with 1-2 mismatches) in the fastq file, and if possible where in the fastq file.

I would appreciate your suggestions! Thank you!

RNA-Seq short sequences aligning • 3.4k views

ADD COMMENT • link 6.6 years ago by BPors ▴ 60

1

You could use bowtie instead of blast. Make a fasta from the fastq, build a bowtie index from it, then align the queries against that index. Bowtie has an option (-n) that controls how many mismatches are allowed in the seed. As the default seed length (28 bp) is longer than your queries, the seed covers each query completely, so setting the maximum seed mismatches to 1 or 2 should be sufficient for your goal.
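A minimal sketch of that workflow, assuming bowtie 1 is installed and the fastq uses plain 4-line records. The function names and the file names `reads.fa` and `reads_index` are placeholders made up for illustration, not part of any tool:

```shell
# Convert FASTQ to FASTA: keep the header (line 1 of each 4-line record,
# with '@' swapped for '>') and the sequence (line 2).
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Hypothetical wrapper around the bowtie steps described above.
# Usage: search_reads_with_bowtie reads.fastq queries.txt hits.sam
search_reads_with_bowtie() {
    reads_fastq=$1
    queries_txt=$2
    out_sam=$3
    fastq_to_fasta "$reads_fastq" > reads.fa
    bowtie-build reads.fa reads_index
    # -r: queries are raw, one sequence per line
    # -n 2: allow up to 2 mismatches in the seed
    # -a: report all alignments, -S: write SAM
    # (newer bowtie releases also accept the index via -x)
    bowtie -r -n 2 -a -S reads_index "$queries_txt" > "$out_sam"
}
```

In the resulting SAM, the reference name column would be the fastq read that was hit and the position column the offset within that read, which covers the "where in the fastq file" part of the question.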

ADD REPLY • link 6.6 years ago by ATpoint 82k

0

Thank you for your answer. I would like to try it, but I only have these query sequences as plain text, so I cannot convert them to fastq. I think Bowtie requires the reads to be in fastq format.

ADD REPLY • link 6.6 years ago by BPors ▴ 60

1

No, several formats are accepted:

-q query input files are FASTQ .fq/.fastq (default)

-f query input files are (multi-)FASTA .fa/.mfa

-r query input files are raw one-sequence-per-line
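For tools that expect FASTA rather than raw text, a one-sequence-per-line list can be converted with a short awk call. This is only a sketch: the function name and the `query_N` header names are invented here for illustration.

```shell
# Turn a one-sequence-per-line text file into FASTA.
# Blank lines are skipped; headers are numbered query_1, query_2, ...
raw_to_fasta() {
    awk 'NF { n++; print ">query_" n; print $1 }' "$1"
}
```

Usage would look like `raw_to_fasta query.file > 23mers.fa`.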

ADD REPLY • link 6.6 years ago by ATpoint 82k

0

Thank you! I eventually used BBDuk, but I will give bowtie a try soon with this option (-r).

ADD REPLY • link 6.6 years ago by BPors ▴ 60

0

I was not aware that there is a function for this in BB. This BB stuff is really a jack-of-all-trades.

ADD REPLY • link 6.6 years ago by ATpoint 82k

0

Hi,

Maybe you can try to align your 23 bp sequences with bwa aln against your fastq files as the reference, after converting them to fasta?

Best

ADD REPLY • link 6.6 years ago by Titus ▴ 910

0

Thank you for your suggestion. Would this work if my reads are in text format?

ADD REPLY • link 6.6 years ago by BPors ▴ 60

score 4 · Accepted Answer · 2017-09-04

4

6.6 years ago

Brian Bushnell 20k

You can grab the fastq sequences containing these 23-mers with BBDuk like this:

bbduk.sh in=file.fastq outm=matched.fastq ref=23mers.fa k=23 hdist=2

"hdist=2" allows 2 mismatches; you can alternatively set that to 1 or 0. This does not tell you where the match is, but you can do that like this:

bbduk.sh in=matched.fastq out=masked.fastq ref=23mers.fa k=23 hdist=2 kmask=lc

That will convert the matched regions to lowercase.
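To turn the lowercase masking into explicit coordinates, something like the following awk sketch could be run on masked.fastq. It assumes plain 4-line fastq records; the function name is made up for illustration and is not part of BBTools. Overlapping or adjacent matches end up merged into a single lowercase run.

```shell
# Print read name, 1-based start, 1-based end and the masked bases for
# every lowercase run in the sequence lines of a FASTQ file.
report_masked_regions() {
    LC_ALL=C awk '
        NR % 4 == 1 { name = substr($0, 2) }
        NR % 4 == 2 {
            seq = $0
            offset = 0
            while (match(seq, /[a-z]+/)) {
                start = offset + RSTART
                stop = start + RLENGTH - 1
                print name "\t" start "\t" stop "\t" substr(seq, RSTART, RLENGTH)
                # Continue searching after the run just reported.
                offset += RSTART + RLENGTH - 1
                seq = substr(seq, RSTART + RLENGTH)
            }
        }' "$1"
}
```

Usage would look like `report_masked_regions masked.fastq > match_positions.tsv`.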

ADD COMMENT • link 6.6 years ago by Brian Bushnell 20k

0

Thank you! That worked well for me!

ADD REPLY • link 6.6 years ago by BPors ▴ 60