Align short sequence against ONT reads
7 weeks ago

Hi there! I have many .fq files containing long reads (generated on an ONT MinION). I also have a .fasta file containing a specific short sequence (an exon from a different organism). I want to find out whether this short sequence aligns to any of these long reads (i.e., is there an orthologous sequence in my organism?). I don't know which aligner to use: so far I have only worked with short reads, which I usually aligned against a large reference genome (e.g., with bwa mem). For now I would prefer not to assemble the .fastq files. Thanks in advance!

reads ONT long genome alignment • 377 views

Are the long reads very different from each other (e.g. covering different regions of the genome) or do they all represent the same (more or less) region? Why don't you align the long reads against the genome of the organism from which you've derived the (one?) exon sequence?

6 weeks ago
colindaven ★ 4.0k

Very weird question, but ....

• just use the gene/exon as a reference sequence
• use minimap2 to align the ONT reads against this

Not sure if minimap2 will work well if the reads are so much longer than a (tiny) exon reference sequence, but you can try it.
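The two steps above could be sketched roughly as follows. This is only a sketch, assuming minimap2 and samtools are installed; the function name `align_reads_to_exon` and the file names are made up for illustration, and whether the `map-ont` preset is sensitive enough for a short exon from a different organism is exactly the open question raised above.

```shell
# Hypothetical helper: align ONT reads against the exon used as the reference
# and keep only the reads that produced an alignment.
# $1 = exon FASTA (reference), $2 = ONT reads (FASTQ), $3 = output BAM
align_reads_to_exon() {
    # -a        : write SAM instead of the default PAF
    # -x map-ont: minimap2 preset for noisy Nanopore reads
    minimap2 -a -x map-ont "$1" "$2" \
        | samtools view -b -F 4 - \
        | samtools sort -o "$3" -    # -F 4 above drops unmapped reads
}
```

It would be called as, e.g., `align_reads_to_exon exon.fasta ONT.fq ONT_vs_exon.bam`, once per .fq file.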


It won't work because minimap2 aligns against a large reference sequence. What I want to do is align each read in the .fastq files against a short reference sequence (the exon).


ONT reads can have a lot of errors, and if the exon is too short the following solution might not work for the noisy reads.

Use shred.sh from BBMap to generate 300 bp fragments from the ONT reads:

shred.sh in=ONT.fq out=ONT_frag.fq length=300 # each fragment in ONT_frag.fq should keep the original read header with a small modification (a _start-end suffix, see the example)


Example:

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1 # original header before shred

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1_0-19 # header after shred

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1_20-39 # header after shred

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1_40-59 # header after shred


Use bbduk.sh to identify the 300 bp fragments that contain the exon:

bbduk.sh in=ONT_frag.fq outm=ONT_exon_frag.fq k=31 ref=exon.fasta # the headers in ONT_exon_frag.fq should tell you which ONT reads have the exon
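To get from the matched fragments back to the original reads, something like the following might help. It is a minimal sketch that assumes the fragment headers end in a `_start-end` suffix exactly as in the example above; the function name `list_exon_reads` is made up for illustration.

```shell
# Hypothetical helper: print the unique original read names behind the
# fragments that bbduk.sh wrote to the outm= file.
# $1 = FASTQ of matched fragments (e.g. ONT_exon_frag.fq)
list_exon_reads() {
    # FASTQ headers sit on lines 1, 5, 9, ... of the file;
    # strip the leading "@" and the trailing "_<start>-<end>" added by shred.sh
    awk 'NR % 4 == 1' "$1" \
        | sed -E 's/^@//; s/_[0-9]+-[0-9]+$//' \
        | sort -u
}
```

Running `list_exon_reads ONT_exon_frag.fq > exon_read_ids.txt` would then give one line per ONT read that had at least one matching fragment.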


Thank you I'll try this approach!