Align short sequence against ONT reads
7 weeks ago

Hi there! I have many .fq files containing long reads (generated with an ONT MinION). I also have a .fasta file containing a specific short sequence (an exon from a different organism). I want to find out whether this short sequence aligns to any of the long reads (i.e. is there orthology between the organisms?). I do not know which aligner to use: I used to work with short reads, which I usually aligned against a large reference genome (e.g. with bwa mem). For now I do not want to assemble the .fastq files. Thanks in advance!

reads ONT long genome alignment • 382 views

Are the long reads very different from each other (e.g. covering different regions of the genome) or do they all represent the same (more or less) region? Why don't you align the long reads against the genome of the organism from which you've derived the (one?) exon sequence?

7 weeks ago

Very weird question, but ....

  • just use the gene/exon as a reference sequence
  • use minimap2 to align the ONT reads against this

Not sure if minimap2 will work well if the reads are so much longer than a (tiny) exon reference sequence, but you can try it.
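For what it's worth, a minimal sketch of those two steps could look like the following. The file names (exon.fasta, reads.fq, aln.sam) are placeholders and the helper function names are made up for illustration; map-ont is minimap2's preset for Nanopore reads, but whether its default settings are sensitive enough for a short exon from another species is something you would have to test.

```shell
# Align ONT reads against the exon used as the reference.
# Arguments (all hypothetical names): reference FASTA, reads FASTQ, output SAM.
align_to_exon() {
    # -a writes SAM output; -x map-ont selects the Nanopore preset.
    minimap2 -ax map-ont "$1" "$2" > "$3"
}

# Count distinct reads that have at least one alignment in a SAM file.
# SAM flag bit 0x4 means "unmapped", so keep records where that bit is unset.
count_mapped_reads() {
    awk '!/^@/ && int($2 / 4) % 2 == 0 { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Usage (requires minimap2 to be installed):
#   align_to_exon exon.fasta reads.fq aln.sam
#   count_mapped_reads aln.sam
```

If the count is zero, that only says nothing aligned with these settings, not that the exon is absent from the reads.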


It won't work because minimap2 aligns against a large reference sequence. What I want to do is align each read in the .fastq files against a short reference sequence (the exon).


ONT reads can have a lot of errors, and if the exon is too short the following solution might not work for noisy reads.

Use shred.sh from BBMap to generate 300 bp fragments from the ONT reads:

shred.sh in=ONT.fq out=ONT_frag.fq length=300 # ONT_frag.fq should retain the read header similar to the original but with a small modification

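Example (illustrative headers only; the suffix gives the fragment coordinates, so with length=300 the ranges would be 300 bp wide):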

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1 # original header before shred

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1_0-19 # header after shred 

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1_20-39 # header after shred 

@NB501138:291:H7FCVBGXH:1:11101:15246:1057 1:N:0:1_40-59 # header after shred 

Use bbduk.sh to identify the 300 bp fragments that contain the exon:

bbduk.sh in=ONT_frag.fq outm=ONT_exon_frag.fq k=31 ref=exon.fasta # the headers in ONT_exon_frag.fq should tell you which ONT reads have the exon
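
To turn the matched fragments back into a list of original read names, something like the sketch below might help. It assumes that the fragment headers end in a _start-end suffix as in the example above and that the FASTQ has exactly four lines per record; the function name matched_read_ids is made up for illustration.

```shell
# List the original ONT read names that have at least one matching fragment.
# Assumes 4-line FASTQ records and a trailing "_<start>-<end>" suffix on headers.
matched_read_ids() {
    # Take header lines (every 4th line from line 1), drop the leading "@"
    # and the fragment-coordinate suffix, then remove duplicates.
    awk 'NR % 4 == 1' "$1" | sed -E 's/^@//; s/_[0-9]+-[0-9]+$//' | sort -u
}

# Usage:
#   matched_read_ids ONT_exon_frag.fq > reads_with_exon.txt
```

The resulting names can then be used to pull the full-length reads out of the original .fq file for a closer look.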

Thank you, I'll try this approach!
