Question: How to extract all ONT reads that connect/overlap two or more contigs?
BioGeek150 wrote:

I want to focus on the reads that connect two or more contigs. How can I extract ONLY those nanopore reads?

modified 6 months ago by lagartija80 • written 2.3 years ago by BioGeek150

Did you map your reads? How?

written 2.3 years ago by h.mon30k

Mapping was done with minimap2.

written 2.3 years ago by BioGeek150
lagartija80 wrote:

I wanted to do the same, and it turned out to be much harder than expected (I think because the SAM format is quite cryptic). I chose the simpler option: map the reads with minimap2 (no need for the splice option), then extract the IDs of the reads that align to more than one contig from the SAM file:

 
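#!/usr/bin/env python3
# Print the IDs of reads that align to two or more contigs in a SAM file.
# Usage: script.py alignments.sam > read_ids.txt

import sys
from collections import defaultdict

# Map each read ID to the set of contigs it aligns to.
read_dic = defaultdict(set)

with open(sys.argv[1], 'r') as samfile:
    for line in samfile:
        # Skip SAM header lines (@HD, @SQ, @PG, ...).
        if line.startswith('@'):
            continue
        tab_line = line.rstrip('\n').split('\t')
        read = tab_line[0]
        contig = tab_line[2]
        # Unmapped reads have '*' as reference name; ignore them.
        if contig != '*':
            read_dic[read].add(contig)

for read, contigs in read_dic.items():
    # The first version tested "== 2", which drops reads spanning three
    # or more contigs; ">= 2" keeps them as well.
    if len(contigs) >= 2:
        print(read)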

Then extract those reads with seqtk subseq using the list of read IDs, map them again, and visualize the result.
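If seqtk is not at hand, the extraction step can probably be approximated in plain Python. This is only a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function name `filter_fastq` and the example reads are made up for illustration and are not part of seqtk.

```python
def filter_fastq(fastq_lines, wanted_ids):
    """Yield the lines of FASTQ records whose read ID is in wanted_ids.

    Assumes 4-line FASTQ records (sequence and quality are not wrapped).
    """
    wanted = set(wanted_ids)
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header looks like "@read_id optional description";
            # the ID is everything after "@" up to the first whitespace.
            read_id = record[0][1:].split()[0]
            if read_id in wanted:
                yield from record
            record = []


# Tiny in-memory example (hypothetical reads).
example = ["@r1 desc", "ACGT", "+", "!!!!", "@r2", "GGCC", "+", "####"]
print("\n".join(filter_fastq(example, ["r2"])))
```

On real data, the first argument would be an open file handle and the second the list of IDs printed by the script above. For large files, seqtk itself is likely the better choice.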

I'm sure there is a more elegant option, but in the meantime this was useful for me.
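One possibly less cryptic route is minimap2's PAF output, which it writes by default when `-a` is not given. In PAF, column 1 is the read name, column 3 the start position on the read, column 6 the contig name and column 12 the mapping quality. The sketch below is an assumption-laden illustration rather than a tested pipeline: the function name `reads_joining_contigs` and the `min_mapq` threshold are hypothetical. It also reports the contigs in the order they appear along each read.

```python
from collections import defaultdict


def reads_joining_contigs(paf_lines, min_contigs=2, min_mapq=0):
    """Return {read_id: [contigs ordered by start position on the read]}.

    Only reads aligned to at least min_contigs distinct contigs are kept.
    min_mapq is an illustrative filter, not a recommended value.
    """
    hits = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read, q_start = fields[0], int(fields[2])
        contig, mapq = fields[5], int(fields[11])
        if mapq >= min_mapq:
            hits[read].append((q_start, contig))

    result = {}
    for read, alignments in hits.items():
        ordered = []
        # Walk the alignments from the start of the read to its end.
        for _, contig in sorted(alignments):
            if contig not in ordered:
                ordered.append(contig)
        if len(ordered) >= min_contigs:
            result[read] = ordered
    return result


# Hypothetical PAF records: r1 spans ctgA and ctgB, r2 only hits ctgA.
paf = [
    "r1\t1000\t450\t1000\t+\tctgB\t8000\t0\t550\t540\t550\t60",
    "r1\t1000\t0\t400\t+\tctgA\t5000\t4600\t5000\t390\t400\t60",
    "r2\t800\t0\t800\t-\tctgA\t5000\t100\t900\t790\t800\t60",
]
print(reads_joining_contigs(paf))
```

Knowing the contig order along the read can help when deciding which contig ends a read actually bridges.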

written 6 months ago by lagartija80