I am working with long reads generated by Nanopore technology.
I tried to write a small Python script (using pysam) that determines, for each read, which gene it maps to. To do that, I use the following information: the chromosome, the strand, and the start and end genomic positions of the read.
However, I just realized there is a flaw in my design. For a given gene, I get reads that map to both strands (because of the way the library is prepared and the way the molecule is sequenced), so I cannot use the strand information to decide which gene a read maps to. And I'm afraid that if I use only the chromosome and the genomic coordinates, I will end up with false positives wherever genes are nested...
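
To make the problem concrete, here is a simplified sketch of the logic (not my actual script). The `Gene` tuple, the `genes_for_read` helper, and the two example genes are made up for illustration. In the real script the read fields would come from a pysam `AlignedSegment` (`reference_name`, `reference_start`, `reference_end`, `is_reverse`).

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical annotation record; coordinates are 0-based, end-exclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def genes_for_read(chrom: str, start: int, end: int, strand: str,
                   genes: List[Gene], use_strand: bool = True) -> List[str]:
    """Return the names of all genes whose interval overlaps the read."""
    hits = []
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if use_strand and gene.strand != strand:
            continue
        # Two half-open intervals overlap if each starts before the other ends.
        if start < gene.end and gene.start < end:
            hits.append(gene.name)
    return hits


# Made-up example: geneB is nested inside geneA, on the opposite strand.
GENES = [
    Gene("geneA", "chr1", 1000, 9000, "+"),
    Gene("geneB", "chr1", 4000, 5000, "-"),
]

# A read from geneA that aligns to the reverse strand is lost
# when the strand filter is on.
lost = genes_for_read("chr1", 1200, 3800, "-", GENES, use_strand=True)

# Without the strand filter, a read inside the nested region
# matches both genes.
ambiguous = genes_for_read("chr1", 4100, 4900, "+", GENES, use_strand=False)

print(lost)
print(ambiguous)
```

With the strand filter, the first read matches nothing. Without it, the second read matches both `geneA` and `geneB`, which is exactly the ambiguity I am worried about.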
Would anyone have an idea how I could fix that? Thanks in advance!