I am aligning RNA-seq data (paired-end, 75 bp, unstranded, good depth), focusing on olfactory genes. These genes often sit close together and have similar sequences (with many pseudogenes), so I often get reads that are erroneously aligned with long introns. I am using STAR with the option --alignIntronMax 25000 (the default allows much longer introns).
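As a rough sketch, a STAR call with a capped intron size could be assembled as below. The index directory, FASTQ names, thread count, and the helper name build_star_command are placeholders made up for illustration. The flags are standard STAR options, but capping --alignMatesGapMax at the same value is an assumption, not something stated in the setup above.

```python
import shlex


def build_star_command(genome_dir, read1, read2, max_intron=25000,
                       threads=8, prefix="star_"):
    """Return the STAR argument list for a paired-end run with a capped intron size.

    All paths and the thread count are placeholders; adjust to your own setup.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        # Maximum intron size STAR will accept.
        "--alignIntronMax", str(max_intron),
        # Assumption: also cap the gap allowed between the two mates.
        "--alignMatesGapMax", str(max_intron),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


cmd = build_star_command("star_index/", "sample_R1.fastq", "sample_R2.fastq")
print(shlex.join(cmd))
# To actually launch the alignment: subprocess.run(cmd, check=True)
```

Changing max_intron (for example to 7000) gives a second command for a stricter run, so both alignments can be kept side by side under different prefixes.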
Afterwards I run a de novo assembly to map some unclear UTRs. Badly aligned reads can make two neighbouring genes appear merged into a single gene.
I decided to plot, using bedtools, the distance from each olfactory gene to its closest neighbouring olfactory gene. I am also including an IGV screenshot showing my problem: http://imgur.com/1pvmcGd,sz91xhm#0
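For anyone who wants to reproduce the distance calculation without bedtools, here is a minimal pure-Python sketch of the same idea. It assumes BED-style half-open coordinates and unique gene names. The function name and the toy coordinates are made up for illustration, and the values may differ by a base from what bedtools reports, depending on its distance convention.

```python
from collections import defaultdict


def nearest_neighbour_distances(genes):
    """Distance in bp from each gene to the closest other gene on its chromosome.

    genes: iterable of (chrom, start, end, name), BED-style half-open coordinates.
    Overlapping genes get a distance of 0. Genes that are alone on their
    chromosome are left out of the result.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))

    distances = {}
    for intervals in by_chrom.values():
        # Brute-force comparison; fine for a gene family of about a thousand members.
        for start, end, name in intervals:
            gaps = [
                max(0, max(start, o_start) - min(end, o_end))
                for o_start, o_end, o_name in intervals
                if o_name != name
            ]
            if gaps:
                distances[name] = min(gaps)
    return distances


# Made-up coordinates, for illustration only.
toy_genes = [
    ("chr1", 1000, 2000, "geneA"),
    ("chr1", 8000, 9000, "geneB"),
    ("chr1", 30000, 31000, "geneC"),
    ("chr2", 500, 1500, "geneD"),
]
dist = nearest_neighbour_distances(toy_genes)
print(sorted(dist.values()))
```

The sorted distances can then be plotted as a histogram to see where a sensible intron-size cutoff would fall.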
I see that few olfactory genes are closer than 7000 bp to each other, so I am using that value as the new limit on intron size. I can always fall back on my previous alignment with --alignIntronMax 25000.
Have you run into similar problems, and how do you resolve them? I would like to discard the most dubious paired-end reads.
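One possible way to discard such pairs after the fact, rather than realigning, is to look at the N operations in each read's CIGAR string and drop every pair in which either mate has an intron above the cutoff. The sketch below works on plain SAM text and holds all lines in memory. The function names are hypothetical, the 7000 bp default is just the cutoff discussed above, and the toy records are invented. For real BAM files a library such as pysam or a samtools pipeline would be the more practical route.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_intron(cigar):
    """Length of the longest N (skipped region) operation in a CIGAR string, or 0."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"), default=0)


def drop_long_intron_pairs(sam_lines, max_intron=7000):
    """Yield SAM lines, skipping every read name that has any alignment
    with an intron longer than max_intron. Header lines pass through."""
    lines = list(sam_lines)  # two passes are needed, so keep the lines in memory
    bad_names = set()
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        if longest_intron(fields[5]) > max_intron:  # column 6 is the CIGAR
            bad_names.add(fields[0])                # column 1 is the read name
    for line in lines:
        if line.startswith("@") or line.split("\t")[0] not in bad_names:
            yield line


# Invented records: pair2 has one mate spliced across 9000 bp.
toy_sam = [
    "@HD\tVN:1.4\tSO:coordinate",
    "pair1\t99\tchr1\t1000\t255\t75M\t=\t1200\t275\t*\t*",
    "pair1\t147\tchr1\t1200\t255\t75M\t=\t1000\t-275\t*\t*",
    "pair2\t99\tchr1\t1500\t255\t40M9000N35M\t=\t10600\t9175\t*\t*",
    "pair2\t147\tchr1\t10600\t255\t75M\t=\t1500\t-9175\t*\t*",
]
kept = list(drop_long_intron_pairs(toy_sam))
print(len(kept), "lines kept")
```

Note that this removes both mates, and all alignments of a multi-mapping read, as soon as one alignment exceeds the cutoff, which is a deliberately strict choice.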