Do you have abnormally long introns in your RNA-Seq alignment? How do you get rid of them?
6.3 years ago
cyril-cros ▴ 910

I am running RNA-Seq alignments (paired-end, 75 bp, unstranded, good depth), focusing on olfactory genes. These genes often sit close together and have similar sequences (lots of pseudogenes), so I often get erroneous alignments with long introns. I am using STAR with the option --alignIntronMax 25000 (the default limit is much larger).

I am doing de novo assembly afterwards, to map some unclear UTRs. Badly aligned reads can make two close genes appear merged as a single gene.

I decided to plot the distance from each olfactory gene to its closest neighbour using bedtools; I am also including an IGV screenshot showing my problem: http://imgur.com/1pvmcGd,sz91xhm#0

I see that few olfactory genes are closer than 7000 bp to each other, so I am using that as the new limit on intron size. I can always fall back on my previous alignment with --alignIntronMax 25000.
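For anyone who wants to reproduce the distance check without bedtools, here is a minimal Python sketch. It assumes the genes are available as BED-style (chrom, start, end, name) tuples; the function names and the toy coordinates are made up for illustration and are not from my data.

```python
from collections import defaultdict


def interval_gap(a, b):
    """Distance in bp between two half-open intervals (0 if they overlap or touch)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def nearest_gene_gaps(genes):
    """Map each gene name to the distance to its closest neighbour on the same chromosome.

    genes: iterable of (chrom, start, end, name) in BED-style coordinates.
    Genes that are alone on their chromosome are left out of the result.
    Brute force per chromosome, which is fine for roughly a thousand genes.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))

    gaps = {}
    for intervals in by_chrom.values():
        for i, gene in enumerate(intervals):
            others = [interval_gap(gene, o) for j, o in enumerate(intervals) if j != i]
            if others:
                gaps[gene[2]] = min(others)
    return gaps


def fraction_closer_than(gaps, limit):
    """Fraction of genes whose nearest neighbour is closer than `limit` bp."""
    return sum(1 for d in gaps.values() if d < limit) / len(gaps)


# Toy example with invented coordinates
genes = [
    ("chr1", 100, 200, "A"),
    ("chr1", 5200, 6000, "B"),
    ("chr1", 40000, 41000, "C"),
    ("chr2", 10, 50, "D"),
]
gaps = nearest_gene_gaps(genes)
```

Plotting a histogram of `gaps.values()` and checking `fraction_closer_than(gaps, 7000)` gives a rough idea of how many gene pairs an intron limit of that size could still bridge.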

Do you have similar problems, and how do you resolve them? I would like to ditch the most dubious paired end reads.

RNA-Seq alignment STAR • 2.5k views

I'm not sure that it's the intron length that's the problem, since there actually are long introns. The problem you're running into is due to olfactory receptors being very similar and clustered, so any disagreement with the reference sequence results in aberrant fusion genes. This might be a case where the tophat2 (or perhaps hisat, I've used it but can't say I'm familiar enough with it yet) method might actually be preferable.

Looking at the tophat2 documentation, I found this option:

--read-realign-edit-dist:

Some of the reads spanning multiple exons may be mapped incorrectly as a contiguous alignment to the genome even though the correct alignment should be a spliced one - this can happen in the presence of processed pseudogenes that are rarely (if at all) transcribed or expressed. 

STAR is, however, much faster. It also has an option, --alignMatesGapMax, which might be of use to me.

6.3 years ago
cyril-cros ▴ 910

Running with a lower --alignIntronMax helped, but from what I can see I am losing some 'real' introns. There is a trade-off here, which is hard to resolve. Will try again...
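One way to look at this trade-off after the fact, rather than through the aligner limit, is to read the intron lengths straight from the CIGAR strings (the N operations) and flag pairs above a chosen cut-off. This is only a sketch working on plain SAM text lines; the function names and the example reads are invented, and in practice something like pysam would be more convenient.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec)
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_intron(cigar):
    """Length of the longest N (skipped region) in a CIGAR string, 0 if unspliced."""
    return max((int(n) for n, op in _CIGAR_OP.findall(cigar) if op == "N"), default=0)


def names_with_long_introns(sam_lines, max_intron):
    """Read names for which at least one alignment has an intron longer than max_intron."""
    flagged = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if longest_intron(fields[5]) > max_intron:  # column 6 is CIGAR
            flagged.add(fields[0])  # column 1 is QNAME, shared by both mates
    return flagged


def drop_flagged(sam_lines, flagged):
    """Keep header lines and every alignment whose read name is not flagged."""
    return [l for l in sam_lines if l.startswith("@") or l.split("\t", 1)[0] not in flagged]


# Invented example: pair r1 has a 12 kb intron, pair r2 an 800 bp intron
sam = [
    "@SQ\tSN:chr1\tLN:100000",
    "r1\t99\tchr1\t100\t255\t30M12000N45M\t=\t12300\t12275\t*\t*",
    "r1\t147\tchr1\t12300\t255\t75M\t=\t100\t-12275\t*\t*",
    "r2\t99\tchr1\t500\t255\t40M800N35M\t=\t1500\t1075\t*\t*",
    "r2\t147\tchr1\t1500\t255\t75M\t=\t500\t-1075\t*\t*",
]
flagged = names_with_long_introns(sam, 7000)
kept = drop_flagged(sam, flagged)
```

Because the filter works on read names, both mates of a pair are dropped together. The flagged pairs can be written to a separate file and inspected in IGV before deciding whether the cut-off is removing real introns.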

Yeah, you have a rather tricky case. One possible method for you might be to simply align against the transcriptome, since you could then avoid some of these issues. That's often a method of last resort, but depending on what your biological question is it might be helpful.

6.2 years ago
h.mon 33k

You could start with a small --alignIntronMax, then remove from your fastq the pairs where both mates mapped. With the reduced dataset, map again with a larger --alignIntronMax. Wash, rinse, repeat until satisfied.
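A rough sketch of that loop in Python, which only builds the STAR command lines for each round. It assumes uncompressed FASTQ input and relies on --outReadsUnmapped Fastx so that each round writes its leftover reads to Unmapped.out.mate1 / Unmapped.out.mate2, which become the input of the next round. Tying --alignMatesGapMax to the same limit is my own assumption, not part of the suggestion above; the paths, prefixes, and limits are placeholders.

```python
import subprocess


def star_rounds(genome_dir, mate1, mate2, intron_limits, threads=8):
    """Build one STAR command per intron limit, feeding leftovers into the next round."""
    commands = []
    for i, limit in enumerate(intron_limits, start=1):
        prefix = f"round{i}_"  # placeholder output prefix
        commands.append([
            "STAR",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", mate1, mate2,
            "--alignIntronMax", str(limit),
            "--alignMatesGapMax", str(limit),  # assumption: same limit for the mate gap
            "--outReadsUnmapped", "Fastx",     # write reads left over from this round
            "--outSAMtype", "BAM", "Unsorted",
            "--outFileNamePrefix", prefix,
        ])
        # The next round only sees the reads this round could not place
        mate1 = prefix + "Unmapped.out.mate1"
        mate2 = prefix + "Unmapped.out.mate2"
    return commands


def run_rounds(commands):
    """Run the rounds in order, stopping at the first failure (requires STAR on PATH)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Placeholder paths and limits
cmds = star_rounds("genome_index", "sample_1.fq", "sample_2.fq", [7000, 25000])
```

Calling `run_rounds(cmds)` would execute the two rounds; the per-round BAM files then need to be merged afterwards. It is worth checking the Unmapped.out.mate files after the first round to confirm they contain what you expect before trusting the later rounds.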
