Should I merge paired-end RNA-seq reads before aligning them with BLAT for intron identification?
2.5 years ago
praasu

Hi,

I have RNA-seq data (PE 2x300) for a non-model organism. I want to align the reads against the assembled genome using BLAT to identify introns. Since BLAT cannot align paired-end reads, I want to convert them into single-end reads by merging overlapping R1 and R2.

I would like to know whether it makes sense to merge R1 and R2 into one read (where they overlap), or whether I should work with R1 and R2 separately.

Best regards, Prasoon
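For reference, the merging step asked about above can be sketched as follows. This is only a minimal illustration of the overlap idea, assuming adapter-trimmed reads and ignoring base qualities; dedicated read mergers (e.g. FLASH, PEAR, BBMerge) do this properly. The function names and the `min_overlap` / `max_mismatch_rate` thresholds are hypothetical choices for the example.

```python
# Minimal sketch of overlap-based merging of one read pair.
# Assumptions: reads are adapter-trimmed, overlap is detected by simple
# mismatch counting, and base qualities are ignored.
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 20,
               max_mismatch_rate: float = 0.1) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 if R1's 3' end overlaps it.

    Returns the merged fragment, or None when no acceptable overlap exists
    (such pairs would be kept as two separate sequences).
    """
    r2_rc = reverse_complement(r2)
    lowest = max(min_overlap, 1)  # guard: an overlap of 0 is meaningless
    # Try the longest possible overlap first, down to the minimum.
    for olen in range(min(len(r1), len(r2_rc)), lowest - 1, -1):
        r1_tail = r1[-olen:]
        r2_head = r2_rc[:olen]
        mismatches = sum(a != b for a, b in zip(r1_tail, r2_head))
        if mismatches <= max_mismatch_rate * olen:
            # Keep R1's bases in the overlap; a real tool would pick
            # the base with the higher quality score.
            return r1 + r2_rc[olen:]
    return None


def to_fasta(name: str, seq: str) -> str:
    """Format one sequence as a FASTA record (BLAT takes FASTA queries)."""
    return f">{name}\n{seq}\n"
```

Pairs that merge become one FASTA record; pairs that do not could be written as two records (e.g. `name/1` and `name/2`) so that no reads are lost before BLAT.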

RNA-Seq BLAT alignment
2.5 years ago

You should use an aligner designed for paired-end sequencing reads. BLAT was not designed to handle such data.

Possible options, depending on your organism:

1. If the assembled genome is good and an annotation exists:

   • Option A:
      • Align with STAR (use the 2-pass option)
      • DE analysis with DESeq2
   • Option B:
      • Pseudo-alignment with kallisto using transcriptome (cDNA) sequences
      • Import kallisto results into R (tximport)
      • DE analysis with DESeq2
2. If the assembled genome isn't good but a transcriptome (cDNA sequences) is available:

   • Option B:
      • Pseudo-alignment with kallisto using transcriptome (cDNA) sequences
      • Import kallisto results into R (tximport)
      • DE analysis with DESeq2
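As a rough sketch of the command-line parts of options A and B, the helper below only builds the argument lists (it does not run anything). File names are hypothetical placeholders, and it assumes a STAR genome index already exists (built with `STAR --runMode genomeGenerate`, adding the annotation via `--sjdbGTFfile`). The tximport and DESeq2 steps happen in R and are not shown.

```python
# Sketch: build (not run) command lines for option A (STAR 2-pass) and
# option B (kallisto). All file names are hypothetical placeholders.
from typing import List


def star_two_pass_cmd(genome_dir: str, r1: str, r2: str,
                      out_prefix: str, threads: int = 8) -> List[str]:
    """STAR paired-end alignment in 2-pass mode: junctions found in the
    first pass are inserted and the reads are re-mapped in a second pass."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1, r2,          # R1 and R2 stay as separate files
        "--readFilesCommand", "zcat",     # assumes gzipped FASTQ input
        "--twopassMode", "Basic",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


def kallisto_cmds(transcripts_fa: str, index: str, r1: str, r2: str,
                  out_dir: str) -> List[List[str]]:
    """Index the transcriptome (cDNA FASTA), then quantify one paired-end sample."""
    return [
        ["kallisto", "index", "-i", index, transcripts_fa],
        ["kallisto", "quant", "-i", index, "-o", out_dir, r1, r2],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Note that both tools take R1 and R2 directly, so no merging is needed.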

Hi Nicolas,

Thanks for your reply. I am working on a genome in which non-canonical (non-GT-AG) splice sites may be predominant. That's why I don't want to align with STAR. I don't know whether kallisto is suitable for my case.


STAR handles non-canonical splice sites well. If you want, you can reduce the penalty for non-canonical splice sites by changing --scoreGapNoncan to -4 or even 0 (default is -8). You may also change --scoreGapGCAG and --scoreGapATAC to 0. Be aware that this may increase the number of false-positive spliced reads. I suggest reading the STAR manual, where all parameters are well explained :) https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
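To make the effect of these parameters concrete, here is a small sketch that classifies a junction by its boundary dinucleotides (as read on the + strand of the genome, following the motif groups in the STAR manual) and looks up only the motif-dependent junction penalty. It assumes the defaults scoreGap 0, scoreGapNoncan -8, scoreGapGCAG -4 and scoreGapATAC -8, and it deliberately ignores the rest of STAR's alignment score; the function names are hypothetical.

```python
# Sketch: motif-dependent junction penalty, modelled on STAR's scoreGap*
# parameters. Only this penalty term is modelled, not the full score.
from typing import Dict, Mapping

# (donor, acceptor) dinucleotides on the + strand -> motif group.
# CT/AC, CT/GC and GT/AT are the minus-strand forms of GT/AG, GC/AG, AT/AC.
MOTIFS: Dict[tuple, str] = {
    ("GT", "AG"): "GT/AG", ("CT", "AC"): "GT/AG",
    ("GC", "AG"): "GC/AG", ("CT", "GC"): "GC/AG",
    ("AT", "AC"): "AT/AC", ("GT", "AT"): "AT/AC",
}

STAR_DEFAULTS: Dict[str, int] = {
    "scoreGap": 0, "scoreGapNoncan": -8, "scoreGapGCAG": -4, "scoreGapATAC": -8,
}


def intron_motif(intron: str) -> str:
    """Classify an intron (+ strand sequence) by its first and last two bases."""
    seq = intron.upper()
    return MOTIFS.get((seq[:2], seq[-2:]), "non-canonical")


def junction_penalty(intron: str, params: Mapping[str, int] = STAR_DEFAULTS) -> int:
    """scoreGap plus the motif-specific extra penalty."""
    extra = {
        "GT/AG": 0,
        "GC/AG": params["scoreGapGCAG"],
        "AT/AC": params["scoreGapATAC"],
        "non-canonical": params["scoreGapNoncan"],
    }[intron_motif(intron)]
    return params["scoreGap"] + extra


# Relaxed setting suggested above: --scoreGapNoncan -4 --scoreGapGCAG 0 --scoreGapATAC 0
RELAXED = {**STAR_DEFAULTS, "scoreGapNoncan": -4, "scoreGapGCAG": 0, "scoreGapATAC": 0}
```

With the relaxed values a non-canonical junction costs -4 instead of -8, which is why such junctions become easier to call, along with more false positives.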


RNA-seq reads do not contain the non-canonical splice site sequences that are located in introns. That issue is therefore moot if you are using option B with a transcriptome from your genome.
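A toy illustration of this point, using made-up sequences: splicing removes the whole intron, including its boundary dinucleotides, so a read spanning the exon-exon junction is contiguous in the transcript (what kallisto indexes) but not in the genome, where it would need a spliced alignment.

```python
# Toy illustration with made-up sequences: a junction-spanning read is
# contiguous in the spliced transcript but not in the genome.
from typing import List, Tuple


def splice(genome: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exons given as 0-based, half-open (start, end) coordinates."""
    return "".join(genome[start:end] for start, end in exons)


exon1 = "ATGGCCAAA"
intron = "GTAAGTTTTTTTCAG"   # hypothetical intron with GT...AG boundaries
exon2 = "TTTGGGTAA"
genome = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(genome))]

transcript = splice(genome, exons)
junction_read = transcript[5:15]  # spans the exon1/exon2 junction

print(transcript)                                            # ATGGCCAAATTTGGGTAA
print(junction_read in transcript, junction_read in genome)  # True False
```

Whatever the intron's boundary bases are (canonical or not), they never appear in the spliced read, so they play no role when reads are matched against transcript sequences.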


RNA-seq data often contain reads from intronic regions, i.e. unspliced reads. I was following the discussion in "Why Are There Many Rna-Seq Hits To Intronic Regions?"
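If you want to check how common such reads are in your own genome alignments, one option is to look at the SAM CIGAR string, where `N` marks a skipped reference region (an intron in a spliced alignment). The sketch below assumes you already have the 1-based leftmost position and CIGAR of each alignment, plus an intron interval from some annotation; the intron coordinates and function names are hypothetical.

```python
# Sketch: separate spliced from unspliced alignments using the SAM CIGAR
# string. The intron interval is a hypothetical input (e.g. from an annotation).
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
Interval = Tuple[int, int]  # 1-based, inclusive reference coordinates


def reference_blocks(pos: int, cigar: str) -> Tuple[List[Interval], List[Interval]]:
    """Return (aligned blocks, 'N' gaps) on the reference for one alignment.

    pos is the 1-based leftmost mapping position (SAM column 4).
    """
    blocks: List[Interval] = []
    gaps: List[Interval] = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":          # aligned bases consume the reference
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op == "D":          # deletion: consumes reference only
            ref += length
        elif op == "N":          # skipped region: intron in spliced reads
            gaps.append((ref, ref + length - 1))
            ref += length
        # I, S, H and P do not consume the reference
    return blocks, gaps


def classify(pos: int, cigar: str, intron: Interval) -> str:
    """Label an alignment 'spliced', 'unspliced_intronic' or 'no_intron_overlap'."""
    blocks, gaps = reference_blocks(pos, cigar)
    if gaps:
        return "spliced"
    start, end = intron
    if any(b_start <= end and b_end >= start for b_start, b_end in blocks):
        return "unspliced_intronic"
    return "no_intron_overlap"
```

For example, `classify(140, "100M", (150, 1149))` labels a contiguous read that runs into the intron as `unspliced_intronic`, while `50M1000N50M` at position 100 is `spliced`.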