Should I merge paired-end RNA-seq reads before aligning with BLAT for intron identification?
4.3 years ago
praasu ▴ 40

Hi,

I have RNA-seq data (PE 2x300) for a non-model organism. I want to align the reads against the assembled genome using BLAT to identify introns. Since BLAT cannot align paired-end reads, I want to convert them into single-end reads by merging the overlapping R1 and R2.

I would like to know whether it makes sense to merge R1 and R2 into one read (where they overlap), or whether I should work with R1 and R2 separately.
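
To make the merging idea concrete, here is a minimal Python sketch of what "merging overlapping R1 and R2" means. It is only an illustration under simplifying assumptions: the overlap must match exactly, base qualities are ignored, and the function names (`reverse_complement`, `merge_pair`) and the `min_overlap` parameter are hypothetical. Dedicated read-merging tools also handle mismatches and quality scores.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair into one fragment, or return None if no overlap is found.

    Assumption: R2 comes from the opposite strand, so it is
    reverse-complemented before looking for the overlap.
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    # Try the longest candidate overlap first: suffix of R1 vs prefix of rc(R2).
    for k in range(longest, min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return r1 + r2_rc[k:]
    return None


# Toy example: a 40-nt fragment sequenced from both ends with a 13-nt overlap.
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCTAAGCTTGG"
read1 = fragment[:28]
read2 = reverse_complement(fragment[15:])
merged = merge_pair(read1, read2)
```

Pairs for which `merge_pair` returns `None` do not overlap (or overlap by fewer than `min_overlap` bases) and would still have to be handled as two separate reads.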

My Best Regards, Prasoon

RNA-Seq BLAT alignment • 1.6k views
4.3 years ago

You should use an aligner designed for paired-end sequencing reads. BLAT was not designed to handle such data.

Possible options, depending on your organism:

  1. If the assembled genome is good and an annotation exists:

    • Option A:
      • Align with STAR (use the 2-pass option)
      • Count reads with featureCounts
      • DE analysis with DESeq2
    • Option B:
      • Pseudo-alignment with kallisto using transcriptome (cDNA) sequences
      • Import the kallisto results into R (tximport)
      • DE analysis with DESeq2
  2. If the assembled genome isn't good but a transcriptome (cDNA sequences) is available:

    • Option B:
      • Pseudo-alignment with kallisto using transcriptome (cDNA) sequences
      • Import the kallisto results into R (tximport)
      • DE analysis with DESeq2
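
The command-line steps of the two options above could be sketched roughly as follows. This is only an illustration, not a tested pipeline: the Python helper names (`option_a_commands`, `option_b_commands`) and all file paths are hypothetical placeholders, the STAR genome index is assumed to exist already, and `--countReadPairs` is assumed to be available (Subread 2.0.2 or later; older versions count fragments with `-p` alone). The tximport and DESeq2 steps run in R and are not shown.

```python
from typing import List


def option_a_commands(genome_dir: str, gtf: str, r1: str, r2: str,
                      prefix: str, threads: int = 4) -> List[List[str]]:
    """Option A: STAR 2-pass alignment, then featureCounts."""
    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # pre-built STAR index (assumed)
        "--readFilesIn", r1, r2,            # paired-end reads, uncompressed
        "--twopassMode", "Basic",           # the 2-pass option
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]
    bam = prefix + "Aligned.sortedByCoord.out.bam"
    feature_counts = [
        "featureCounts",
        "-T", str(threads),
        "-p", "--countReadPairs",           # paired-end; count fragments
        "-a", gtf,
        "-o", prefix + "counts.txt",
        bam,
    ]
    return [star, feature_counts]


def option_b_commands(cdna_fasta: str, index: str, r1: str, r2: str,
                      out_dir: str, threads: int = 4) -> List[List[str]]:
    """Option B: kallisto index on cDNA sequences, then kallisto quant."""
    build_index = ["kallisto", "index", "-i", index, cdna_fasta]
    quant = ["kallisto", "quant", "-i", index, "-o", out_dir,
             "-t", str(threads), r1, r2]
    return [build_index, quant]


# Each inner list can be passed to subprocess.run(cmd, check=True).
cmds_a = option_a_commands("star_index", "annotation.gtf",
                           "sample_R1.fastq", "sample_R2.fastq", "sample_")
cmds_b = option_b_commands("cdna.fa", "cdna.idx",
                           "sample_R1.fastq", "sample_R2.fastq", "kallisto_out")
```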

Hi Nicolas,

Thanks for your reply. I am working on a genome in which non-canonical (non-GT-AG) splice sites are possibly predominant. That's why I don't want to align with STAR. I don't know whether kallisto is suitable for my case.


STAR handles non-canonical splice sites well. If you want to reduce the penalty for non-canonical splice sites, change --scoreGapNoncan to -4 or even 0 (default is -8). You may also change --scoreGapGCAG and --scoreGapATAC to 0. Be aware that this may increase the number of false-positive spliced reads. I suggest reading the STAR manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf . All parameters are well explained :)
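
As a rough sketch of how the relaxed penalties might be passed to STAR, the Python snippet below only assembles the argument list. The helper name `star_relaxed_splice_command` and the paths are hypothetical, the genome index is assumed to be built already, and the score values are the ones suggested above rather than tuned recommendations.

```python
from typing import List


def star_relaxed_splice_command(genome_dir: str, r1: str, r2: str, prefix: str,
                                noncan: int = -4, gcag: int = 0,
                                atac: int = 0) -> List[str]:
    """Build a STAR call with reduced penalties for non-GT/AG junctions."""
    return [
        "STAR",
        "--genomeDir", genome_dir,         # pre-built index (assumed)
        "--readFilesIn", r1, r2,           # paired-end reads
        "--scoreGapNoncan", str(noncan),   # non-canonical junctions
        "--scoreGapGCAG", str(gcag),       # GC/AG junctions
        "--scoreGapATAC", str(atac),       # AT/AC junctions
        "--outFileNamePrefix", prefix,
    ]


cmd = star_relaxed_splice_command("star_index", "sample_R1.fastq",
                                  "sample_R2.fastq", "relaxed_")
# To run it: subprocess.run(cmd, check=True)
```

STAR writes the junctions it detects to `SJ.out.tab` (here `relaxed_SJ.out.tab`), so comparing that file between default and relaxed settings is one way to judge how many extra, possibly false-positive, junctions the relaxed penalties bring in.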


RNA-seq reads do not contain the non-canonical splice site sequences that are located in introns. That issue is therefore moot if you are using option B with a transcriptome from your genome.


RNA-seq data often include reads from intronic regions or unspliced transcripts. I was following the discussion in "Why Are There Many Rna-Seq Hits To Intronic Regions?"
