Question: Should I merge paired-end RNA-seq reads before aligning with BLAT for intron identification?
9 months ago by
praasu30
Prague, Czech republic
praasu30 wrote:

Hi,

I have RNA-seq data (PE 2x300) for non-model organisms. I want to align the reads against the assembled genome using BLAT to identify introns. Since BLAT cannot align paired-end reads, I want to convert them into single-end reads by merging the overlapping R1 and R2.

I would like to know whether it makes sense to merge R1 and R2 into one read (when they overlap), or whether I should work with R1 and R2 separately.
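To make the merging idea concrete, here is a minimal sketch of what "merging overlapping R1 and R2" means: R2 is reverse-complemented and joined to R1 on the longest exact suffix/prefix overlap. The function names and the `min_overlap` default are assumptions chosen for illustration. Dedicated read mergers such as FLASH, PEAR or BBMerge additionally tolerate mismatches and use base qualities, so this is not a substitute for them.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10):
    """Merge R1 with reverse-complemented R2 on the longest exact overlap.

    Returns the merged sequence, or None when no exact overlap of at
    least min_overlap bases exists (the pair then stays as two reads).
    Hypothetical helper: exact matching only, no quality scores.
    """
    r2_rc = reverse_complement(r2)
    max_len = min(len(r1), len(r2_rc))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_len, min_overlap - 1, -1):
        if r1[-size:] == r2_rc[:size]:
            return r1 + r2_rc[size:]
    return None


if __name__ == "__main__":
    # Toy 30 bp fragment sequenced from both ends with an 8 bp overlap.
    fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
    r1 = fragment[:18]
    r2 = reverse_complement(fragment[10:])
    print(merge_pair(r1, r2, min_overlap=5))
```

Pairs for which `merge_pair` returns `None` do not overlap (or overlap with mismatches) and would have to be aligned as two separate single-end reads.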

My Best Regards, Prasoon

blat rna-seq alignment • 325 views
modified 9 months ago • written 9 months ago by praasu30
9 months ago by
Belgium, Brussels
Nicolas Rosewick9.2k wrote:

You should use an aligner designed for paired-end sequencing reads. BLAT was not designed to handle such data.

Possible options depending on your organism :

  1. If assembled genome is good and an annotation exists :

    • option A :
      • Align with STAR (use 2-pass option)
      • Count reads with featureCounts
      • DE analysis with DESeq2
    • option B :
      • pseudo-alignment with kallisto using transcriptome (cDNA) sequences.
      • import kallisto results within R (tximport)
      • DE analysis with DESeq2
  2. If the assembled genome isn't good but a transcriptome is available (cDNA sequences) :

    • option B :
      • pseudo-alignment with kallisto using transcriptome (cDNA) sequences.
      • import kallisto results within R (tximport)
      • DE analysis with DESeq2
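As a rough sketch of the first step of option B, the two kallisto invocations (building the index, then quantifying a paired-end sample) can be assembled like this. All file names, the output directory and the thread count are placeholder assumptions. The helper only builds the command lines and does not run them.

```python
import shlex


def kallisto_commands(transcriptome_fasta, r1_fastq, r2_fastq,
                      index_path="transcripts.idx",
                      out_dir="kallisto_out", threads=4):
    """Build the `kallisto index` and `kallisto quant` command lines.

    Hypothetical helper: paths and defaults are placeholders.
    """
    # Build the index once from the transcriptome (cDNA) FASTA.
    index_cmd = ["kallisto", "index", "-i", index_path, transcriptome_fasta]
    # Quantify one paired-end sample against that index.
    quant_cmd = ["kallisto", "quant", "-i", index_path, "-o", out_dir,
                 "-t", str(threads), r1_fastq, r2_fastq]
    return index_cmd, quant_cmd


if __name__ == "__main__":
    for cmd in kallisto_commands("transcripts.fa", "R1.fastq.gz", "R2.fastq.gz"):
        print(shlex.join(cmd))
```

The output directory written by `kallisto quant` is what tximport then reads on the R side.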
written 9 months ago by Nicolas Rosewick9.2k

Hi Nicolas,

Thanks for your reply. I am working on a genome in which non-canonical (non-GT-AG) splice sites are possibly predominant. That's why I don't want to align with STAR. I don't know whether kallisto is suitable for my case.

modified 9 months ago • written 9 months ago by praasu30

STAR handles non-canonical splice sites well. If you want, you can reduce the penalty for non-canonical splice sites by changing --scoreGapNoncan to -4 or even 0 (default is -8). You may also change --scoreGapGCAG and --scoreGapATAC to 0. Be aware that this may increase the number of false-positive spliced reads. I suggest you read the STAR manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf . All parameters are well explained :)
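For illustration, a STAR call with the relaxed penalties described above could be assembled as follows. This is only a sketch: the genome directory, FASTQ names, output prefix and thread count are placeholder assumptions, the penalty values are the ones suggested in this comment, and the helper builds the command line without running it.

```python
import shlex


def star_command(genome_dir, r1_fastq, r2_fastq, out_prefix="sample_",
                 noncan=-4, gcag=0, atac=0, threads=4):
    """Build a paired-end STAR command with relaxed splice-site penalties.

    Hypothetical helper: paths and defaults are placeholders.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1_fastq, r2_fastq,
        "--twopassMode", "Basic",          # 2-pass mapping
        "--scoreGapNoncan", str(noncan),   # penalty for non-canonical junctions
        "--scoreGapGCAG", str(gcag),       # penalty for GC/AG junctions
        "--scoreGapATAC", str(atac),       # penalty for AT/AC junctions
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


if __name__ == "__main__":
    print(shlex.join(star_command("star_index", "R1.fastq", "R2.fastq")))
```

STAR writes the detected junctions to `SJ.out.tab` under the chosen prefix, which is a convenient place to check how many non-canonical introns the relaxed settings add.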

modified 9 months ago • written 9 months ago by Nicolas Rosewick9.2k

RNA-seq reads do not contain the non-canonical splice site sequences that are located in introns. That issue is therefore moot if you are using option B with a transcriptome from your genome.

written 9 months ago by Lior Pachter540

RNA-seq data often contain reads from intronic regions or unspliced transcripts. I was following the discussion in "Why Are There Many Rna-Seq Hits To Intronic Regions?"

modified 9 months ago • written 9 months ago by praasu30