Question

Should I merge paired-end RNA-seq reads before aligning with BLAT for intron identification?

4.3 years ago

praasu ▴ 40

Hi,

I have RNA-seq data (PE 2x300) for non-model organisms. I want to align the reads against the assembled genome using BLAT to identify introns. Since BLAT cannot align paired-end reads, I want to convert them into single-end reads by merging the overlapping R1 and R2.

I would like to know whether it makes sense to merge R1 and R2 into one read (where they overlap), or whether I should work with R1 and R2 separately.
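To make the merging idea concrete: joining an overlapping pair amounts to reverse-complementing R2 and attaching it to R1 at the point where the two reads overlap. The following is only a minimal Python sketch under stated assumptions. The function names `reverse_complement` and `merge_pair` and the thresholds `min_overlap` and `max_mismatch_frac` are hypothetical, base qualities are ignored, and fragments shorter than the read length are not handled. Dedicated read-merging tools such as FLASH or PEAR deal with those cases.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 if they overlap.

    Returns the merged fragment, or None when no overlap of at least
    `min_overlap` bases passes the mismatch threshold (hypothetical values).
    """
    r2_rc = reverse_complement(r2)
    # Try the longest candidate overlap first:
    # suffix of R1 against prefix of the reverse-complemented R2.
    for olen in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2_rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            # Keep the R1 bases in the overlap, then append the rest of R2.
            return r1 + r2_rc[olen:]
    return None


if __name__ == "__main__":
    # Toy 40-nt fragment sequenced with 28-nt reads from both ends.
    fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCAT"
    r1 = fragment[:28]
    r2 = reverse_complement(fragment[-28:])
    print(merge_pair(r1, r2, max_mismatch_frac=0.0))
```

Pairs that do not overlap return `None` here and would still have to be aligned as two separate reads.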

My Best Regards, Prasoon

RNA-Seq BLAT alignment • 1.6k views

ADD COMMENT • link 4.3 years ago by praasu ▴ 40

score 0 · Answer 1 · 2020-01-15

4.3 years ago

Nicolas Rosewick 10k

You should use an aligner designed for paired-end sequencing reads. BLAT was not designed to handle such data.

Possible options, depending on your organism:

If the assembled genome is good and an annotation exists:
- option A:
  - Align with STAR (use the 2-pass option)
  - Count reads with featureCounts
  - DE analysis with DESeq2
- option B:
  - pseudo-alignment with kallisto using transcriptome (cDNA) sequences
  - import the kallisto results into R (tximport)
  - DE analysis with DESeq2

If the assembled genome isn't good but a transcriptome is available (cDNA sequences):
- option B:
  - pseudo-alignment with kallisto using transcriptome (cDNA) sequences
  - import the kallisto results into R (tximport)
  - DE analysis with DESeq2
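
As a rough illustration of the pseudo-alignment step in option B, the sketch below builds the two kallisto commands (`kallisto index` and `kallisto quant`) for one paired-end sample. The helper name `kallisto_commands` and all file and directory names are placeholders, not something from this thread. The commands are only printed, so nothing is executed until you pass them to `subprocess.run` yourself.

```python
import shlex
from typing import List, Tuple


def kallisto_commands(transcriptome_fa: str, r1_fastq: str, r2_fastq: str,
                      index: str = "transcripts.idx",
                      out_dir: str = "kallisto_out",
                      threads: int = 4) -> Tuple[List[str], List[str]]:
    """Build the index and quant commands for one paired-end sample."""
    # Build the kallisto index from the cDNA FASTA file.
    index_cmd = ["kallisto", "index", "-i", index, transcriptome_fa]
    # Quantify a paired-end sample: both FASTQ files are passed together.
    quant_cmd = ["kallisto", "quant", "-i", index, "-o", out_dir,
                 "-t", str(threads), r1_fastq, r2_fastq]
    return index_cmd, quant_cmd


if __name__ == "__main__":
    # Placeholder file names; replace with your own paths.
    for cmd in kallisto_commands("cdna.fa", "sample_R1.fastq.gz",
                                 "sample_R2.fastq.gz"):
        print(shlex.join(cmd))
```

The output directory written by `kallisto quant` is what tximport reads in the next step.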

ADD COMMENT • link 4.3 years ago by Nicolas Rosewick 10k

Hi Nicolas,

Thanks for your reply. I am working on a genome in which non-canonical (non-GT-AG) splice sites are possibly predominant. That's why I don't want to align with STAR. I don't know whether the kallisto aligner is suitable for my case.

ADD REPLY • link 4.3 years ago by praasu ▴ 40

STAR handles non-canonical splice sites well. You can reduce the penalty for non-canonical splice sites by changing --scoreGapNoncan to -4 or even 0 (default is -8). You may also change --scoreGapGCAG and --scoreGapATAC to 0. Be aware that this may increase the number of false-positive spliced reads. I suggest you read the STAR manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf . All parameters are well explained :)
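
A sketch of how these settings could be put together into one STAR call, written as a small Python helper that only assembles the argument list. The helper name `star_command`, the genome index directory, the read file names and the output prefix are all placeholders. The penalty values are the ones suggested above and should be treated as a starting point, not a recommendation.

```python
import shlex
from typing import List


def star_command(genome_dir: str, r1_fastq: str, r2_fastq: str,
                 gap_noncan: int = -4, gap_gcag: int = 0, gap_atac: int = 0,
                 threads: int = 8, prefix: str = "sample_") -> List[str]:
    """Assemble a paired-end STAR command with relaxed splice penalties."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # Both mates are given together, so no read merging is needed.
        "--readFilesIn", r1_fastq, r2_fastq,
        "--twopassMode", "Basic",
        # Relaxed penalties for non-canonical, GC/AG and AT/AC junctions.
        "--scoreGapNoncan", str(gap_noncan),
        "--scoreGapGCAG", str(gap_gcag),
        "--scoreGapATAC", str(gap_atac),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    # Placeholder paths; uncompressed FASTQ is assumed here.
    print(shlex.join(star_command("genome_index", "sample_R1.fastq",
                                  "sample_R2.fastq")))
```

STAR writes the junctions it detects to an `SJ.out.tab` file next to the alignments, which is a convenient place to check how many of the reported introns are non-canonical.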

ADD REPLY • link 4.3 years ago by Nicolas Rosewick 10k

RNA-seq reads do not contain the non-canonical splice site sequences that are located in introns. That issue is therefore moot if you are using option B with a transcriptome from your genome.
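
A toy example may help to see why. In the sketch below (hypothetical sequences and helper names, plus-strand coordinates, 0-based half-open), two genomes differ only in their intron: one has a canonical GT...AG intron, the other an AT...AC intron. The spliced transcript, and therefore any read derived from the mature mRNA, is identical in both cases. The splice-site dinucleotides exist only in the genome.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 0-based half-open, plus strand


def spliced_transcript(genome: str, exons: List[Exon]) -> str:
    """Concatenate exon sequences, as in a mature (spliced) mRNA."""
    return "".join(genome[start:end] for start, end in exons)


def intron_motifs(genome: str, exons: List[Exon]) -> List[Tuple[str, str]]:
    """Return (donor, acceptor) dinucleotides for each intron between exons."""
    motifs = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = genome[left_end:right_start]
        motifs.append((intron[:2], intron[-2:]))
    return motifs


if __name__ == "__main__":
    exons = [(0, 6), (18, 24)]
    canonical = "ATGGCC" + "GTAAGTTTTCAG" + "GGATCC"      # GT...AG intron
    non_canonical = "ATGGCC" + "ATATCCTTTTAC" + "GGATCC"  # AT...AC intron

    print(intron_motifs(canonical, exons))
    print(intron_motifs(non_canonical, exons))
    # Both genomes give the same spliced sequence.
    print(spliced_transcript(canonical, exons))
    print(spliced_transcript(non_canonical, exons))
```

This only covers reads from fully spliced transcripts. Reads from unspliced pre-mRNA would still contain intronic sequence.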

ADD REPLY • link 4.3 years ago by Lior Pachter ▴ 700

RNA-seq data often include reads from intronic regions or unspliced transcripts. I was following the discussion in "Why Are There Many Rna-Seq Hits To Intronic Regions?"

ADD REPLY • link 4.3 years ago by praasu ▴ 40