Question: hisat2-stringtie-ballgown pipeline gives high p-value and q-value for RNA-seq of Arabidopsis thaliana
0
gravatar for tiancaigg
8 months ago by
tiancaigg0
tiancaigg0 wrote:

I've been performing RNA-seq according to https://bioinformatics.uconn.edu/rnaseq-arabidopsis/ , following the hisat2-stringtie-ballgown pipeline, but it gave very high p-value and q-value, I've tried many different parameters, but it doesn't change. I am not sure if there are anything wrong in my pipeline.

I build the hisat_index by

pythotract_splice_sites.py /mnt/chaelab/donghui/TAIR10/gtf/Arabidopsis_thaliana.TAIR10.42.gtf >/mnt/chaelab/donghui/TAIR10/new_index/TAIR10.ss
python extract_exons.py /mnt/chaelab/donghui/TAIR10/gtf/Arabidopsis_thaliana.TAIR10.42.gtf >/mnt/chaelab/donghui/TAIR10/new_index/TAIR10.exon
hisat2-build --ss /mnt/chaelab/donghui/TAIR10/new_index/TAIR10.ss --exon /mnt/chaelab/donghui/TAIR10/new_index/TAIR10.exon /mnt/chaelab/donghui/TAIR10/TAIR10_chr_all.fas /mnt/chaelab/donghui/TAIR10/new_index/index

trim raw_data by AfterQC,

for i in {12,13,15,16}
do 
    python2 /mnt/chaelab/donghui/download/AfterQC-master/after.py -1 hisat/SRR34982${i}.fastq
done

then run the pipeline:

gtf=/mnt/chaelab/donghui/TAIR10/gtf/Arabidopsis_thaliana.TAIR10.42.gtf
index=/mnt/chaelab/donghui/TAIR10/new_index/index

mkdir 1_hisat 2_samtools_bam 3_stringtie_gtf 4_stringtie_merge 5_stringtie_for_ballgown


for i in {12,13,15,16}
do 
    hisat2 -p 8 --dta -x $index -q good/SRR34982${i}.good.fq -S 1_hisat/SRR34982${i}.sam 
done

for i in {12,13,15,16}
do
    samtools sort -@ 8 -o 2_samtools_bam/SRR34982${i}.bam 1_hisat/SRR34982${i}.sam 
done

for i in {12,13,15,16}
do
    stringtie -p 8 -G $gtf -o 3_stringtie_gtf/SRR34982${i}.gtf -l SRR34982${i} 2_samtools_bam/SRR34982${i}.bam 
done

ls 3_stringtie_gtf/*.gtf > 3_stringtie_gtf/assebly_GTF_list.txt
stringtie --merge -p 8 -o 4_stringtie_merge/merged_gtf.gtf -G $gtf 3_stringtie_gtf/assebly_GTF_list.txt

for i in {12,13,15,16}
do
    mkdir 5_stringtie_for_ballgown/SRR34982${i}
    stringtie -e -B -p 8 -G 4_stringtie_merge/merged_gtf.gtf -o 5_stringtie_for_ballgown/SRR34982${i}/SRR34982${i}.gtf 2_samtools_bam/SRR34982${i}.bam 
done

Then the R script:

library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(ggplot2)
library(gplots)
library(devtools)

dir <- "/mnt/chaelab/donghui/root_shoot/"
setwd(dir)

pheno_data <- data.frame( ID = c('SRR3498212','SRR3498213', 'SRR3498215', 'SRR3498216'),
                          Type= c( 'root','root','shoot', 'shoot'))

bg <- ballgown(dataDir = "5_stringtie_for_ballgown", samplePattern = 'SRR34982', pData = pheno_data )
print(bg)

ballgown instance with 54127 transcripts and 4 samples

#remove low-abundance genes ( 0 and 1 )
bg_filt <- subset(bg,"rowVars(texpr(bg)) >1", genomesubset=TRUE)
print(bg_filt)

ballgown instance with 18483 transcripts and 4 samples

results_transcripts <- stattest(bg_filt, feature = "transcript", covariate = "Type", getFC = TRUE, meas = "FPKM")

results_transcripts <- data.frame(geneNames = ballgown::geneNames(bg_filt), 
                                 geneIDs = ballgown::geneIDs(bg_filt), 
                                 results_transcripts)
summary(results_transcripts)
head(results_transcripts, n = 500)

subset(results_transcripts,results_transcripts$qval<0.05)

results_transcripts = arrange(results_transcripts,pval)
write.csv(results_transcripts, "afterqc_transcript_results.csv", row.names=FALSE)

however, the result shows very high q-val,

   geneNames          geneIDs            feature            id       
 .      :8699   MSTRG.6333:   11   transcript:18483   1000   :    1  
 A1     :   9   MSTRG.2749:   10                      10004  :    1  
 ARP1   :   8   MSTRG.5542:    7                      10008  :    1  
 ATGRP9 :   8   MSTRG.8965:    7                      1001   :    1  
 RPL13B :   7   MSTRG.1251:    6                      10011  :    1  
 UBQ10  :   7   MSTRG.1364:    6                      10014  :    1  
 (Other):9745   (Other)   :18436                      (Other):18477  
       fc                            pval              qval       
 Min.   :0.000e+00   Min.   :0.0000135   Min.   :0.2488  
 1st Qu.:0.000e+00   1st Qu.:0.1320150   1st Qu.:0.5278  
 Median :1.000e+00   Median :0.3798743   Median :0.7596  
 Mean   :8.941e+10   Mean   :0.4354506   Mean   :0.7457  
 3rd Qu.:8.000e+00   3rd Qu.:0.7277507   3rd Qu.:0.9703  
 Max.   :1.645e+15   Max.   :0.9999579   Max.   :1.0000 

geneNames   geneIDs feature id  fc            pval                    qval
  2 ARV1    AT1G01020   transcript  2   4.887496e-02    0.42724866  0.7960507
11  DCL1    MSTRG.10    transcript  11  1.996840e+00    0.86585464  0.9883641
12  .   MSTRG.11    transcript  12  2.095266e+00    0.91548484  0.9937310
14  MIR838A MSTRG.12    transcript  14  1.098434e+00    0.98874281  0.9937310
15  PPA1    MSTRG.13    transcript  15  2.263532e+04    0.10262150  0.4894779
21  LHY AT1G01060   transcript  21  3.983061e-04    0.18029668  0.5788472
23  LHY AT1G01060   transcript  23  1.375484e+06    0.08195434  0.4894779
26  .   MSTRG.2 transcript  26  1.592113e-01    0.83509407  0.9704967
27  .   MSTRG.2 transcript  27  6.134825e+01    0.65382588  0.9334463
29  .   MSTRG.5 transcript  29  7.045163e+04    0.08195434  0.4894779
30  .   MSTRG.5 transcript  30  4.287336e-03    0.36838524  0.7517703
32  PDH-E1 ALPHA    MSTRG.6 transcript  32  4.143954e-01    0.50230252  0.8452166
33  RPP1A   MSTRG.7 transcript  33  1.099668e-01    0.89197850  0.9937310
34  RPP1A   MSTRG.7 transcript  34  1.440411e+08    0.25745333  0.6535037
36  RPP1A   MSTRG.7 transcript  36  1.340859e-08    0.51091104  0.8510193
39  KCS1    MSTRG.1 transcript  39  5.171983e+00    0.01450149  0.4894779
41  CIPK9   AT1G01140   transcript  41  1.938509e+05    0.08195434  0.4894779
47  GIF2    MSTRG.15    transcript  47  3.886182e-01    0.31962657  0.7093423

and no transcripts showd qval<0.05, quite opposite from the https://bioinformatics.uconn.edu/rnaseq-arabidopsis/

ADD COMMENTlink modified 8 months ago • written 8 months ago by tiancaigg0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1906 users visited in the last hour