RNAseq
3.0 years ago

We want to identify lncRNAs in rainbow trout (2 treatments, 6 samples; paired-end, Illumina HiSeq 2500). After FastQC and HISAT2, we ran StringTie, then StringTie with --merge, and finally used gffcompare to produce our final annotated GTF file. My commands are below (shown for one sample).

java -jar trimmomatic-0.32.jar PE -threads 8 -phred33 /home/user/MahmoodPhd/Trimmomatic-master/R_19191_1.fq.gz /home/user/MahmoodPhd/Trimmomatic-master/R_19191_2.fq.gz /home/user/MahmoodPhd/Trimmomatic-master/R_19191_1_paired.fq.gz /home/user/MahmoodPhd/Trimmomatic-master/R_19191_1_unpaired.fq.gz /home/user/MahmoodPhd/Trimmomatic-master/R_19191_2_paired.fq.gz /home/user/MahmoodPhd/Trimmomatic-master/R_19191_2_unpaired.fq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
gzip -d Oncorhynchus_mykiss.Omyk_1.0.dna.toplevel.fa.gz
mv GCF_002163495.1_Omyk_1.0_genomic.fna GCF_002163495.1_Omyk_1.0_genomic.fa
hisat2-build  GCF_002163495.1_Omyk_1.0_genomic.fa Omykindex
./hisat2 -t --known-splicesite-infile splicesites.txt --dta --summary-file R_19191_summary.txt -x Omykindex -1 R_19191_1.fq.gz -2 R_19191_2.fq.gz -p 8 -S R_19191.sam
samtools sort -@ 8 -o R_19340_sort.bam R_19340.sam
./stringtie /home/user/MahmoodPhd/hisat2-2.1.0-Linux_x86_64/hisat2-2.1.0/R_19340_sort.bam -l R_19340 -p 8 -G GCF_002163495.1_Omyk_1.0_genomic.gff -o R_19340.gtf
./stringtie --merge -p 8 -G /home/user/MahmoodPhd/Stringtie/GCF_002163495.1_Omyk_1.0_genomic.gtf  -o stringtie_merged1.gtf list.txt
stringtie_merged1.gtf:
# ./stringtie --merge -p 8 -G /home/user/MahmoodPhd/Stringtie/GCF_002163495.1_Omyk_1.0_genomic.gtf -o stringtie_merged1.gtf list.txt
# StringTie version 2.1.4
NC_001717.1 StringTie   transcript  1004    16642   1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    1004    4890    1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; exon_number "1"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    4958    6147    1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; exon_number "2"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    6468    8018    1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; exon_number "3"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    8094    8166    1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; exon_number "4"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    8181    14770   1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; exon_number "5"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    15361   16642   1000    +   .   gene_id "MSTRG.1"; transcript_id "unknown_transcript_1"; exon_number "6"; ref_gene_id ""; 
NC_001717.1 StringTie   transcript  4888    16642   1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    4888    4958    1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; exon_number "1"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    6149    6291    1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; exon_number "2"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    6329    6466    1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; exon_number "3"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    8019    8089    1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; exon_number "4"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    14767   15357   1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; exon_number "5"; ref_gene_id ""; 
NC_001717.1 StringTie   exon    16573   16642   1000    -   .   gene_id "MSTRG.2"; transcript_id "unknown_transcript_1"; exon_number "6"; ref_gene_id ""; 
NC_035077.1 StringTie   transcript  26931   50257   1000    +   .   gene_id "MSTRG.3"; transcript_id "XM_021602508.1"; gene_name "LOC110523613"; ref_gene_id "LOC110523613"; 
NC_035077.1 StringTie   exon    26931   27080   1000    +   .   gene_id "MSTRG.3"; transcript_id "XM_021602508.1"; exon_number "1"; gene_name "LOC110523613"; ref_gene_id "LOC110523613"; 
NC_035077.1 StringTie   exon    33577   33719   1000    +   .   gene_id "MSTRG.3"; transcript_id "XM_021602508.1"; exon_number "2"; gene_name "LOC110523613"; ref_gene_id "LOC110523613"; 
NC_035077.1 StringTie   exon    34556   34671   1000    +   .   gene_id "MSTRG.3"; transcript_id "XM_021602508.1"; exon_number "3"; gene_name "LOC110523613"; ref_gene_id "LOC110523613"; 
NC_035077.1 StringTie   exon    40440   40620   1000    +   .   gene_id "MSTRG.3"; transcript_id "XM_021602508.1"; exon_number "4"; gene_name "LOC110523613"; ref_gene_id "LOC110523613"; 
NC_035077.1 StringTie   exon    47972   50257   1000    +   .   gene_id "MSTRG.3"; transcript_id "XM_021602508.1"; exon_number "5"; gene_name "LOC110523613"; ref_gene_id "LOC110523613";
…..
./gffcompare -r /home/user/MahmoodPhd/Stringtie/GCF_002163495.1_Omyk_1.0_genomic.gtf -G -o finalgffestimation.gtf /home/user/MahmoodPhd/Stringtie/stringtie_merged1.gtf
Our problem is with the final GTF file (finalgffestimation.gtf), shown below. When I filter by size (200 nt) and class code (u, i, o, j, x) and then merge the filtered files (with cat), I cannot convert the merged file to FASTA for downstream analysis.
finalgffestimation.gtf: (exons are listed on separate rows, and I can filter correctly)
NC_001717.1 StringTie   transcript  1004    16642   .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; xloc "XLOC_000001"; ref_gene_id ""; cmp_ref "unknown_transcript_1"; class_code "j"; tss_id "TSS1";
NC_001717.1 StringTie   exon    1004    4890    .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; exon_number "1";
NC_001717.1 StringTie   exon    4958    6147    .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; exon_number "2";
NC_001717.1 StringTie   exon    6468    8018    .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; exon_number "3";
NC_001717.1 StringTie   exon    8094    8166    .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; exon_number "4";
NC_001717.1 StringTie   exon    8181    14770   .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; exon_number "5";
NC_001717.1 StringTie   exon    15361   16642   .   +   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.1"; exon_number "6";
NC_001717.1 StringTie   transcript  4888    16642   .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; xloc "XLOC_000002"; ref_gene_id ""; cmp_ref "unknown_transcript_1"; class_code "j"; tss_id "TSS2";
NC_001717.1 StringTie   exon    4888    4958    .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; exon_number "1";
NC_001717.1 StringTie   exon    6149    6291    .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; exon_number "2";
NC_001717.1 StringTie   exon    6329    6466    .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; exon_number "3";
NC_001717.1 StringTie   exon    8019    8089    .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; exon_number "4";
NC_001717.1 StringTie   exon    14767   15357   .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; exon_number "5";
NC_001717.1 StringTie   exon    16573   16642   .   -   .   transcript_id "unknown_transcript_1"; gene_id "MSTRG.2"; exon_number "6";
NC_035077.1 StringTie   transcript  26931   50257   .   +   .   transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; gene_name "LOC110523613"; xloc "XLOC_000003"; ref_gene_id "LOC110523613"; cmp_ref "XM_021602508.1"; class_code "="; tss_id "TSS3";
NC_035077.1 StringTie   exon    26931   27080   .   +   .   transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; exon_number "1";
NC_035077.1 StringTie   exon    33577   33719   .   +   .   transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; exon_number "2";
NC_035077.1 StringTie   exon    34556   34671   .   +   .   transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; exon_number "3";
NC_035077.1 StringTie   exon    40440   40620   .   +   .   transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; exon_number "4";
NC_035077.1 StringTie   exon    47972   50257   .   +   .   transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; exon_number "5";
NC_035077.1 StringTie   transcript  32704   50257   .   +   .   transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; gene_name "LOC110523613"; xloc "XLOC_000003"; cmp_ref "XM_021602508.1"; class_code "j"; tss_id "TSS4";
NC_035077.1 StringTie   exon    32704   33070   .   +   .   transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; exon_number "1";
NC_035077.1 StringTie   exon    40440   40620   .   +   .   transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; exon_number "2";
NC_035077.1 StringTie   exon    47972   50257   .   +   .   transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; exon_number "3";
NC_035077.1 StringTie   transcript  145370  152927  .   +   .   transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; gene_name "LOC110523749"; xloc "XLOC_000004"; ref_gene_id "LOC110523749"; cmp_ref "XM_021602762.1"; class_code "="; tss_id "TSS5";
NC_035077.1 StringTie   exon    145370  145969  .   +   .   transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; exon_number "1";
NC_035077.1 StringTie   exon    146059  146626  .   +   .   transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; exon_number "2";
NC_035077.1 StringTie   exon    146738  152927  .   +   .   transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; exon_number "3";
NC_035077.1 Gnomon  transcript  177504  190839  .   +   .   transcript_id "XM_021603017.1"; gene_id "LOC110523873"; gene_name "LOC110523873"; xloc "XLOC_000005"; ref_gene_id "LOC110523873"; cmp_ref "XM_021603017.1"; class_code "="; tss_id "TSS6";
NC_035077.1 Gnomon  exon    177504  177974  .   +   .   transcript_id "XM_021603017.1"; gene_id "LOC110523873"; exon_number "1";
NC_035077.1 Gnomon  exon    190591  190839  .   +   .   transcript_id "XM_021603017.1"; gene_id "LOC110523873"; exon_number "2";
NC_035077.1 Gnomon  transcript  201613  226609  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; gene_name "LOC110523811"; xloc "XLOC_000006"; ref_gene_id "LOC110523811"; cmp_ref "XM_021602890.1"; class_code "="; tss_id "TSS7";
NC_035077.1 Gnomon  exon    201613  202200  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; exon_number "1";
NC_035077.1 Gnomon  exon    206366  206502  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; exon_number "2";
NC_035077.1 Gnomon  exon    209520  209564  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; exon_number "3";
NC_035077.1 Gnomon  exon    209917  209985  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; exon_number "4";
NC_035077.1 Gnomon  exon    217023  217200  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; exon_number "5";
NC_035077.1 Gnomon  exon    226478  226609  .   +   .   transcript_id "XM_021602890.1"; gene_id "LOC110523811"; exon_number "6";
NC_035077.1 BestRefSeq  transcript  355926  382310  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; gene_name "LOC100135976"; xloc "XLOC_000007"; ref_gene_id "LOC100135976"; cmp_ref "NM_001124315.1"; class_code "="; tss_id "TSS8";
NC_035077.1 BestRefSeq  exon    355926  356160  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "1";
NC_035077.1 BestRefSeq  exon    365943  366101  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "2";
NC_035077.1 BestRefSeq  exon    367207  367323  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "3";
NC_035077.1 BestRefSeq  exon    368982  369110  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "4";
NC_035077.1 BestRefSeq  exon    370615  370674  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "5";
NC_035077.1 BestRefSeq  exon    374855  375049  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "6";
NC_035077.1 BestRefSeq  exon    375557  375658  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "7";
NC_035077.1 BestRefSeq  exon    375795  375901  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "8";
NC_035077.1 BestRefSeq  exon    379788  382310  .   +   .   transcript_id "NM_001124315.1"; gene_id "LOC100135976"; exon_number "9";
NC_035077.1 Gnomon  transcript  422138  465051  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; gene_name "LOC110523159"; xloc "XLOC_000008"; ref_gene_id "LOC110523159"; cmp_ref "XM_021601631.1"; class_code "="; tss_id "TSS9";
NC_035077.1 Gnomon  exon    422138  422304  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "1";
NC_035077.1 Gnomon  exon    430842  430903  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "2";
NC_035077.1 Gnomon  exon    431903  431943  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "3";
NC_035077.1 Gnomon  exon    452095  452215  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "4";
NC_035077.1 Gnomon  exon    456772  456888  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "5";
NC_035077.1 Gnomon  exon    457131  457323  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "6";
NC_035077.1 Gnomon  exon    465044  465051  .   +   .   transcript_id "XM_021601631.1"; gene_id "LOC110523159"; exon_number "7";
….
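For orientation, the class codes in a file like the excerpt above can be tallied from the transcript rows, since in the excerpt only those rows carry a class_code attribute. This is a minimal sketch, assuming a tab-separated GTF; demo.annotated.gtf and class_code_counts.tsv are hypothetical file names, and the rows are abbreviated from the excerpt:

```shell
# Build a tiny tab-separated GTF (hypothetical file; attributes abbreviated from the excerpt).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  NC_035077.1 StringTie transcript 26931 50257 . + . 'transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; class_code "=";' \
  NC_035077.1 StringTie exon 26931 27080 . + . 'transcript_id "XM_021602508.1"; gene_id "MSTRG.3"; exon_number "1";' \
  NC_035077.1 StringTie transcript 32704 50257 . + . 'transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; class_code "j";' \
  NC_035077.1 StringTie transcript 145370 152927 . + . 'transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; class_code "=";' \
  > demo.annotated.gtf

# Count class codes on transcript rows only; exon rows have no class_code here.
# Looking the attribute up by name avoids depending on its field position.
awk -F'\t' '$3 == "transcript" {
    if (match($9, /class_code "[^"]*"/)) {
        # skip the 12 characters of: class_code "  and drop the closing quote
        code = substr($9, RSTART + 12, RLENGTH - 13)
        count[code]++
    }
}
END { for (c in count) print c "\t" count[c] }' demo.annotated.gtf | sort > class_code_counts.tsv

cat class_code_counts.tsv
```

Matching the attribute by name rather than by column number sidesteps the fact that class_code sits at a different whitespace-delimited position depending on how many attributes precede it.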
Code for filtering:
awk '{ if ($5-$4>200) print $0 }'  /home/user/MahmoodPhd/FEELnc/merged.annotated.gtf > merged.annotated_200.gtf
awk '$20 ~ /"x"/ { print }' '/home/user/MahmoodPhd/merged.annotated_200.gtf' > x20.gtf
cat u16.gtf j20.gtf i20.gtf o20.gtf x20.gtf > ujiox.gtf
gffread /home/user/MahmoodPhd/cuffcom/ujiox.gtf  -g /home/user/MahmoodPhd/files/GCF_002163495.1_Omyk_1.0_genomic.fa -w /home/user/MahmoodPhd/ujiox.fasta
ujiox.fasta is empty!!
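One possible cause, offered as a guess rather than a confirmed diagnosis: in the excerpt above only transcript rows carry class_code, so a filter on that field keeps transcript rows and drops every exon row, which may leave gffread with no exon coordinates to splice. Note also that $5-$4 on each row is the genomic span of that row, not the spliced transcript length. The sketch below selects transcripts by class code and summed exon length, then keeps all of their rows. It assumes a tab-separated GTF and transcript_id values that are unique per transcript (the excerpt reuses "unknown_transcript_1", so that assumption would need checking); demo_in.gtf and demo_ujiox.gtf are hypothetical file names:

```shell
# Hypothetical demo input: two transcripts from the excerpt, attributes abbreviated.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  NC_035077.1 StringTie transcript 32704 50257 . + . 'transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; class_code "j";' \
  NC_035077.1 StringTie exon 32704 33070 . + . 'transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; exon_number "1";' \
  NC_035077.1 StringTie exon 40440 40620 . + . 'transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; exon_number "2";' \
  NC_035077.1 StringTie exon 47972 50257 . + . 'transcript_id "MSTRG.3.2"; gene_id "MSTRG.3"; exon_number "3";' \
  NC_035077.1 StringTie transcript 145370 152927 . + . 'transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; class_code "=";' \
  NC_035077.1 StringTie exon 145370 145969 . + . 'transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; exon_number "1";' \
  NC_035077.1 StringTie exon 146059 146626 . + . 'transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; exon_number "2";' \
  NC_035077.1 StringTie exon 146738 152927 . + . 'transcript_id "XM_021602762.1"; gene_id "MSTRG.5"; exon_number "3";' \
  > demo_in.gtf

# Pass 1 records each transcript's class code and its summed exon length.
# Pass 2 prints every row (transcript and exon) of the transcripts that pass.
awk -F'\t' -v codes="uiojx" -v minlen=200 '
function attr(s, key,    re) {
    # return the quoted value of attribute "key", or "" if it is absent
    re = key " \"[^\"]*\""
    if (match(s, re)) return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
    return ""
}
NR == FNR {
    id = attr($9, "transcript_id")
    if ($3 == "transcript") code[id] = attr($9, "class_code")
    else if ($3 == "exon")  len[id] += $5 - $4 + 1
    next
}
{
    id = attr($9, "transcript_id")
    if (code[id] != "" && index(codes, code[id]) > 0 && len[id] >= minlen) print
}' demo_in.gtf demo_in.gtf > demo_ujiox.gtf

cat demo_ujiox.gtf
```

A file filtered this way still contains exon rows, so it could be passed to gffread with the same -g and -w options used in the command above.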
stringtie rnaseq error gffcompare
3.0 years ago

Hi, can you please choose a more informative question title?

Also, can you take a look at how I use gffread, here: Cufflinks gffread utility

It would additionally help to provide a minimal reproducible example, if even by creating 'fake' GTF and reference FASTA entries. This would help because we cannot see the contents of all of your files. By doing this, you may also indirectly solve your problem on your own. Thanks!
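As a rough illustration of what such a 'fake' pair could look like, here is a minimal sketch. Every name in it (toy_ref.fa, toy.gtf, chrToy, TOY.1) is made up for the example, and the checks at the end are only basic sanity tests, not a substitute for running gffread:

```shell
# A 60-nt fake reference sequence (15 repeats of ACGT) under a made-up name.
awk 'BEGIN { print ">chrToy"; s = ""; for (i = 0; i < 15; i++) s = s "ACGT"; print s }' > toy_ref.fa

# One made-up transcript with two exons, tab-separated, coordinates inside the 60 nt.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chrToy toy transcript 5 50 . + . 'transcript_id "TOY.1"; gene_id "TOY"; class_code "u";' \
  chrToy toy exon 5 20 . + . 'transcript_id "TOY.1"; gene_id "TOY"; exon_number "1";' \
  chrToy toy exon 35 50 . + . 'transcript_id "TOY.1"; gene_id "TOY"; exon_number "2";' \
  > toy.gtf

# Sanity check 1: every GTF row has exactly nine tab-separated columns.
awk -F'\t' 'NF != 9 { bad++ } END { exit (bad > 0) }' toy.gtf && echo "toy.gtf: 9 columns per row"

# Sanity check 2: the sequence name used in the GTF matches a FASTA header.
grep -q "^>$(cut -f1 toy.gtf | sort -u)\$" toy_ref.fa && echo "sequence names match"
```

With gffread installed, the same command form as in the question could then be tried on the toy pair (gffread toy.gtf -g toy_ref.fa -w toy.fasta), and the filtering steps applied to toy.gtf one at a time to see which one empties the output.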

Thanks, my friend.

You're welcome, my friend. Don't mention it.
