Background

The differential expression analysis workflow with StringTie includes the following steps:

1. For each RNA-Seq sample, map the reads to the genome with HISAT2 using the --dta option. It is highly recommended to use the reference annotation information when mapping the reads, which can be either embedded in the genome index (built with the --ss and --exon options, see the HISAT2 manual) or provided separately at run time (using the --known-splicesite-infile option of HISAT2). The SAM output of each HISAT2 run must be sorted and converted to BAM using samtools as explained above.

2. For each RNA-Seq sample, run StringTie to assemble the read alignments obtained in the previous step; it is recommended to run StringTie with the -G option if the reference annotation is available.

3. Run StringTie with --merge in order to generate a non-redundant set of transcripts observed in all the RNA-Seq samples assembled previously. The stringtie --merge mode takes as input a list of all the assembled transcript files (in GTF format) previously obtained for each sample, as well as a reference annotation file (-G option) if available.

4. For each RNA-Seq sample, run StringTie using the -B/-b and -e options in order to estimate transcript abundances and generate read coverage tables for Ballgown. The -e option is not required but is recommended for this run in order to produce more accurate abundance estimates for the input transcripts. Each StringTie run in this step takes as input the sorted read alignments (BAM file) obtained in step 1 for the corresponding sample, plus the -G option with the merged transcripts (GTF file) generated by stringtie --merge in step 3. Please note that this is the only case where the -G option is not used with a reference annotation, but with the global, merged set of transcripts as observed across all samples. (This step is the equivalent of the Tablemaker step described in the original Ballgown pipeline.)

5. Ballgown can now be used to load the coverage tables generated in the previous step and perform various statistical analyses for differential expression, generate plots, etc.
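For reference, steps 1 to 4 can be sketched as a small dry-run script that only prints the commands rather than running them. This is a minimal sketch, not the official pipeline: the sample names, the index prefix genome_index, the annotation ref.gtf, the paired-end FASTQ naming, and the output layout under ballgown/ are all hypothetical placeholders, and the function names are made up for illustration. Only flags mentioned in the workflow above, plus the usual input/output options, are used.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 1-4 instead of executing them.
# Every file name below is a hypothetical placeholder, not taken from the post.

SAMPLES="sample1 sample2"
REF_GTF="ref.gtf"        # assumed reference annotation
INDEX="genome_index"     # assumed HISAT2 index prefix

# Step 1: align with --dta, then sort the SAM and write BAM.
align_cmd() {
    echo "hisat2 --dta -x $INDEX -1 ${1}_1.fastq.gz -2 ${1}_2.fastq.gz -S ${1}.sam"
    echo "samtools sort -o ${1}.sorted.bam ${1}.sam"
}

# Step 2: assemble each sample from its sorted BAM, guided by the reference.
assemble_cmd() {
    echo "stringtie ${1}.sorted.bam -G $REF_GTF -o ${1}.stringtie.gtf"
}

# Step 3: merge the per-sample GTFs into one non-redundant transcript set.
merge_cmd() {
    gtfs=""
    for s in $SAMPLES; do gtfs="$gtfs ${s}.stringtie.gtf"; done
    echo "stringtie --merge -G $REF_GTF -o merged.gtf$gtfs"
}

# Step 4: one run per sample; the input is the sorted BAM from step 1,
# and -G points at the merged GTF from step 3.
abundance_cmd() {
    echo "stringtie ${1}.sorted.bam -e -B -G merged.gtf -o ballgown/${1}/${1}.gtf"
}

for s in $SAMPLES; do align_cmd "$s"; assemble_cmd "$s"; done
merge_cmd
for s in $SAMPLES; do abundance_cmd "$s"; done
```

In this sketch the only GTF that reaches step 4 is the merged one passed through -G; the positional input of each step 4 run is that sample's sorted BAM file, as the workflow description states.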
I am at step 4, running StringTie with the -B/-b and -e options to estimate transcript abundances and generate read coverage tables for Ballgown. This is the command I ran:
./stringtie Path/sample1.stringtie.gtf Path/sample2.stringtie.gtf -G Path/samples.merged.stringtie.gtf -B -e -o Path/sample.stringtie.coveragetable.gtf
Error message:
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
Can you suggest how to solve this problem?