Question: Error message when generating read coverage tables for Ballgown with StringTie
hu.guangzhen9 wrote (19 months ago):

Background: the differential expression analysis workflow described in the StringTie manual includes the following steps:

1. For each RNA-Seq sample, map the reads to the genome with HISAT2 using the --dta option. It is highly recommended to use the reference annotation information when mapping the reads. It can either be embedded in the genome index (built with the --ss and --exon options, see the HISAT2 manual) or provided separately at run time (using the --known-splicesite-infile option of HISAT2). The SAM output of each HISAT2 run must be sorted and converted to BAM using samtools, as explained earlier in the StringTie manual.
2. For each RNA-Seq sample, run StringTie to assemble the read alignments obtained in the previous step. It is recommended to run StringTie with the -G option if the reference annotation is available.
3. Run StringTie with --merge to generate a non-redundant set of transcripts observed in all the RNA-Seq samples assembled previously. The stringtie --merge mode takes as input a list of all the assembled transcript files (in GTF format) previously obtained for each sample, as well as a reference annotation file (-G option) if available.
4. For each RNA-Seq sample, run StringTie using the -B/-b and -e options to estimate transcript abundances and generate read coverage tables for Ballgown. The -e option is not required but is recommended for this run, because it produces more accurate abundance estimates of the input transcripts. Each StringTie run in this step takes as input the sorted read alignments (BAM file) obtained in step 1 for the corresponding sample, together with the -G option pointing to the merged transcripts (GTF file) generated by stringtie --merge in step 3. Please note that this is the only case where the -G option is not used with a reference annotation, but with the global, merged set of transcripts as observed across all samples. (This step is the equivalent of the Tablemaker step described in the original Ballgown pipeline.)
5. Ballgown can now be used to load the coverage tables generated in the previous step, perform various statistical analyses for differential expression, generate plots, etc.
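The five steps above can be sketched as a dry run that only prints the commands for each sample. This is an illustration under stated assumptions, not the manual's own script. It assumes paired-end FASTQ input and a HISAT2 index prefix `hg38_index`, and all file names are hypothetical placeholders. Only long-standing options are used (`--dta`, `-x`, `-1/-2`, `-S`, `samtools sort -o`, and StringTie's `-G`, `-o`, `--merge`, `-e`, `-B`).

```shell
#!/bin/sh
# Dry-run sketch of the StringTie/Ballgown workflow: it only PRINTS commands.
# All names below are hypothetical placeholders (assumptions, not from the manual).
REF_GTF="hg38.reference.gtf"   # reference annotation
HISAT2_INDEX="hg38_index"      # HISAT2 index prefix
MERGED_GTF="merged.gtf"        # output of stringtie --merge (step 3)

# Steps 1-2 for one sample ($1 = sample name), assuming paired-end reads.
sample_cmds() {
  s="$1"
  # Step 1: align with --dta, then sort and convert SAM -> BAM.
  echo "hisat2 --dta -x $HISAT2_INDEX -1 ${s}_1.fastq -2 ${s}_2.fastq -S ${s}.sam"
  echo "samtools sort -o ${s}.sorted.bam ${s}.sam"
  # Step 2: assemble this sample, guided by the reference annotation.
  echo "stringtie ${s}.sorted.bam -G $REF_GTF -o ${s}.gtf"
}

# Step 3: merge all per-sample assemblies ($@ = sample names).
merge_cmd() {
  gtfs=""
  for s in "$@"; do gtfs="$gtfs ${s}.gtf"; done
  echo "stringtie --merge -G $REF_GTF -o $MERGED_GTF$gtfs"
}

# Step 4: re-estimate abundances against the merged set for one sample.
# The positional input is the sorted BAM from step 1, not a GTF file.
ballgown_cmd() {
  s="$1"
  echo "stringtie -e -B -G $MERGED_GTF -o ballgown/${s}/${s}.gtf ${s}.sorted.bam"
}
```

For two samples, `for s in sample1 sample2; do sample_cmds "$s"; done; merge_cmd sample1 sample2; for s in sample1 sample2; do ballgown_cmd "$s"; done` prints the whole pipeline in order; piping that output to `sh` would run it. Each sample's step-4 output goes into its own directory because -B writes the coverage tables next to the -o file, and Ballgown loads one sample per subdirectory.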

I am at step 4, running StringTie using the -B/-b and -e options in order to estimate transcript abundances and generate read coverage tables for Ballgown.

./stringtie Path/sample1.stringtie.gtf Path/sample2.stringtie.gtf -G Path/samples.merged.stringtie.gtf -B -e -o Path/sample.stringtie.coveragetable.gtf

Error message: [samopen] no @SQ lines in the header. [sam_read1] missing header? Abort!
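Step 4 of the workflow quoted above takes the sorted BAM from step 1 as input, but the command above passes two GTF files in that position. A plausible reading of the error is that StringTie tried to parse the GTF text as SAM and found no @SQ header lines. The following is a minimal pre-flight sketch that flags non-BAM inputs before running StringTie. The function names are hypothetical. It assumes only that BAM files are BGZF-compressed and therefore start with the gzip magic bytes 1f 8b. A gzip-compressed GTF would pass as well, so this is a heuristic, not a validator.

```shell
#!/bin/sh
# Heuristic check (hypothetical helper, not part of StringTie): BAM files are
# BGZF-compressed, so their first two bytes are the gzip magic 0x1f 0x8b.
# A plain-text GTF starts with "#" or a sequence name such as "chr1".
# Caveat: a gzip-compressed GTF would also pass this test.
looks_like_bam() {
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Check every intended StringTie input; returns non-zero if any looks wrong.
check_inputs() {
  status=0
  for f in "$@"; do
    if looks_like_bam "$f"; then
      echo "OK: $f"
    else
      echo "NOT BAM: $f (StringTie expects sorted alignments here)" >&2
      status=1
    fi
  done
  return $status
}
```

For example, `check_inputs Path/sample1.stringtie.gtf Path/sample2.stringtie.gtf` would be expected to flag both files, whereas the sorted BAM from step 1 should pass.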

Can you give some suggestions on how to solve the problem?

Thanks!

Guangzhen

hu.guangzhen9 wrote (19 months ago):

Here is the head of one sample's assembled transcript file (output of step 2):

# ./stringtie Path/sample_sorted.bam -G Path/hg38.reference.GTF -o Path/sample.ref.stringtie.gtf -A sample.abund.out
# StringTie version 1.2.3
chr1 StringTie transcript 14362 29370 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "NR_024540"; ref_gene_id "NR_024540"; cov "18.600962"; FPKM "4.246014"; TPM "8.169806";
chr1 StringTie exon 14362 14829 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; reference_id "NR_024540"; ref_gene_id "NR_024540"; cov "20.951817";
chr1 StringTie exon 14970 15038 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; reference_id "NR_024540"; ref_gene_id "NR_024540"; cov "20.590321";
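Each record in the file head above is a standard nine-column GTF line whose last column holds `key "value";` attributes. The per-transcript abundances in such a file can be listed with a small awk helper. The sketch below is hypothetical, not part of StringTie, and assumes the usual tab-separated GTF layout and the attribute formatting shown here.

```shell
#!/bin/sh
# Hypothetical helper: print transcript_id, cov, FPKM and TPM for every
# "transcript" record of a StringTie GTF read on stdin.
# Assumes a 9-column, tab-separated GTF; "#" header lines are skipped.
transcript_abundance() {
  awk -F '\t' '
    /^#/ { next }
    $3 == "transcript" {
      id = cov = fpkm = tpm = "NA"
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)                 # drop the space after each semicolon
        key = a; sub(/ .*/, "", key)      # key: text before the first space
        val = a; sub(/^[^ ]+ /, "", val)  # value: the rest of the attribute
        gsub(/"/, "", val)                # strip the surrounding quotes
        if (key == "transcript_id") id = val
        else if (key == "cov") cov = val
        else if (key == "FPKM") fpkm = val
        else if (key == "TPM") tpm = val
      }
      print id "\t" cov "\t" fpkm "\t" tpm
    }'
}
```

Usage: `transcript_abundance < Path/sample.ref.stringtie.gtf`.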

Here is the head of the merged file that I use as the -G reference in step 4 (output of step 3):

# ./stringtie --merge -G Path/hg38.reference.GTF -o Path/Samples.merged.stringtie.gtf Path/sample.ref.stringtie.gtf .....
# StringTie version 1.2.3
chr1 StringTie transcript 11874 14409 1000 + . gene_id "MSTRG.1"; transcript_id "NR_046018"; ref_gene_id "NR_046018";
chr1 StringTie exon 11874 12227 1000 + . gene_id "MSTRG.1"; transcript_id "NR_046018"; exon_number "1"; ref_gene_id "NR_046018";
chr1 StringTie exon 12613 12721 1000 + . gene_id "MSTRG.1"; transcript_id "NR_046018"; exon_number "2"; ref_gene_id "NR_046018";

Is there a way to correct the header of the sample file and of the reference (merged) file?
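@SQ lines belong to a SAM/BAM header, with one line per reference sequence. A GTF file has no such header, so editing the GTF heads shown above is unlikely to make the error go away. The minimal sketch below checks that the sorted BAM from step 1 does carry @SQ lines. It assumes samtools is installed (`samtools view -H` prints only the header), and the function names are hypothetical.

```shell
#!/bin/sh
# Count @SQ (reference sequence) lines in a SAM header read on stdin.
count_sq_lines() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the 0, ignore the status
  grep -c '^@SQ' || true
}

# Hypothetical helper: report whether an alignment file has @SQ lines.
# Requires samtools on PATH.
check_bam_header() {
  n=$(samtools view -H "$1" | count_sq_lines)
  if [ "$n" -gt 0 ]; then
    echo "$1: $n @SQ lines"
  else
    echo "$1: no @SQ lines (not a usable alignment file?)" >&2
    return 1
  fi
}
```

For example, `check_bam_header Path/sample_sorted.bam`. If this reports @SQ lines, the BAM header is likely fine, and the step-4 command probably just needs that BAM, rather than the GTF files, as its input.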

Thanks!
