Question: Issue with creating count tables using StringTie *without* a reference annotation file (GTF)

catglen0120 wrote (5 months ago):

Hello,

I have been following the procedure given in the paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown", using my own data, of course.

1) I converted the SAM files to sorted BAM files with no errors.

2) Then I assembled transcripts for each of my samples following the example:

stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam

Below is what I used for my samples:

 stringtie GG1_hisat.sorted.bam -o GG1_hisat.sorted.gtf -m 300 -p5

I did not use the -G option because I don't have a reference annotation file yet.

3) Then I merged the transcripts of all my samples:

 stringtie --merge -o stringtie_merged.gtf gtf_files.txt
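The gtf_files.txt list used above is a plain text file naming one per-sample GTF per line. A minimal Python sketch for writing it, assuming the per-sample GTFs follow the GG1_hisat.sorted.gtf naming from step 2 (the helper name and any sample names other than GG1 are hypothetical):

```python
from pathlib import Path


def write_merge_list(gtf_paths, list_file="gtf_files.txt"):
    """Write one GTF path per line: the list-file input that
    `stringtie --merge` accepts (as with gtf_files.txt above)."""
    paths = [str(Path(p)) for p in gtf_paths]
    if not paths:
        raise ValueError("no GTF files given")
    # One path per line, trailing newline at the end of the file.
    Path(list_file).write_text("\n".join(paths) + "\n")
    return paths
```

For example, `write_merge_list([f"{s}_hisat.sorted.gtf" for s in ["GG1", "GG2"]])` writes the list, after which the merge command is run exactly as before.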

4) Now I want to create a table of counts so that I can move on to DESeq2, but I am unable to do so and I am not sure why. Maybe it is because I have not provided the -G option?

1st try:

stringtie –B -p8 -G stringtie_merged.gtf -o ballgown_Ceratodon.gtf

Error:

input file –B cannot be found!

2nd try:

stringtie -p8 -G stringtie_merged.gtf -o ballgown_Ceratodon.gtf

Error:

no input file provided!
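Both messages can be read directly. In the first try, the dash before B is a typographic en-dash (–), as printed in the paper, so StringTie treats "–B" as an input file name. In the second try, no BAM file is given at all. The Python sketch below is a hypothetical helper, not part of StringTie. It flags arguments that start with a non-ASCII dash and builds the per-sample re-estimation command in the form used by the HISAT/StringTie/Ballgown protocol: -e -B against the merged GTF, with the sample's BAM as the final positional argument. The ballgown/<sample>/ output layout is an assumption.

```python
import unicodedata


def find_bad_dashes(args):
    """Return (position, argument) pairs for arguments that start with a
    Unicode dash other than the ASCII hyphen-minus '-' (e.g. an en-dash)."""
    bad = []
    for i, arg in enumerate(args):
        first = arg[:1]
        if first and first != "-" and unicodedata.category(first) == "Pd":
            bad.append((i, arg))
    return bad


def reestimate_cmd(bam, sample, merged_gtf="stringtie_merged.gtf", threads=8):
    """Argument list for one sample's StringTie run against the merged GTF.

    -e  only estimate abundances of transcripts given with -G
    -B  write Ballgown table files alongside the -o output GTF
    The BAM file is the positional input; leaving it out is what produces
    'no input file provided!'.
    """
    cmd = ["stringtie", "-e", "-B", "-p", str(threads),
           "-G", merged_gtf,
           "-o", f"ballgown/{sample}/{sample}.gtf",
           bam]
    bad = find_bad_dashes(cmd)
    if bad:
        raise ValueError(f"non-ASCII dash in arguments: {bad}")
    return cmd
```

Usage, once per sample: `subprocess.run(reestimate_cmd("GG1_hisat.sorted.bam", "GG1"), check=True)` (after `import subprocess`). Building the command as a list avoids shell quoting issues, and the dash check catches text pasted from a PDF before StringTie sees it.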

Tags: rna-seq, assembly

A new related question can be found here:

Issue with DESeq2: Unable to create a DESeqDataSet because names in colData don't match column names in countData

(posted by catglen0120)

Answer by Kevin Blighe (Republic of Ireland), 5 months ago:

If your goal was to create a new, merged transcriptome reference and to perform differential expression analysis on the transcripts identified in it, then use stringtie_merged.gtf and restart your analysis with Ballgown and StringTie, specifying this file with the -G option and also giving each sample's BAM file as the input. When using StringTie at this stage, ensure that you use the -e option. To create a counts matrix from StringTie that is suitable for DESeq2, you should then use the prepDE.py script: see "Using StringTie with DESeq2 and edgeR" in the StringTie manual.
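It may help to know what prepDE.py computes. According to the StringTie manual, it does not count reads in the BAM files. Instead, it derives approximate read counts from the per-transcript coverage (cov) that StringTie writes into each sample's GTF, as coverage × transcript length / read length, with the read length set by prepDE.py's -l option (default 75). The Python sketch below reimplements only that formula for a single GTF, for illustration. It is not prepDE.py, it rounds up with math.ceil (prepDE.py's exact rounding may differ), and the GTF lines in the test are made up.

```python
import math
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: transcript_id "MSTRG.1.1"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def approx_counts(gtf_lines, read_len=75):
    """Approximate read counts per transcript from a StringTie -e GTF:
    cov (from the transcript line) * summed exon length / read_len."""
    cov = {}
    length = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        if fields[2] == "transcript":
            cov[tid] = float(attrs.get("cov", 0))
        elif fields[2] == "exon":
            # GTF coordinates are 1-based and inclusive.
            length[tid] += int(fields[4]) - int(fields[3]) + 1
    return {t: math.ceil(c * length[t] / read_len) for t, c in cov.items()}
```

Because the result scales with 1/read_len, running prepDE.py with a -l value that does not match your actual read length rescales every count by the same factor.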

If you need a FASTA file for the transcripts in your merged GTF, you can produce one with the gffread program that comes bundled with StringTie. You'll also need the reference genome in FASTA format.

Kevin

(posted by Kevin Blighe)

Hello Kevin! I tried to use the prepDE.py command and it seems not to exist. I made sure to use Python 2.7, but it just does not work:

$ python prepDE.py
python: can't open file 'prepDE.py': [Errno 2] No such file or directory

Is there a way around this?

(posted by catglen0120)

Hello again. Can you try the following Bash command (run it from the StringTie root directory):

find . -name "prepDE.py"

Does that find it?

(posted by Kevin Blighe)

Yes! That solved the problem. Thank you, Kevin.

(posted by catglen0120)

Never mind. I tried to find more information about the command and the same thing happened again:

$ module load python
$ module load stringtie
$ find . -name "prepDE.py"
$ python prepDE.py -h
python: can't open file 'prepDE.py': [Errno 2] No such file or directory

(posted by catglen0120)

But what is the output of this command?

find . -name "prepDE.py"

(posted by Kevin Blighe)

It just returns a fresh "$ " prompt with no output. Usually, when I encounter errors, I don't get a new line starting with "$"; I get "python: can't open file".

(posted by catglen0120)

Oh, I see what is happening. You are using a cluster environment and loading StringTie and Python via module commands. However, you have to run the prepDE.py script by giving Python the full path to the file, which will be wherever the system administrator installed StringTie. If you know where that is, you could look for it there; otherwise, you may have to submit a request to IT services about it.

Does that make sense?

So the eventual command would have to be something like:

python /shared/apps/stringtie/prepDE.py

It depends on where it was stored by the system administrator though.
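A minimal Python sketch of that search, assuming `module load stringtie` puts the stringtie executable on PATH and that prepDE.py was installed next to it or one directory up. That layout is an assumption about the local install, not something StringTie guarantees. Extra directories, such as a guessed /shared/apps/stringtie, can be searched recursively; the function name is hypothetical.

```python
import shutil
from pathlib import Path


def find_prepde(name="prepDE.py", extra_roots=()):
    """Return candidate paths to `name`: first next to the stringtie
    executable found on PATH (and its parent directory), then anywhere
    under the extra directories given."""
    hits = []
    exe = shutil.which("stringtie")
    if exe:
        exe_dir = Path(exe).resolve().parent
        # Non-recursive checks, so a system-wide install under /usr
        # does not trigger a scan of the whole tree.
        for d in (exe_dir, exe_dir.parent):
            candidate = d / name
            if candidate.is_file():
                hits.append(candidate)
    for root in map(Path, extra_roots):
        if root.is_dir():
            hits.extend(sorted(root.rglob(name)))
    # Drop duplicates while keeping the search order.
    return list(dict.fromkeys(hits))
```

If it returns a path, the script is then run as `python <that path> ...`; an empty list means it is not next to the installed binary, and downloading it is the fallback.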

(posted by Kevin Blighe)

Alternatively, you could just download it to your home directory and run the script from there:

To download:

wget https://ccb.jhu.edu/software/stringtie/dl/prepDE.py

(posted by Kevin Blighe)

Recent versions of StringTie (e.g., the latest version, 1.3.4d) do not contain the prepDE.py script. I don't know whether this is a bug or whether the script was dropped intentionally. I think it has to be downloaded separately.

(posted by h.mon)

Thank you, Kevin! I will download it using the command you provided.

(posted by catglen0120)

Hey Kevin! I did the following:

$ wget https://ccb.jhu.edu/software/stringtie/dl/prepDE.py
--2018-06-27 10:20:20--  https://ccb.jhu.edu/software/stringtie/dl/prepDE.py
Resolving ccb.jhu.edu... 128.220.233.225
Connecting to ccb.jhu.edu|128.220.233.225|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11095 (11K) [text/plain]
Saving to: 'prepDE.py'

prepDE.py 100%[===========================================================>] 10.83K --.-KB/s in 0s

2018-06-27 10:20:20 (26.6 MB/s) - 'prepDE.py' saved [11095/11095]

$ python prepDE.py
  File "prepDE.py", line 32
    print "Error: Text file with sample ID and path invalid (%s)" % (line.strip())
    ^
SyntaxError: invalid syntax

Do you know why this is happening and how to fix it?

(posted by catglen0120)

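The SyntaxError points at the print statement on line 32 of prepDE.py. That is Python 2 syntax, so the python you are running is most likely Python 3. Run the script with a Python 2.7 interpreter instead (how you load one depends on your cluster's modules). You will also have to pass some arguments to the prepDE.py command. Please take a look here: http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#deseq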
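A quick way to confirm an interpreter mismatch like this one is to ask the running Python whether it can compile the script at all, without executing it. A minimal Python sketch, using only the built-in `compile()` (the function name is hypothetical):

```python
from pathlib import Path


def parses_under_current_python(script):
    """Return (True, None) if `script` compiles with the running
    interpreter, else (False, message). A Python 2 print statement such
    as `print "Error: ..."` fails to compile under Python 3."""
    source = Path(script).read_text()
    try:
        compile(source, str(script), "exec")  # compile only, never run
    except SyntaxError as err:
        return False, f"{script}: line {err.lineno}: {err.msg}"
    return True, None
```

Run under the `python` that `module load python` provides, `parses_under_current_python("prepDE.py")` should report the same line 32 failure when that interpreter is Python 3, and succeed under Python 2.7.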

(posted by Kevin Blighe)
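The main argument prepDE.py needs is -i: a folder of per-sample subdirectories, or a text file with one "<sample ID> <path to that sample's GTF>" pair per line. My reading of the error message on line 32 is that each non-comment line must split into exactly two whitespace-separated fields. The sample IDs are what prepDE.py uses as column names in gene_count_matrix.csv and transcript_count_matrix.csv. A minimal Python sketch for writing that file, assuming the hypothetical ballgown/<sample>/<sample>.gtf layout from the -e -B runs (the helper name is also hypothetical):

```python
from pathlib import Path


def write_prepde_sample_list(samples, list_file="sample_lst.txt", root="ballgown"):
    """Write '<sample_id> <path to that sample's -e GTF>' lines for
    prepDE.py -i. Each line must split into exactly two whitespace-
    separated fields, so neither the ID nor the path may contain spaces."""
    lines = []
    for s in samples:
        line = f"{s} {Path(root) / s / (s + '.gtf')}"
        if len(line.split()) != 2:
            raise ValueError(f"whitespace in sample ID or path: {line!r}")
        lines.append(line)
    # Validate everything before writing, so a bad ID leaves no partial file.
    Path(list_file).write_text("\n".join(lines) + "\n")
    return lines
```

The script would then be run as `python2 prepDE.py -i sample_lst.txt` (or with the full path to prepDE.py), adding `-l` if your reads are not about 75 bp long.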

I was able to obtain the output from prepDE.py and to use it in RStudio:

> countData <- as.matrix(read.csv(file.choose(), row.names = "gene_id"))
> colData <- read.table(text = readLines(file.choose(), warn = FALSE), header = TRUE, sep = "," )

But when I check that all sample IDs in colData are also in countData and that they are in the same order, I obtain the following:

> all(rownames(colData) %in% colnames(countData))

[1] FALSE

The column names are MSTRG IDs instead of the respective sample IDs...

Even when I ignore this, I obtain the following:

> dds <- DESeqDataSetFromMatrix(countData = countData,  colData = colData, design = ~ CHOOSE_FEATURE)

Error in DESeqDataSet(se, design = design, ignoreRank) : all variables in design formula must be columns in colData
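Two things in the R code are consistent with these messages. First, `read.table(...)` is called without `row.names`, so `rownames(colData)` are just 1, 2, 3, ... rather than sample IDs, which would make the `%in%` check FALSE; the StringTie manual's DESeq2 example reads the phenotype file with `row.names = 1`. Second, `~ CHOOSE_FEATURE` is a placeholder from that example and has to be replaced by the name of an actual colData column. The Python sketch below is an illustration, not part of DESeq2 or StringTie. It runs the same two checks on the CSV files before they reach R. The phenotype layout (sample ID in the first column) and the `condition` column are assumptions.

```python
import csv


def check_sample_alignment(count_csv, pheno_csv, design_vars=()):
    """Compare a prepDE.py count matrix with a phenotype CSV for DESeq2.

    Assumes the count matrix header is 'gene_id,<sample>,...' and that the
    phenotype CSV's first column holds sample IDs (what R's row.names = 1
    would turn into rownames(colData)). Returns a list of problems found.
    """
    with open(count_csv, newline="") as fh:
        samples = next(csv.reader(fh))[1:]  # drop the 'gene_id' header cell
    with open(pheno_csv, newline="") as fh:
        rows = [r for r in csv.reader(fh) if r]
    pheno_cols = rows[0][1:]
    pheno_ids = [r[0] for r in rows[1:]]

    problems = []
    missing = [s for s in pheno_ids if s not in samples]
    extra = [s for s in samples if s not in pheno_ids]
    if missing or extra:
        problems.append(f"ID mismatch: only in phenotype {missing}, only in counts {extra}")
    elif pheno_ids != samples:
        problems.append("same samples, different order")
    for var in design_vars:
        if var not in pheno_cols:
            problems.append(f"design variable not a column: {var}")
    return problems
```

An empty list for, say, `check_sample_alignment("gene_count_matrix.csv", "pheno.csv", ["condition"])` suggests the R side will line up once colData is read with `row.names = 1` and the design is `~ condition`.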

(posted by catglen0120)

Could you open a new question for this, perhaps? I think that we have at least solved the StringTie / prepDE.py issue. I also ask because I will now be away for a couple of days, so I think it would be better to get others to help too.

PS: please also link to this thread in order to give others some context. I will nevertheless check back in a couple of days.

(posted by Kevin Blighe)