Question: Problems Using Htseq-Count And Samtools When Processing Rna-Seq Data
2
gravatar for camelbbs
6.8 years ago by
camelbbs650
China
camelbbs650 wrote:

HI, While I analyzed RNAseq data, I have some questions in using samtools and htseq-count. Can anyone help?

  1. I want to use htseq-count to get all counts on known genes, so i run:

htseq-count brain_fetus1.sam ~/knowngene_hg19.gff

Error occured in line 3 of file /cchome/che/knowngene_hg19.gff.

Error: Feature uc001aaa.3. does not contain a 'gene_id' attribute    

  [Exception type: SystemExit, raised in count.py:55]

The gff file like this:

chr1    hg19_knownGene    gene    11874    14409    0.000000    +    .    ID=uc001aaa.3;Name=
chr1    hg19_knownGene    mRNA    11874    14409    0.000000    +    .    ID=uc001aaa.3;Name=;Parent=uc001aaa.3
chr1    hg19_knownGene    exon    11874    12227    0.000000    +    .    ID=uc001aaa.3.;Name=;Parent=uc001aaa.3
chr1    hg19_knownGene    exon    12613    12721    0.000000    +    .    ID=uc001aaa.3.;Name=;Parent=uc001aaa.3
chr1    hg19_knownGene    exon    13221    14409    0.000000    +    .    ID=uc001aaa.3.;Name=;Parent=uc001aaa.3

Is there an appropriate tools to convert gtf to gff?

  1. another question is in samtools:

while i use samtools to figure out the counts in specified region, i run like this:

samtools mpileup -l test.bed brain.bam > test.txt

the test.bed file:

chr1    11873   14409   uc001aaa.3      0       +       11873   11873   0       3       354,109,1189,   0,739,1347,
chr1    11873   14409   uc010nxr.1      0       +       11873   11873   0       3       354,52,1189,    0,772,1347,
chr1    11873   14409   uc010nxq.1      0       +       12189   13639   0       3       354,127,1007,   0,721,1529,

It seems the -l option doesn't work. the result test.txt still contain the counts from the whole genome.

Thanks,

Che

rna-seq • 6.3k views
ADD COMMENTlink modified 6.8 years ago by Arun2.3k • written 6.8 years ago by camelbbs650
3
gravatar for Istvan Albert
6.8 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:
  1. make sure to set the -id attribute, by default your tool will use the gene_id See this: Counting reads in features with htseq-count
  2. I tried your samtools example and it worked as expected. Make sure your bamfile is sorted and indexed.
ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by Istvan Albert ♦♦ 81k

Thanks. I think I got it.

ADD REPLYlink written 6.8 years ago by camelbbs650
1
gravatar for Arun
6.8 years ago by
Arun2.3k
Germany
Arun2.3k wrote:

Is there an appropriate tools to convert gtf to gff?

Try Genometools. After unzipping/installing, navigate to bin folder and type:

./gt gtf_to_gff3 -help
ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by Arun2.3k

This tool cannot be compiled properly in my server.

ADD REPLYlink written 6.8 years ago by camelbbs650

In that case, I think you should ask your administrator to do it / help you with it or you should post here (in context of bioinformatics) or on stackoverflow explaining the output of the compilation error.

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by Arun2.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1417 users visited in the last hour