Question

How to get raw read counts of genes/CDS/transcripts from a SAM/BAM file?

3.1 years ago

shail.nair05 ▴ 20

I have metatranscriptomic reads from environmental samples. The aim is to find highly expressed metabolic/functional transcripts. I did the preprocessing (removal of adapters, rRNA and host reads), assembly with rnaSPAdes, ORF prediction with Prodigal, and annotation with GhostKOALA (KEGG Orthologs) and eggNOG-mapper. I then mapped the raw reads to the assembly with Bowtie2, and sorted and indexed the BAM files with samtools.

From here, how do I count the number of raw reads assigned to each annotated gene/ORF within the sample, and later convert the counts to TPM?

Note: I tried to feed the GFF file from Prodigal and the sorted BAM files to featureCounts (Subread package), but the command gave an error saying no features were loaded in format GTF.

RNA-Seq alignment gene sequencing assembly • 1.1k views

featurecounts (subreads package) but the command gave an error saying no features were loaded in format GTF.

The post should be about solving this error because featureCounts should, in principle, do what you're trying to achieve. The first thing to check is whether the chromosome names in the GFF file are the same as the names in the BAM files; a typical difference is that one file indicates the chromosomes just by integers whereas the other uses "chr1", "chr2" and so on. Also, make sure that you read the featureCounts documentation carefully to understand what format the GTF/GFF file should have.
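The name check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`gff_summary`, `sam_header_refs`) and the toy records are made up for the example, and the header lines are assumed to come from something like `samtools view -H`. It also tallies the feature types in column 3, because featureCounts by default looks for `exon` features grouped by a `gene_id` attribute, while Prodigal GFF output uses `CDS` features with `ID=` attributes, so the `-t` and `-g` options may need to be set accordingly.

```python
from collections import Counter


def gff_summary(gff_lines):
    """Return (set of sequence names, Counter of feature types) from GFF lines."""
    seqids, feature_types = set(), Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GFF record
        seqids.add(fields[0])            # column 1: contig/chromosome name
        feature_types[fields[2]] += 1    # column 3: feature type
    return seqids, feature_types


def sam_header_refs(header_lines):
    """Return the reference names (SN: tags of @SQ lines) from a SAM header."""
    refs = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    refs.add(tag[3:])
    return refs


# Toy data (hypothetical contig names, for illustration only).
gff = [
    "##gff-version  3\n",
    "NODE_1\tProdigal_v2.6.3\tCDS\t3\t500\t55.1\t+\t0\tID=1_1;partial=10\n",
    "NODE_2\tProdigal_v2.6.3\tCDS\t10\t300\t20.0\t-\t0\tID=2_1;partial=00\n",
]
header = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:NODE_1\tLN:1000\n",
    "@SQ\tSN:NODE_3\tLN:800\n",
]

seqids, feature_types = gff_summary(gff)
refs = sam_header_refs(header)
missing = sorted(seqids - refs)  # GFF contigs absent from the BAM header

print("feature types in GFF:", dict(feature_types))
print("GFF contigs not in BAM header:", missing)
```

If `missing` is non-empty, or the feature types do not include the one passed to `-t`, featureCounts has nothing to match reads against.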

ADD REPLY • link 3.1 years ago by Friederike 8.9k

score 1 · Answer 1 · 2021-03-22

3.1 years ago

shail.nair05 ▴ 20

Friederike Silly mistake. The problem was with my featureCounts command. I corrected it, and now featureCounts can read my GFF and indexed BAM files. Thanks!
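For the remaining step in the original question (counts to TPM), a minimal sketch follows, assuming the raw counts and feature lengths have already been read from the featureCounts output table (which has a `Length` column next to the counts). The function name `counts_to_tpm` and the toy numbers are hypothetical. It uses the usual TPM definition: divide each count by the feature length, then scale so that the values sum to one million.

```python
def counts_to_tpm(counts, lengths):
    """Convert raw read counts to TPM.

    counts  -- dict mapping feature ID to raw read count
    lengths -- dict mapping feature ID to feature length in bases
    """
    # Reads per base for every feature (length-normalised rate).
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads assigned anywhere: define every TPM as zero.
        return {gene: 0.0 for gene in counts}
    # Scale the rates so that they sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Toy example (made-up numbers, Prodigal-style IDs).
counts = {"1_1": 100, "1_2": 200, "2_1": 0}
lengths = {"1_1": 1000, "1_2": 1000, "2_1": 500}

tpm = counts_to_tpm(counts, lengths)
for gene, value in tpm.items():
    print(gene, round(value, 2))
```

TPM values are relative within one sample, so the conversion has to be done separately for each BAM file (each count column).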
