Retrieve count table from bam file using genomic coordinates
0
0
Entering edit mode
10 weeks ago
L_bioinfo • 0

I'm trying to match geneID of reference genome to the alignment file based on matching genomic coordinates in sam/bam file to that of reference genome annotation(GFF3). I need all the Gene ID that corresponds reads in the alignment file.

Thus far, I've tried feature counts and HTseq and I'm unable to process further. Kindly help me by giving suggestions.

featureCounts -F GFF -t gene -g ID  -a Genome_Concatenated.gff -o counts.txt sorted.bam

*****Error:
ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..
The porgram has to terminate. *****!

Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) '
Genome_Concatenated Geneious extracted region 1 108686 . + . Name=;Extracted interval=1 -> 108%2C686'
Genome_Concatenated GeneMark.hmm start_codon 654 656 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
Genome_Concatenated GeneMark.hmm CDS 654 2849 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
Genome_Concatenated GeneMark.hmm stop_codon 2847 2849 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"

Sequence Htseq-count feature analysis counts • 682 views
0
Entering edit mode

Add more informations. Where did your GFF come from? Posting images is a bad practice. Instead copy your lines using the code formatter.

0
Entering edit mode

Apologies. I am a fresher to this page and hope to improve. I have made the changes suggested. Thanks .

0
Entering edit mode

No problems. Btw, your GTF (GFF3 is with "=", GTF with space like your example) has strange records inside.

Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) '


Is this a line of your GTF file?

0
Entering edit mode

Yes it is. By this if you intend to suggest this could be one factor hampering feature counts from working, would it be alright to edit the GFF file?

1
Entering edit mode

Who knows. For sure this is concerning. Try to double check that your file is a regular GTF file (each line with 9 columns, tab separated; non-features line commented with '#' and placed at the beginning of the file) then try again. Instead of manually editing lines, investigate what has gone wrong with the way this file was generated.

0
Entering edit mode

What does your annotation file look like? Do you have the gene feature in your annotation file?

0
Entering edit mode

The last five lines is the sample of how my GFF file looks like. The file does have gene features

0
Entering edit mode

It looks like a mix of GFF and GTF.