Retrieve count table from bam file using genomic coordinates
0
0
Entering edit mode
16 months ago
L_bioinfo • 0

I'm trying to match geneID of reference genome to the alignment file based on matching genomic coordinates in sam/bam file to that of reference genome annotation(GFF3). I need all the Gene ID that corresponds reads in the alignment file.

Thus far, I've tried feature counts and HTseq and I'm unable to process further. Kindly help me by giving suggestions.

featureCounts -F GFF -t gene -g ID  -a Genome_Concatenated.gff -o counts.txt sorted.bam

*****Error:  
Load annotation file Genome_Concatenated.gff 
ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..
The porgram has to terminate. *****! 



 Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) ' 
    Genome_Concatenated Geneious extracted region 1 108686 . + . Name=;Extracted interval=1 -> 108%2C686' 
    Genome_Concatenated GeneMark.hmm start_codon 654 656 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
    Genome_Concatenated GeneMark.hmm CDS 654 2849 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
    Genome_Concatenated GeneMark.hmm stop_codon 2847 2849 . + 0 codon_start=1;GFF Attribute=gene_id "Gene0001";GFF Attribute=transcript_id "Transcript0001";GFF Attribute=protein_id "Protein0001"
Sequence Htseq-count feature analysis counts • 1.2k views
ADD COMMENT
0
Entering edit mode

Add more informations. Where did your GFF come from? Posting images is a bad practice. Instead copy your lines using the code formatter.

ADD REPLY
0
Entering edit mode

Apologies. I am a fresher to this page and hope to improve. I have made the changes suggested. Thanks .

ADD REPLY
0
Entering edit mode

No problems. Btw, your GTF (GFF3 is with "=", GTF with space like your example) has strange records inside.

Genome_Concatenated referenceToDocument: urn:local:.:t-fgyh5yz \ Concatenated sequence 1 108686 . + . Name=Scaffold0001 (concatenated sequence 1) '

Is this a line of your GTF file?

ADD REPLY
0
Entering edit mode

Yes it is. By this if you intend to suggest this could be one factor hampering feature counts from working, would it be alright to edit the GFF file?

ADD REPLY
1
Entering edit mode

Who knows. For sure this is concerning. Try to double check that your file is a regular GTF file (each line with 9 columns, tab separated; non-features line commented with '#' and placed at the beginning of the file) then try again. Instead of manually editing lines, investigate what has gone wrong with the way this file was generated.

ADD REPLY
0
Entering edit mode

What does your annotation file look like? Do you have the gene feature in your annotation file?

ADD REPLY
0
Entering edit mode

The last five lines is the sample of how my GFF file looks like. The file does have gene features

ADD REPLY
0
Entering edit mode

It looks like a mix of GFF and GTF.

ADD REPLY

Login before adding your answer.

Traffic: 1312 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6