Question

Help in Finding Littorina saxatilis GFF or GTF file

0

9.6 years ago

Adam • 0

I'm trying to use the DESeq2 pipeline with Littorina saxatilis through the usegalaxy.org website, which requires a GTF or GFF file for the htseq-count step. I have the alignment files, but htseq-count also needs a feature annotation (a GFF or GTF file) to count against. I'm completely stuck and unable to find such a file. I would appreciate any help.

rna-seq • 1.9k views

updated 9.6 years ago by igor 13k • written 9.6 years ago by Adam • 0

0

I am going to hazard a guess that the alignments referred to by @adam may be against the LSD EST database. There is some indication of a genome being put together, but nothing seems to be publicly available.
If that is the case, then @adam may need to count the reads against the ESTs.
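
A rough sketch of what counting against the ESTs could look like, assuming the alignments are available as SAM text (for example from samtools view -h) and that each EST is its own reference sequence. The function name and the example records are made up for illustration; only the SAM column layout (FLAG in column 2, RNAME in column 3) is taken from the format itself.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary, mapped alignments per reference sequence (e.g. per EST)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[fields[2]] += 1  # column 3 is the reference name (RNAME)
    return counts


# Made-up records for illustration only
example_sam = [
    "@SQ\tSN:EST_A\tLN:500",
    "@SQ\tSN:EST_B\tLN:800",
    "read1\t0\tEST_A\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tEST_A\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read4\t256\tEST_B\t9\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read5\t0\tEST_B\t9\t60\t4M\t*\t0\t0\tACGT\tIIII",
]

est_counts = count_reads_per_reference(example_sam)
for name, n in sorted(est_counts.items()):
    print(name, n)
```

This counts every primary alignment once, so with paired-end data each mate is counted separately. For an indexed BAM, samtools idxstats reports similar per-reference numbers without any scripting.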

9.6 years ago by GenoMax 154k

score 2 · Answer 1 · 2016-04-24

There doesn't seem to be a sequenced genome of this organism in the public databases, so there will not be a genome annotation for it either. The question is: if there is no genome, where do the alignments come from? Maybe they come from a transcriptome assembly. You should ask the people who gave you the data what it actually is; maybe they have a non-public genome.

Edit: The SAM header should contain some additional information on the reference sequence used. You can use samtools to show the header of your files.

score 2 · Answer 2 · 2016-04-24

I agree with Michael, but I would add that you need to ask them for the reference genome regardless of whether the sequence is public or not. Your GTF/GFF has to correspond to the exact reference genome that was used for alignment. Even if there is a public reference, there is no guarantee that is the one that was used for the alignment, especially for more obscure species. If you don't know exactly what reference was used for alignment, do not try to guess what GTF/GFF file to use.

Technically, you could try to figure out manually if the GTF/GFF you found on your own is correct, but that is a more difficult task than the one you are asking about.
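
As a first sanity check along those lines, one could compare the sequence names in the first column of the GTF/GFF with the reference names listed in the BAM header. Below is a minimal sketch, assuming you have already collected the BAM reference names as a list; the function names and example data are hypothetical. Matching names are necessary but not sufficient: two assemblies can share names and still differ in sequence.

```python
def gtf_seqnames(gtf_lines):
    """Return the set of sequence names (column 1) used in GTF/GFF lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(bam_reference_names, gtf_lines):
    """Report which sequence names are shared and which appear on one side only."""
    bam_names = set(bam_reference_names)
    gtf_names = gtf_seqnames(gtf_lines)
    return {
        "shared": sorted(bam_names & gtf_names),
        "only_in_gtf": sorted(gtf_names - bam_names),
        "only_in_bam": sorted(bam_names - gtf_names),
    }


# Made-up data for illustration only
bam_refs = ["scaffold_1", "scaffold_2", "scaffold_3"]
gtf_example = [
    "#gtf-version 2.2",
    "scaffold_1\tsrc\texon\t10\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
    "scaffold_2\tsrc\texon\t50\t400\t.\t-\t.\tgene_id \"g2\"; transcript_id \"t2\";",
    "chrX\tsrc\texon\t5\t90\t.\t+\t.\tgene_id \"g3\"; transcript_id \"t3\";",
]

report = compare_names(bam_refs, gtf_example)
for key, names in report.items():
    print(key, names)
```

If "only_in_gtf" is large, or "shared" is empty, the annotation almost certainly belongs to a different reference than the one used for alignment.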

Another option is to check the BAM file header:

samtools view -H file.bam

Depending on the aligner, the reference FASTA file is likely to be listed there. If you are lucky, the file name will contain some sort of sequence identifier. If you search for that, you may be able to find the corresponding GTF/GFF files.
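
If the header is long, a small script can pull out the useful parts. This is only a sketch, assuming the header text has been saved from the samtools command above; the function name and the example header are made up. It collects the reference names and lengths from the @SQ lines and the aligner command line (the CL tag, if the aligner wrote one) from the @PG lines.

```python
def summarize_sam_header(header_text):
    """Collect (name, length) pairs from @SQ lines and tag dicts from @PG lines."""
    references = []  # (SN, LN) tuples, one per @SQ line
    programs = []    # one dict of tag -> value per @PG line
    for line in header_text.splitlines():
        fields = line.split("\t")
        if not fields[0].startswith("@"):
            continue  # not a header line
        # Fields after the record type are TAG:VALUE; split on the first colon only
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@SQ":
            references.append((tags.get("SN"), int(tags.get("LN", 0))))
        elif fields[0] == "@PG":
            programs.append(tags)
    return references, programs


# Made-up header for illustration only
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:contig_0001\tLN:1520\n"
    "@SQ\tSN:contig_0002\tLN:987\n"
    "@PG\tID:bowtie2\tPN:bowtie2\tCL:bowtie2 -x my_reference_index -U reads.fq\n"
)

references, programs = summarize_sam_header(example_header)
for name, length in references:
    print(name, length)
for pg in programs:
    print(pg.get("ID"), "->", pg.get("CL"))
```

The reference names and lengths are also handy for checking a candidate GTF/GFF later, since its first column has to use exactly the same sequence names.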