Help finding a Littorina saxatilis GFF or GTF file
8.0 years ago
Adam • 0

I'm trying to run the DESeq2 pipeline for Littorina saxatilis on usegalaxy.org. The htseq-count step of the pipeline requires a GTF or GFF annotation file. I have the alignment files, but htseq-count needs a feature annotation (the GFF or GTF file) to count against, and I have been unable to find one. I'm completely stuck and would appreciate any help.

rna-seq • 1.5k views

I am going to hazard a guess that the alignments @adam refers to may be against the LSD EST database. There is some indication that a genome is being put together, but nothing seems to be publicly available.
If that is the case, then @adam may need to count the reads against the ESTs.
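
If the reference really is a set of ESTs, counting per reference sequence needs no GTF/GFF at all, because each EST is its own feature. Here is a minimal sketch of that idea, assuming the alignments are available as SAM text (for example from samtools view -h). The function name and the example records are made up for illustration; in practice samtools idxstats on an indexed BAM reports similar per-reference totals.

```python
from collections import Counter

# Hypothetical SAM records for illustration only (11 mandatory columns each).
EXAMPLE_SAM = [
    "@SQ\tSN:EST_0001\tLN:900",
    "@SQ\tSN:EST_0002\tLN:650",
    "r1\t0\tEST_0001\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tEST_0001\t40\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",            # unmapped
    "r4\t256\tEST_0002\t15\t0\t5M\t*\t0\t0\tACGTA\tIIIII",   # secondary
    "r5\t0\tEST_0002\t20\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]


def count_reads_per_reference(sam_lines):
    """Count primary mapped alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_reads_per_reference(EXAMPLE_SAM).items()):
        print(name, n)
```

Note that this counts each alignment record, so for paired-end data both mates of a pair are counted separately.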

8.0 years ago
Michael 54k

There doesn't seem to be a sequenced genome of this organism in the public databases, so there will be no genome annotation for it either. The question is: if there is no genome, where do the alignments come from? Maybe from a transcriptome assembly. You should ask the people who gave you the data what the reference actually is; maybe they have a non-public genome.

Edit: The SAM header should contain some additional information on the reference sequence used. You can use samtools to show the header of your files.

8.0 years ago
igor 13k

I agree with Michael, but I would add that you need to ask them for the reference genome regardless of whether the sequence is public or not. Your GTF/GFF has to correspond to the exact reference genome that was used for alignment. Even if there is a public reference, there is no guarantee that is the one that was used for the alignment, especially for more obscure species. If you don't know exactly what reference was used for alignment, do not try to guess what GTF/GFF file to use.

Technically, you could try to figure out manually if the GTF/GFF you found on your own is correct, but that is a more difficult task than the one you are asking about.
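
A first manual check along those lines is to compare the sequence names in the first column of the GTF/GFF with the reference names the alignments use. The sketch below assumes you already have both as plain text; the function names and example data are hypothetical. Matching names are necessary but not sufficient: two assemblies can share contig names and still differ in sequence, so treat a match as a hint rather than proof.

```python
# Hypothetical annotation text for illustration only.
EXAMPLE_GTF = (
    "# made-up annotation\n"
    "contig_1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id \"g1\";\n"
    "contig_2\tsrc\texon\t5\t80\t.\t-\t.\tgene_id \"g2\";\n"
    "contig_9\tsrc\texon\t7\t60\t.\t+\t.\tgene_id \"g3\";\n"
)


def annotation_seqids(annotation_text):
    """Return the set of sequence names in column 1 of GTF/GFF text."""
    seqids = set()
    for line in annotation_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        seqids.add(line.split("\t", 1)[0])
    return seqids


def compare_seqids(annotation_ids, reference_ids):
    """Report which sequence names are shared and which appear on one side only."""
    annotation_ids = set(annotation_ids)
    reference_ids = set(reference_ids)
    return {
        "shared": sorted(annotation_ids & reference_ids),
        "only_in_annotation": sorted(annotation_ids - reference_ids),
        "only_in_reference": sorted(reference_ids - annotation_ids),
    }


if __name__ == "__main__":
    # Reference names as they would appear in the @SQ lines of the BAM header.
    reference_names = ["contig_1", "contig_2", "contig_3"]
    report = compare_seqids(annotation_seqids(EXAMPLE_GTF), reference_names)
    for key, names in report.items():
        print(key, names)
```

If almost nothing is shared, the annotation was very likely built for a different reference than the one used for alignment.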

Another option is to check the BAM file header with samtools view -H file.bam. Depending on the aligner, the reference FASTA file is likely to be listed there. If you are lucky, the file name will contain some sort of sequence identifier. If you search for that, you may be able to find the corresponding GTF/GFF files.
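
As a rough sketch of what to look for in that header: @SQ lines list each reference sequence (SN is the name, LN the length, and the optional UR tag a URI for the sequence), and @PG lines often record the aligner's command line in the CL tag, which is where the reference FASTA path tends to show up. The code below assumes the header has been saved as text; the example header and the file names in it are invented for illustration.

```python
# Hypothetical header text, as samtools view -H might print it.
EXAMPLE_HEADER = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:contig_1\tLN:1200\n"
    "@SQ\tSN:contig_2\tLN:850\n"
    "@PG\tID:bwa\tPN:bwa\tCL:bwa mem reference.fa reads.fq\n"
)


def parse_sam_header(header_text):
    """Collect reference sequences (@SQ) and program records (@PG) from a SAM header."""
    references = []  # (name, length, uri or None)
    programs = []    # (program id, command line or None)
    for line in header_text.splitlines():
        if not line.startswith("@"):
            continue
        fields = line.split("\t")
        # Header fields look like TAG:value; split only on the first colon.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@SQ":
            references.append((tags.get("SN"), int(tags.get("LN", 0)), tags.get("UR")))
        elif fields[0] == "@PG":
            programs.append((tags.get("ID"), tags.get("CL")))
    return references, programs


if __name__ == "__main__":
    refs, progs = parse_sam_header(EXAMPLE_HEADER)
    print(len(refs), "reference sequences")
    for prog_id, command in progs:
        print(prog_id, "->", command)
```

Not every aligner writes CL or UR, so an empty result here does not mean the header is broken, only that you will have to ask whoever ran the alignment.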


I agree with you: the person who started the study and ran the alignment should just provide all the information necessary for the analysis. If in doubt, checking the SAM/BAM header might also help identify the reference sequence.


I was actually just editing my answer to add the BAM header part.
