STAR indexing problems with unassembled vertebrate genomes
6.8 years ago

Hi

I am trying to index the Balaenoptera_acutorostrata genome (RefSeq: https://goo.gl/SHF2dz , GenBank: https://goo.gl/JZqPHs ) with STAR, but it is not working. I followed the same approach that worked for Camelus dromedarius, using the parameter --sjdbGTFtagExonParentTranscript Parent, but the Balaenoptera_acutorostrata GFF file seems unusual: its features appear to have no Parent attribute. A mess.
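One way to test the "no Parent" suspicion is to count how many exon lines in the GFF3 carry a Parent attribute, since that is the tag --sjdbGTFtagExonParentTranscript Parent asks STAR to read. The following is only a minimal sketch: the function name and the tiny example records are made up for illustration, and it assumes a tab-separated GFF3 with the usual nine columns.

```python
from collections import Counter


def exon_parent_summary(gff_lines, tag="Parent"):
    """Count exon features that do / do not carry the given attribute key."""
    counts = Counter()
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only well-formed exon records are of interest here.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        # GFF3 attributes are semicolon-separated key=value pairs.
        keys = {
            item.split("=", 1)[0].strip()
            for item in fields[8].split(";")
            if item.strip()
        }
        counts["with_tag" if tag in keys else "without_tag"] += 1
    return counts


# Hypothetical records, not taken from the real annotation.
example = [
    "##gff-version 3\n",
    "scaf1\tRefSeq\tmRNA\t1\t900\t.\t+\t.\tID=rna0\n",
    "scaf1\tRefSeq\texon\t1\t100\t.\t+\t.\tID=exon1;Parent=rna0\n",
    "scaf1\tRefSeq\texon\t200\t300\t.\t+\t.\tID=exon2\n",
]
summary = exon_parent_summary(example)
print(dict(summary))
```

For a real file, pass an open file handle instead of the example list. If most exons land in "without_tag", the Parent setting cannot work for that file.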

So I then tried the GFF3 files for the scaffolds, which did not work either. Next I converted the GFF to GTF with gffread (part of Cufflinks): gffread myfile.gff3 -T -o output.gtf. The GTF looks OK to me; there are no chromosome names, but that is because the genome has not been assembled into chromosomes.

That did not work either, so I indexed the genome without any GTF or GFF. Indexing then works, but I run into the same problem later during mapping. This is driving me crazy; I think I have tried almost everything. I suspect some incompatibility between the FASTA and the annotation file.
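A suspected FASTA/annotation incompatibility can be checked by comparing the sequence names in the FASTA headers with the names in the first column of the GTF/GFF. The sketch below assumes the sequence name is the first whitespace-delimited token of each header line; the function names and the scaffold names in the example are hypothetical.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: first token after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def annotation_seqids(annotation_lines):
    """Collect the distinct values of column 1 of a GTF/GFF file."""
    seqids = set()
    for line in annotation_lines:
        if not line.strip() or line.startswith("#"):
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def compare_names(fasta_lines, annotation_lines):
    """Report which names are shared and which appear on one side only."""
    fa = fasta_names(fasta_lines)
    an = annotation_seqids(annotation_lines)
    return {
        "shared": fa & an,
        "only_in_fasta": fa - an,
        "only_in_annotation": an - fa,
    }


# Hypothetical data showing a typical mismatch in naming style.
fasta_example = [">scaf_1 some description\n", "ACGT\n", ">scaf_2\n", "GGCC\n"]
gtf_example = [
    "scaf_1\tsrc\texon\t1\t4\t.\t+\t.\ttranscript_id \"t1\";\n",
    "Scaffold2\tsrc\texon\t1\t4\t.\t+\t.\ttranscript_id \"t2\";\n",
]
report = compare_names(fasta_example, gtf_example)
print(report)
```

Any names under "only_in_annotation" point to annotation records that refer to sequences the FASTA does not contain under that name.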

Any help will be highly appreciated.

Thanks!

RNA-Seq STAR GTF • 1.6k views

You'll need to be more precise about which commands you ran and which errors you got. As it stands, this is hard to troubleshoot.

6.8 years ago

I'd make a STAR reference without the GTF/GFF3 annotation included. These formats are notoriously difficult to get right. The main reason, though, is that if your genome is still in scaffolds, the annotation is unlikely to be great, and if you include it in the index, STAR will be biased towards that annotation. I used to work on many crops and used STAR extensively and successfully in 2-pass mode. My advice would be to read up on that in the STAR manual and give it a go.
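As a rough illustration of that advice, the sketch below builds two STAR command lines: one that generates an index with no annotation, and one that maps reads with --twopassMode Basic. The helper names, paths, file names and thread count are placeholders, not values from this thread, and the reads are assumed to be uncompressed FASTQ.

```python
import shlex


def star_index_cmd(genome_dir, fasta, threads=8):
    """Index command with no --sjdbGTFfile, i.e. no annotation in the index."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--runThreadN", str(threads),
    ]


def star_twopass_cmd(genome_dir, reads, out_prefix, threads=8):
    """Mapping command using STAR's built-in per-sample 2-pass mode."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--twopassMode", "Basic",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
        "--runThreadN", str(threads),
    ]


# Placeholder paths; replace with your own files.
index_cmd = star_index_cmd("star_index", "genome.fa")
map_cmd = star_twopass_cmd(
    "star_index", ["sample_R1.fastq", "sample_R2.fastq"], "sample_"
)

# Print the commands for inspection; to run them, pass each list to
# subprocess.run(cmd, check=True) on a machine where STAR is installed.
print(shlex.join(index_cmd))
print(shlex.join(map_cmd))
```

Building the commands as lists keeps them easy to inspect and avoids shell quoting problems if a path contains spaces.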

I see you tried this as well, but the advice above still stands. If you have a lot of contigs, STAR needs additional settings to cope (the author has answered this many times in the STAR Google Group, so have a look there). One workaround if you are hardware limited is to combine many tiny contigs into one "pseudochromosome", which makes indexing and fast mapping more tractable for the tool.
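The two ideas above can be sketched in a few lines. For many contigs, the STAR manual suggests scaling --genomeChrBinNbits as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))); rounding down to an integer is my own assumption here. The pseudochromosome helper is purely illustrative: it joins contigs with N spacers and records each contig's offset so coordinates can be translated back later. The function names, the spacer length and all example numbers are hypothetical.

```python
import math


def suggested_chr_bin_nbits(genome_length, n_references, read_length):
    """Value for --genomeChrBinNbits following the STAR manual's formula."""
    value = math.log2(max(genome_length / n_references, read_length))
    # Assumption: round down, since the option takes an integer.
    return min(18, int(value))


def build_pseudochromosome(contigs, spacer_len=100):
    """Join (name, sequence) pairs with N spacers.

    Returns the joined sequence and a dict mapping each contig name to
    its 0-based start position within the joined sequence.
    """
    parts = []
    offsets = {}
    position = 0
    for index, (name, seq) in enumerate(contigs):
        if index > 0:
            # Spacer keeps reads from aligning across contig boundaries.
            parts.append("N" * spacer_len)
            position += spacer_len
        offsets[name] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets


# Hypothetical assembly: 2.4 Gb in 100,000 scaffolds, 100 bp reads.
nbits = suggested_chr_bin_nbits(2_400_000_000, 100_000, 100)
print(nbits)

pseudo_seq, pseudo_offsets = build_pseudochromosome(
    [("c1", "ACGT"), ("c2", "GG")], spacer_len=3
)
print(pseudo_seq, pseudo_offsets)
```

The offsets table matters: any alignment or annotation coordinate on the pseudochromosome has to be shifted by the contig's offset to get back to the original scaffold coordinates.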

Good luck.


Thanks a lot Colin! I will check the Google Group. And yes... I think it is best to index without the GTF annotation.

:-)
