Hi,
I have RNA-seq data from HIV-infected cells, which I now want to map to a combined human-HIV genome. To build that genome, I need a GTF file for my HIV strain. I didn't find strain-specific annotation files for HIV. Do you know where one could find something like that, or a better way to evaluate HIV transcript abundance in RNA-seq data?
OK, I thought I could take my annotations from Geneious, export them as GFF, and edit the text file by hand to turn it into a GTF file, but I am very uncertain whether my annotations are sufficient for that.
My GFF file looks like this:
pNL4-3 Geneious region 1 9709 . + 0 Is_circular=true
pNL4-3 Geneious insertion 1186 1186 . + . Name=p17/p24
pNL4-3 Geneious polyA_signal 9602 9607 . + . Name=POLY_A
pNL4-3 Geneious LTR 9076 9709 . + . Name=3'_LTR
pNL4-3 Geneious LTR 1 634 . + . Name=5'_LTR
pNL4-3 Geneious invisible_Parent 8888 15012 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346934513538.20
pNL4-3 Geneious invisible_Parent 5304 8887 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346934513382.19
pNL4-3 Geneious misc_feature 5005 5034 . . . Name=Fragment3
pNL4-3 Geneious misc_feature 5743 5744 . + . Name=JNCTN_NY5/LAV
pNL4-3 Geneious repeat_region 454 551 . + . Name=R
pNL4-3 Geneious repeat_region 9529 9626 . + . Name=R
pNL4-3 Geneious repeat_region 552 634 . + . Name=U5
pNL4-3 Geneious intron 744 5776 . + . Name=TAT/REV/NEF_I
pNL4-3 Geneious intron 6045 8368 . + . Name=TAT_II
pNL4-3 Geneious intron 6045 8368 . + . Name=TAT/REV/NEF_II
pNL4-3 Geneious intron 6045 8368 . + . Name=REV_II
pNL4-3 Geneious CDS 2085 5096 . + . Name=POL
pNL4-3 Geneious CDS 5969 8643 . + . Name=REV
pNL4-3 Geneious CDS 5830 8414 . + . Name=TAT
pNL4-3 Geneious CDS 6221 8785 . + . Name=ENV
pNL4-3 Geneious CDS 790 2292 . + . Name=GAG
pNL4-3 Geneious CDS 8787 9407 . + . Name=NEF
pNL4-3 Geneious CDS 5041 5619 . + . Name=VIF
pNL4-3 Geneious CDS 5559 5849 . + . Name=VPR
pNL4-3 Geneious CDS 6061 6306 . + . Name=VPU
pNL4-3 Geneious splicing signal 5059 5060 . + . Name=SD2b
pNL4-3 Geneious splicing signal 4963 4964 . + . Name=SD2
pNL4-3 Geneious splicing signal 5974 5975 . + . Name=SA5
pNL4-3 Geneious splicing signal 6720 6721 . + . Name=(SD5)
pNL4-3 Geneious splicing signal 744 745 . + . Name=SD1
pNL4-3 Geneious splicing signal 6045 6046 . + . Name=SD4
pNL4-3 Geneious splicing signal 5388 5389 . + . Name=SA2
pNL4-3 Geneious splicing signal 8367 8368 . + . Name=SA7
pNL4-3 Geneious splicing signal 5464 5465 . + . Name=SD3
pNL4-3 Geneious splicing signal 5775 5776 . + . Name=SA3
pNL4-3 Geneious splicing signal 5952 5953 . + . Name=SA4a
pNL4-3 Geneious splicing signal 5934 5935 . + . Name=SA4c
pNL4-3 Geneious splicing signal 5958 5959 . + . Name=SA4b
pNL4-3 Geneious splicing signal 4911 4912 . + . Name=SA1
pNL4-3 Geneious splicing signal 6602 6603 . + . Name=(SA6)
pNL4-3 Geneious invisible_Parent 5786 7812 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938163915.21
pNL4-3 Geneious invisible_Parent 7813 15494 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938164320.22
pNL4-3 Geneious invisible_Parent 5304 7812 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938256243.23
pNL4-3 Geneious invisible_Parent 7813 15012 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938256306.24
pNL4-3 Geneious invisible_Parent 639 5785 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938340538.25
pNL4-3 Geneious invisible_Parent 5786 10347 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938340632.26
pNL4-3 Geneious invisible_Parent 639 5303 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938465400.27
pNL4-3 Geneious invisible_Parent 5304 10347 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938465494.28
pNL4-3 Geneious invisible_Parent 712 5785 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938540694.29
pNL4-3 Geneious invisible_Parent 5786 10420 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938540787.30
pNL4-3 Geneious invisible_Parent 712 5303 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938613678.31
pNL4-3 Geneious invisible_Parent 5304 10420 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346938613756.32
pNL4-3 Geneious invisible_Parent 5786 8465 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941525286.33
pNL4-3 Geneious invisible_Parent 8466 15494 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941525411.34
pNL4-3 Geneious invisible_Parent 5304 8465 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941600376.35
pNL4-3 Geneious invisible_Parent 8466 15012 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1346941600470.36
pNL4-3 Geneious invisible_Parent 712 5303 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1347027336363.0
pNL4-3 Geneious invisible_Parent 5304 10420 . + . Name=GvHzdFvgSWWztDH65o8llFeG9ws.1347027336628.1
Do I have to look up the exon borders and insert them manually? Should I delete the first line, and do I have to delete the splice signal entries?
Do you know of another way to get, for example, a sample HIV GTF file for comparison? Or even the one I need?
If you have the GenBank file, you could try using a genbank2gtf-type program to generate one. Here is one repo.
Thank you! I have the annotation in Geneious and can download the GFF file from there. I then just have to convert it, which I guess can be done by hand, since the file is not that large.
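For anyone who would rather script the conversion than edit by hand, here is a minimal sketch in Python. It assumes the Geneious export is a real tab-delimited GFF with nine columns (the listing above shows the columns separated by spaces) and that each CDS carries a `Name=` attribute as in the file above. The function name `gff_cds_to_gtf` and the `_t1` transcript suffix are made up for illustration. It writes one `exon` row and one `CDS` row per CDS, since many counting tools read `exon` rows by default. Note that it treats every CDS as one unspliced interval, which is not correct for spliced products such as TAT and REV; those would need to be split at the intron borders.

```python
import re

def gff_cds_to_gtf(gff_lines):
    """Turn the CDS rows of a tab-delimited GFF into GTF exon + CDS rows.

    Assumption: every CDS is treated as a single unspliced interval.
    """
    gtf_rows = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip anything that is not a well-formed CDS row (LTRs, introns, ...).
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, source, _, start, end, score, strand, _, attrs = cols
        match = re.search(r"(?:^|;)Name=([^;]+)", attrs)
        if match is None:
            continue
        name = match.group(1)
        # Hypothetical naming scheme: gene = CDS name, transcript = name + "_t1".
        gtf_attr = f'gene_id "{name}"; transcript_id "{name}_t1";'
        gtf_rows.append("\t".join(
            [seqid, source, "exon", start, end, score, strand, ".", gtf_attr]))
        # Frame 0 assumes the interval starts on the first base of a codon.
        gtf_rows.append("\t".join(
            [seqid, source, "CDS", start, end, score, strand, "0", gtf_attr]))
    return gtf_rows

if __name__ == "__main__":
    example = [
        "pNL4-3\tGeneious\tLTR\t1\t634\t.\t+\t.\tName=5'_LTR",
        "pNL4-3\tGeneious\tCDS\t790\t2292\t.\t+\t.\tName=GAG",
    ]
    for row in gff_cds_to_gtf(example):
        print(row)
```

Rows such as `region`, `invisible_Parent` and the splicing signals are simply skipped, so they do not have to be deleted from the input first.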
Hi caggtaagtat,
I wonder whether your HIV NL4-3 GFF/GTF file works? I have the same question, and I could not find a GFF/GTF file for NL4-3 despite intensive Google searching.
Best,
Xiao
The sequence for HIV NL4-3 is available here. You could download the GenBank-format file and then try to make the GTF file from it.
The GTF file should contain all the transcripts of NL4-3, not just the features annotated on the DNA sequence. There are no such annotations of NL4-3 transcripts on the Internet.
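As a rough illustration of what a transcript-level entry involves, the sketch below derives exon intervals for one spliced transcript from a transcript span and a list of introns, then prints GTF `exon` rows. The intron coordinates (744-5776 and 6045-8368) are taken from the GFF posted above. The transcript start and end (454 and 9626, the outer borders of the two R repeats in that file) are an assumption made for this example, and the IDs `HIV_ms` and `HIV_ms_t1` are invented placeholders, not established names.

```python
def exons_from_introns(tx_start, tx_end, introns):
    """Return the exon intervals (1-based, inclusive) left after removing introns."""
    exons = []
    pos = tx_start
    for intron_start, intron_end in sorted(introns):
        # Each intron must start after the current exon start and end before
        # the transcript end; overlapping introns are rejected by the same check.
        if intron_start <= pos or intron_end >= tx_end:
            raise ValueError(f"intron {intron_start}-{intron_end} does not fit")
        exons.append((pos, intron_start - 1))
        pos = intron_end + 1
    exons.append((pos, tx_end))
    return exons

def exon_rows(seqid, gene_id, transcript_id, exons, strand="+"):
    """Format exon intervals as tab-delimited GTF exon rows."""
    attr = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return ["\t".join([seqid, "manual", "exon", str(start), str(end),
                       ".", strand, ".", attr])
            for start, end in exons]

if __name__ == "__main__":
    # Assumed transcript span; introns copied from the posted GFF.
    exons = exons_from_introns(454, 9626, [(744, 5776), (6045, 8368)])
    for row in exon_rows("pNL4-3", "HIV_ms", "HIV_ms_t1", exons):
        print(row)
```

Each alternative donor/acceptor combination would need its own `transcript_id` and its own intron list, which is why the CDS rows alone are not enough to describe the HIV transcriptome.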
Just wanted to follow up: to address this specific issue, we have just released HIV transcriptome annotations of alternative splicing featuring all major donors and acceptors (full description below): https://ccb.jhu.edu/HIV_Atlas/ . Hope this helps!