I am working with 3' single-cell expression data generated on the 10x Chromium platform. The starting material is human cells labeled with an RFP-expressing lentiviral vector, which also contains a ~50 bp barcode cassette in the expressed transcript for lineage tracing.
The barcode cassette (including linker sequence) is as follows:
Where BC14 is a 14-nucleotide barcode and BC30 is a second, 30-nucleotide barcode. I am attempting to create a custom reference with cellranger mkref that includes the RFP sequence in addition to the cassette, with all possible barcode combinations (about 50 million in total) as separate contigs. I am not sure whether cellranger will accept a FASTA file containing 50 million contigs of ~70 bp each. In any case, I also need to generate a GTF file as input to mkref, because cellranger is splicing-aware and assigns reads to intergenic, exonic, or intronic regions. While there are reference GTF files for most transgenes, including RFP, there are none for barcodes. The barcode-containing transcripts are unspliced (i.e. fully exonic), so conceptually such a GTF should not be difficult to generate.
Practically speaking, what is the best way to generate this GTF file? I could make a BED file with the start and end position of each contig (all should be the same size; only the contig name will differ), but are there tools (e.g. in Bioconductor or elsewhere) to convert this into GTF format and mark each of these contigs as fully exonic? Thank you in advance for any help!
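To make the question concrete, here is a minimal Python sketch of the BED-to-GTF conversion I have in mind. It is only an illustration under a few assumptions: the function names (`bed_to_gtf_line`, `bed_to_gtf`), the `custom` source label, and the contig names such as `BC_0001` are all placeholders I made up, and I am assuming that mkref only needs one `exon` record per contig carrying `gene_id` and `transcript_id` attributes. The one detail I am sure about is the coordinate shift: BED is 0-based and half-open, while GTF is 1-based and inclusive.

```python
import io


def bed_to_gtf_line(bed_line, source="custom"):
    """Convert one BED record into a single-exon GTF record.

    Assumption: each contig is one unspliced transcript, so the
    contig name is reused as gene_id, transcript_id and gene_name.
    """
    fields = bed_line.rstrip("\n").split("\t")
    contig, start, end = fields[0], int(fields[1]), int(fields[2])
    # Use the BED strand column (6th) if present, otherwise assume "+".
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    attributes = (
        f'gene_id "{contig}"; transcript_id "{contig}"; gene_name "{contig}";'
    )
    # BED start is 0-based, GTF start is 1-based; the end stays the same.
    return "\t".join(
        [contig, source, "exon", str(start + 1), str(end), ".", strand, ".", attributes]
    )


def bed_to_gtf(bed_handle, gtf_handle):
    """Stream BED records to GTF; returns the number of records written."""
    written = 0
    for line in bed_handle:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        gtf_handle.write(bed_to_gtf_line(line) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Placeholder contigs; real names would come from the barcode FASTA.
    demo_bed = io.StringIO("BC_0001\t0\t70\nBC_0002\t0\t70\n")
    demo_gtf = io.StringIO()
    bed_to_gtf(demo_bed, demo_gtf)
    print(demo_gtf.getvalue(), end="")
```

Because it streams line by line, something like this should cope with tens of millions of records in terms of memory, but I do not know whether it produces everything mkref expects, or whether an existing tool already does this better.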