Creating custom GTF file for use with Cellranger with barcode sequences
8 months ago

Hello all

I am working with 3' single-cell expression data generated on the 10x Chromium platform. The starting material is human cells labeled with an RFP-expressing lentiviral vector, which also contains a ~50 bp barcode cassette in the expressed transcript for lineage-tracing purposes.

The barcode cassette (including linker sequence) is as follows:

BC14(sense)-TGCTCAGGTAGCCTCACCTCC-BC30(sense)-3LTR-Poly(A)signal

Here BC14 is a 14-nucleotide barcode and BC30 is a second, 30-nucleotide barcode. I am attempting to create a custom reference with cellranger mkref that includes the RFP sequence plus the cassette, with every possible barcode combination (about 50 million in total) as a separate contig. I am not sure whether cellranger will accept a FASTA file containing 50 million contigs of ~70 bp each. In addition, I need to generate a GTF file as input to mkref, because cellranger is splicing-aware and assigns reads to intergenic, exonic, or intronic regions. Reference GTF files exist for most transgenes, including RFP, but not for barcodes. The barcode-containing transcripts are unspliced (i.e. fully exonic), so conceptually such a GTF should not be difficult to generate.

Practically speaking, what is the best way to generate this GTF file? I could make a BED file with the start and end position of each contig (all contigs are the same size; only the contig name differs), but are there tools (e.g. in Bioconductor or elsewhere) that would then convert this into GTF format and mark each of these contigs as fully exonic? Thank you in advance for any help!

Cellranger • 755 views

Practically speaking, I strongly suggest a different approach. You can add the RFP as a single transcript to the GTF (the tutorial uses GFP, so it is straightforward to follow). For the barcode cassette, however, I would recommend using a custom script to search the cellranger output .bam file for reads containing the linker sequence, pulling out the upstream 14 bp and downstream 30 bp for matching against the set of valid barcodes. The 'CB' tag (corrected cell barcode) then associates the lineage-tracing barcode with the cell, and the 'UB' tag (corrected UMI) lets you collapse PCR duplicates.
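A minimal sketch of that linker search, in plain Python, might look like the following. It is only an illustration under several assumptions: the function names (`extract_barcodes`, `count_umis`) are made up for this example, a read has to contain the full 14 + 21 + 30 bp cassette with an exact linker match (no mismatch tolerance, no partial barcodes), and the input is an iterable of (read sequence, cell barcode, UMI) tuples. In practice those tuples would come from the BAM file, for example by reading the sequence and the corrected cell barcode and UMI tags of each record with a library such as pysam.

```python
import re
from collections import defaultdict

# Linker taken from the cassette description in the question.
LINKER = "TGCTCAGGTAGCCTCACCTCC"
# 14 nt barcode, exact linker, 30 nt barcode (all on the sense strand).
CASSETTE = re.compile(r"([ACGT]{14})" + LINKER + r"([ACGT]{30})")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_barcodes(seq):
    """Return (BC14, BC30) if the full cassette is found, else None.

    Both orientations are tried, because aligned reads are stored
    relative to the reference strand.
    """
    for candidate in (seq, revcomp(seq)):
        match = CASSETTE.search(candidate)
        if match:
            return match.group(1), match.group(2)
    return None


def count_umis(records, valid_pairs=None):
    """Count distinct UMIs per (cell barcode, lineage barcode pair).

    records: iterable of (read_sequence, cell_barcode, umi) tuples.
    valid_pairs: optional set of allowed (BC14, BC30) tuples.
    """
    umis = defaultdict(set)
    for seq, cell_barcode, umi in records:
        if cell_barcode is None or umi is None:
            continue  # read without a corrected cell barcode or UMI
        hit = extract_barcodes(seq.upper())
        if hit is None:
            continue
        if valid_pairs is not None and hit not in valid_pairs:
            continue
        umis[(cell_barcode, hit)].add(umi)
    return {key: len(value) for key, value in umis.items()}
```

Counting distinct UMIs rather than reads keeps PCR duplicates from inflating the support for a barcode. Tolerating sequencing errors in the linker or barcodes would need fuzzy matching, which this sketch leaves out.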


Thank you for your suggestion, this seems like a good approach. We have since found a tool called CITE-seq-Count (link below) that can be used with paired-end reads carrying similar short tags and may suit this better. We will explore your suggested approach if CITE-seq-Count does not work for our purposes.

https://hoohm.github.io/CITE-seq-Count/


This may not be immediately helpful, but be aware that CellRanger uses STAR internally, so any question along the lines of "Will CellRanger do X?" should first be rephrased as "Can STAR do X?". If STAR can already do it, use that information to contact 10x and ask them how to get CellRanger to do X.

Look into GTF restrictions/requirements that STAR has, then use a simple script in the language/platform of your choice to create the GTF data.
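As a rough sketch of such a script in Python: the code below reads contig names and lengths from FASTA lines and emits one GTF exon record per contig, spanning the whole contig on the plus strand. The function names are made up for this example, and the attribute set (gene_id, transcript_id, gene_name, all set to the contig name) is an assumption based on my understanding that mkref builds its annotation from exon features; check the current 10x and STAR documentation before relying on it.

```python
def fasta_lengths(lines):
    """Yield (contig_name, sequence_length) for each FASTA record."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the contig name.
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def gtf_exon_line(contig, length, source="custom"):
    """Build one GTF line marking the whole contig as a single exon.

    GTF coordinates are 1-based and inclusive, so the exon runs from
    1 to the contig length.
    """
    attributes = (
        f'gene_id "{contig}"; transcript_id "{contig}"; '
        f'gene_name "{contig}";'
    )
    fields = [contig, source, "exon", "1", str(length),
              ".", "+", ".", attributes]
    return "\t".join(fields)


def fasta_to_gtf(fasta_lines):
    """Yield one exon line per contig in the FASTA input."""
    for contig, length in fasta_lengths(fasta_lines):
        yield gtf_exon_line(contig, length)
```

Usage would be along the lines of opening the FASTA file, passing the file handle to `fasta_to_gtf`, and writing each returned line to the output GTF. If you start from a BED file instead, remember that BED starts are 0-based, so the GTF start is the BED start plus one.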
