Question: Reference Transcriptome for Drosophila melanogaster with cellranger mkref
el2410 (USA) wrote, 9 months ago:

Hi all, I am new to bioinformatics, so I was wondering if someone could help me with some issues I'm having with cellranger. I'm trying to run cellranger count on Drosophila melanogaster data, but I need a reference transcriptome to run it. I used this link to create the reference transcriptome from a genome sequence (FASTA) and gene annotations (GTF). According to it, the recommended genome file to download from Ensembl is labeled "primary assembly", and from NCBI it is "no alternative - analysis set". I couldn't find either of these titles on Ensembl or NCBI. I tried a couple of different GTF and FASTA files from FlyBase and NCBI, but I got errors and couldn't create a reference transcriptome with them. Then I tried the files below to create the reference:

FASTA: ftp://ftp.ensemblgenomes.org/pub/metazoa/release46/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz
GTF: ftp://ftp.ensembl.org/pub/release-77/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.77.gtf.gz

I managed to create the reference, but when I run cellranger count with this reference transcriptome, I get an error for several replicates. Specifically, the error is "Low Fraction Reads Confidently Mapped To Transcriptome", which reports "19.0%, but Ideal > 30%. This can indicate the use of the wrong reference transcriptome, a reference transcriptome with overlapping genes, poor library quality, poor sequencing quality, or reads shorter than the recommended minimum. Application performance may be affected."

Could you please tell me where I can find a reference transcriptome, or better GTF and FASTA files to create the reference myself? I appreciate your response, thanks!
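For reference, the thresholds in these warnings can be checked programmatically. The sketch below is a minimal example, assuming the run's `outs/metrics_summary.csv` has one header row and one value row, with column names such as "Reads Mapped Confidently to Transcriptome" and "Fraction Reads in Cells" and values formatted like "19.0%"; verify the exact names against your own file. The 30% and 70% cut-offs are simply the "Ideal" values quoted in the Cell Ranger warnings in this thread.

```python
import csv
import io

# "Ideal" values quoted in the Cell Ranger warnings in this thread.
# Column names are assumed to match metrics_summary.csv; check your own file.
THRESHOLDS = {
    "Reads Mapped Confidently to Transcriptome": 30.0,
    "Fraction Reads in Cells": 70.0,
}


def parse_metrics(text):
    """Parse a one-row metrics CSV into {column: value}.

    '19.0%' becomes 19.0 and '1,234' becomes 1234.0; anything that is not
    a number is kept as the original string.
    """
    row = next(csv.DictReader(io.StringIO(text)))
    parsed = {}
    for key, raw in row.items():
        cleaned = raw.strip().rstrip("%").replace(",", "")
        try:
            parsed[key] = float(cleaned)
        except ValueError:
            parsed[key] = raw
    return parsed


def low_metrics(metrics, thresholds=THRESHOLDS):
    """Return (column, value, ideal) for every thresholded metric below its ideal."""
    return [
        (name, metrics[name], ideal)
        for name, ideal in thresholds.items()
        if name in metrics and isinstance(metrics[name], float) and metrics[name] < ideal
    ]


if __name__ == "__main__":
    # Toy input containing only the 19.0% value reported above.
    example = '"Reads Mapped Confidently to Transcriptome"\n"19.0%"\n'
    for name, value, ideal in low_metrics(parse_metrics(example)):
        print(f"{name}: {value}% (ideal > {ideal}%)")
```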

benformatics (ETH Zurich) wrote, 9 months ago:

You downloaded the dm3 (BDGP5) GTF and used the dm6 (BDGP6) genome (i.e. FASTA).

Please use the most current version (as of Feb 2020) and make sure you are using annotations that match your genome.

FASTA:

ftp://ftp.ensembl.org/pub/release-99/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz

GTF:

ftp://ftp.ensembl.org/pub/release-99/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.28.99.chr.gtf.gz
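A minimal Python sketch for fetching these two files and building the `cellranger mkref` call. It assumes the Ensembl FTP paths above are still reachable and that mkref should be given decompressed files; `--genome`, `--fasta` and `--genes` are mkref's standard options, while the output name `dmel_BDGP6.28` and the `ref_downloads` directory are hypothetical. The command is printed rather than executed so you can review it first.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Ensembl release 99 files recommended above (BDGP6.28 genome + matching GTF).
FASTA_URL = "ftp://ftp.ensembl.org/pub/release-99/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz"
GTF_URL = "ftp://ftp.ensembl.org/pub/release-99/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.28.99.chr.gtf.gz"


def gunzip(gz_path):
    """Decompress foo.ext.gz to foo.ext next to it and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def fetch_and_gunzip(url, out_dir):
    """Download a .gz file (unless already present) and decompress it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    gz_path = out_dir / url.rsplit("/", 1)[-1]
    if not gz_path.exists():
        urllib.request.urlretrieve(url, gz_path)
    return gunzip(gz_path)


def mkref_command(genome_name, fasta, gtf):
    """Build the cellranger mkref argument list for one FASTA/GTF pair."""
    return ["cellranger", "mkref", f"--genome={genome_name}",
            f"--fasta={fasta}", f"--genes={gtf}"]


if __name__ == "__main__":
    fasta = fetch_and_gunzip(FASTA_URL, "ref_downloads")  # hypothetical directory
    gtf = fetch_and_gunzip(GTF_URL, "ref_downloads")
    # "dmel_BDGP6.28" is a hypothetical reference name; choose your own.
    print(" ".join(mkref_command("dmel_BDGP6.28", fasta, gtf)))
```

Keeping the FASTA and GTF URLs side by side in one place makes it harder to mix releases again.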


In case it wasn't clear from my post:

Because your GTF file uses the coordinate system of the old Drosophila genome (dm3) while your FASTA file contains the sequence of the newest Drosophila genome (dm6), the coordinates of a huge number of genes will be wrong for your reference. This is the most likely reason for your low fraction of mapped reads.

(written 9 months ago by benformatics)
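A cheap sanity check for this kind of mismatch, sketched under the assumption that both files keep Ensembl's naming scheme (`<Species>.<Assembly>.dna.<type>.fa.gz` and `<Species>.<Assembly>.<release>[.chr].gtf.gz`): parse the assembly name out of each filename and compare. BDGP5 is the assembly behind dm3 and BDGP6 the one behind dm6, so the original pair (BDGP6.28 FASTA, BDGP5 GTF) fails this check. It is only a heuristic; renamed files will not parse, and the `#!genome-build` header line found in recent Ensembl GTFs, if present, is a more direct thing to inspect.

```python
import re

# Assumed Ensembl naming conventions; files renamed by hand will not match.
FASTA_RE = re.compile(r"^([^.]+)\.(.+)\.dna(?:_sm|_rm)?\.")
GTF_RE = re.compile(r"^([^.]+)\.(.+)\.(\d+)(?:\.chr)?\.gtf(?:\.gz)?$")


def assembly_from_fasta(name):
    """Assembly name from an Ensembl genome FASTA filename, or None."""
    m = FASTA_RE.match(name)
    return m.group(2) if m else None


def assembly_from_gtf(name):
    """Assembly name from an Ensembl GTF filename, or None."""
    m = GTF_RE.match(name)
    return m.group(2) if m else None


def same_assembly(fasta_name, gtf_name):
    """True only if both filenames parse and name the same assembly."""
    a = assembly_from_fasta(fasta_name)
    b = assembly_from_gtf(gtf_name)
    return a is not None and a == b


if __name__ == "__main__":
    fasta = "Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz"
    for gtf in ("Drosophila_melanogaster.BDGP5.77.gtf.gz",
                "Drosophila_melanogaster.BDGP6.28.99.chr.gtf.gz"):
        print(gtf, same_assembly(fasta, gtf))
```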

It was very clear, thank you very much for explaining the solution!

(written 9 months ago by el2410)

Thank you for your help! Using the files you mentioned, I got a warning after running cellranger count on two replicates (the third one worked just fine). The warning is "Low Fraction Reads in Cells": I got 61.3%, but Ideal > 70%. "Application performance may be affected. Many of the reads were not assigned to cell-associated barcodes. This could be caused by high levels of ambient RNA or by a significant population of cells with a low RNA content, which the algorithm did not call as cells. The latter case can be addressed by inspecting the data to determine the appropriate cell count and using --force-cells."

Do you think it's a good idea to use --force-cells? I would really appreciate any recommendations on how to fix this.

(written 9 months ago by el2410)

The ideal is 100%, but frankly I don't have much experience with 10X sequencing specifically. For other scRNA-seq technologies we see huge variation in alignment percentages. I would say if you are working with patient samples, especially in the case of disease, cell quality is often much lower. I personally would move forward with alignment rates over 50-60%. However, it would be wise to check that there are good correlations between all the replicates. On the other hand, if you are using something like cell lines, then this does seem a bit low.
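One way to check the replicate correlations mentioned above is to compare pseudobulk profiles. The sketch below is an illustration under stated assumptions, not a Cell Ranger feature: it expects, for each replicate, a dict mapping gene ID to the total UMI count summed over that replicate's called cells (building that dict from the filtered matrix is left to you), and computes a Pearson correlation of the log1p totals over the genes both replicates share.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with >= 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pseudobulk_correlation(rep_a, rep_b):
    """Correlate log1p per-gene totals over genes present in both replicates.

    rep_a, rep_b: {gene_id: total UMI count summed over that replicate's cells}
    (an assumed input format).
    """
    genes = sorted(set(rep_a) & set(rep_b))
    xs = [math.log1p(rep_a[g]) for g in genes]
    ys = [math.log1p(rep_b[g]) for g in genes]
    return pearson(xs, ys)
```

Running this for every pair of replicates gives a quick view of whether the two runs with the warning behave differently from the one that passed.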

If I were in your position, I would compare the results with and without "--force-cells" to see whether I really believe in the added cells.

(written 8 months ago by benformatics)
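To compare a default run against a `--force-cells` run, one simple approach is to diff the called barcodes. This sketch assumes Cell Ranger 3+ output, where called cells are listed in `outs/filtered_feature_bc_matrix/barcodes.tsv.gz`; the run directories `run_default` and `run_forced` are hypothetical names for the two `--id` values.

```python
import gzip
from pathlib import Path


def read_barcodes(path):
    """Read one barcode per line from barcodes.tsv or barcodes.tsv.gz."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as fh:
        return {line.strip() for line in fh if line.strip()}


def compare_calls(default_barcodes, forced_barcodes):
    """Summarize how a --force-cells run changes the set of called cells."""
    shared = default_barcodes & forced_barcodes
    union = default_barcodes | forced_barcodes
    return {
        "default": len(default_barcodes),
        "forced": len(forced_barcodes),
        "added": len(forced_barcodes - default_barcodes),
        "dropped": len(default_barcodes - forced_barcodes),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Hypothetical output directories of two cellranger count runs.
    rel = Path("outs/filtered_feature_bc_matrix/barcodes.tsv.gz")
    default = read_barcodes(Path("run_default") / rel)
    forced = read_barcodes(Path("run_forced") / rel)
    print(compare_calls(default, forced))
```

The "added" barcodes are the ones to scrutinize, for example by looking at their UMI and gene counts, before deciding whether to keep them.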

I would say if you are working with patient samples, especially in the case of disease

Since the original question is about flies, we can safely eliminate that possibility :-)

(written 8 months ago by genomax)