Question

Chromosome disagreement in drop-seq meta data pipeline

5.3 years ago

chilifan ▴ 120

I am generating the metadata and genome index for H. sapiens with the Drop-seq 2.0.0 pipeline script create_Drop-seq_reference_metadata.sh. I have tried FASTA reference and GTF files from both Ensembl and GENCODE (from https://www.gencodegenes.org/human/, using the Genome sequence (GRCh38.p12) FASTA in combination with the Comprehensive gene annotation GTF for the CHR regions). The pipeline output contains quite a lot of "chromosome disagreements, skipping gene" messages for both the Ensembl and the GENCODE files.

Is there supposed to be chromosome disagreement? If so, how much disagreement is OK? What happens to the "skipped" genes: are they left out of the metadata files? So far I have used both the largest reference genome files and the ones containing only the chromosomes; is one preferable to the other?
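To see which genes might be affected, a rough check along the following lines could be run on the GTF. It is only a sketch and rests on an assumption that is not confirmed by the pipeline output above: that the message is triggered when records sharing one gene_name sit on more than one sequence (for example on both chrX and chrY). The function name is hypothetical and is not part of the Drop-seq tools.

```python
import re
from collections import defaultdict

# Assumption: the GTF attribute column carries a gene_name "..." entry,
# as in GENCODE and Ensembl annotation files.
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def genes_on_multiple_chromosomes(gtf_lines):
    """Return {gene_name: sorted sequence names} for names seen on more than one sequence."""
    chroms = defaultdict(set)
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_NAME.search(fields[8])
        if match:
            chroms[match.group(1)].add(fields[0])
    return {gene: sorted(names) for gene, names in chroms.items() if len(names) > 1}
```

Passing an open file handle for the GTF as `gtf_lines` and printing the length of the returned dictionary would give a number to compare against the count of skipped genes in the log.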

One last question: the pipeline runs until the STAR indexing finishes successfully, and then it breaks because permission is denied for bgzip. I tried downloading a new bgzip, but that doesn't work (I also tried with sudo, which doesn't work either). However, at this point output files have already been created, ranging from 4 KB to 1 GB for the genome index and from 30 MB to 3.3 GB for the metadata. Of course, I can't be sure that they are correct. Am I missing anything important when the pipeline breaks here? Do I need to fix bgzip?
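For the bgzip problem, one thing that could be checked is whether the bgzip file found on PATH is actually executable by the current user. The sketch below does that; it assumes the error comes from a missing execute permission, which is only one possible cause, and the helper name is hypothetical.

```python
import os


def find_on_path(name, path=None):
    """List (candidate_path, is_executable) for every file called `name` on PATH.

    Unlike shutil.which, this also reports files that exist but lack
    execute permission, which is the case of interest here.
    """
    search = path if path is not None else os.environ.get("PATH", "")
    results = []
    for directory in search.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate):
            results.append((candidate, os.access(candidate, os.X_OK)))
    return results
```

Calling `find_on_path("bgzip")` and looking at the first entry shows which copy would be picked up and whether it is executable; an entry marked `False` would point to a missing execute bit rather than to a broken download.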

RNA-Seq drop-seq meta data genome indexing • 1.0k views
