Question

Strange error when using UMI-tools dedup & count after STAR alignment

4.1 years ago

zhenzi7 • 0

I'm using UMI-tools to get a count matrix from scRNA-seq data. I mapped the reads to the reference with STAR, then passed the coordinate-sorted BAM to umi_tools dedup, which failed with the error below. Here are my command and the error:

umi_tools dedup -I test_B10-1.Aligned.sortedByCoord.out.bam --paired -S test_B10-1.deduplicated.bam

2020-03-04 01:41:21,485 WARNING Chimeric read pairs are being used. Some read pair UMIs may be grouped/deduplicated using just the mapping coordinates from read1.This may also increase the run time and memory usage. Consider --chimeric-pairs==discard to discard these reads or --chimeric-pairs==output (group command only) to output them without grouping
2020-03-04 01:41:21,485 WARNING Unpaired read pairs are being used. Some read pair UMIs may be grouped/deduplicated using just the mapping coordinates from read1.This may also increase the run time and memory usage. Consider --unpared-reads==discard to discard these reads or --unpared-reads==output (group command only) to output them without grouping
2020-03-04 01:41:21,485 INFO command: dedup -I test_B10-1.Aligned.sortedByCoord.out.bam --paired -S test_B10-1.deduplicated.bam
Traceback (most recent call last):
  File "/public/home/syli/software/miniconda3/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/public/home/syli/software/miniconda3/lib/python3.6/site-packages/umi_tools/umi_tools.py", line 61, in main
    module.main(sys.argv)
  File "/public/home/syli/software/miniconda3/lib/python3.6/site-packages/umi_tools/dedup.py", line 262, in main
    for bundle, key, status in bundle_iterator(inreads):
  File "/public/home/syli/software/miniconda3/lib/python3.6/site-packages/umi_tools/sam_methods.py", line 375, in __call__
    read.reference_name != read.next_reference_name):
  File "pysam/libcalignedsegment.pyx", line 965, in pysam.libcalignedsegment.AlignedSegment.next_reference_name.__get__ (pysam/libcalignedsegment.c:12545)
  File "pysam/libcalignmentfile.pyx", line 1609, in pysam.libcalignmentfile.AlignmentFile.getrname
  File "pysam/libcalignmentfile.pyx", line 672, in pysam.libcalignmentfile.AlignmentFile.get_reference_name

ValueError: reference_id -1 out of range 0<=tid<359
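
The traceback ends where pysam resolves `read.next_reference_name` for a record whose mate reference ID is -1. In SAM text, that ID corresponds to `*` in the RNEXT column. As a rough diagnostic sketch (not a fix, and not proof of the cause), the following counts paired records that carry `*` there, assuming the alignments are available as SAM text lines, for example from `samtools view`. The function name and the sample records are made up for illustration.

```python
PAIRED = 0x1          # SAM flag bit: read is part of a pair
MATE_UNMAPPED = 0x8   # SAM flag bit: mate is flagged as unmapped


def count_missing_mate_refs(sam_lines):
    """Return (paired records, paired records with RNEXT '*',
    how many of those also have the mate-unmapped flag set)."""
    paired = missing = mate_unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        rnext = fields[6]                 # column 7: reference name of the mate
        if not flag & PAIRED:
            continue
        paired += 1
        if rnext == "*":
            missing += 1
            if flag & MATE_UNMAPPED:
                mate_unmapped += 1
    return paired, missing, mate_unmapped


# Hypothetical records for illustration only (not taken from the real BAM).
SAMPLE_RECORDS = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t255\t50M\t=\t200\t150\tACGT\tIIII",
    "r2\t73\tchr1\t300\t255\t50M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t500\t255\t50M\t*\t0\t0\tACGT\tIIII",
    "r4\t65\tchr2\t10\t255\t50M\t*\t0\t0\tACGT\tIIII",
]

paired, missing, mate_unmapped = count_missing_mate_refs(SAMPLE_RECORDS)
print(f"paired records: {paired}")
print(f"paired records with RNEXT '*': {missing}")
print(f"  of which flagged mate-unmapped: {mate_unmapped}")
```

On real data the same function could be given `sys.stdin` while piping in the text output of `samtools view test_B10-1.Aligned.sortedByCoord.out.bam`. A non-zero count in the second line would be consistent with the error above, but it would not by itself show why STAR wrote those records.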

I've tried using Bowtie2 to map the same FASTQ files to the same reference, and the resulting BAM file went through UMI-tools dedup without error. However, I prefer STAR.

Bowtie2 mapping:

 bowtie2 -q --phred33 --very-fast --end-to-end -p 8 -x genome_ref -1 B10-1.bbmap.1.fastq.gz -2 B10-1.bbmap.2.fastq.gz | samtools view -@ 8 -Sb - > B10-1.b73.fast.bam

STAR mapping:

 ~/software/STAR-2.6.1b/bin/Linux_x86_64/STAR --runThreadN 10 --genomeDir ~/genome/ --readFilesIn B10-1.bbmap.1.fastq.gz B10-1.bbmap.2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outBAMsortingThreadN 10 --outFilterMultimapNmax 1 --outFileNamePrefix test_B10-1.

I've checked the reference names in the BAM file, and all of them are present in the @SQ header lines. I'd appreciate any suggestions!

This is my first question on Biostars, and I was confused about how to format the code and error when submitting, so please excuse the messy question. Can somebody help me?

RNA-Seq • 2.2k views

Did you try --chimeric-pairs=discard?

I assume you checked to make sure that your two fastqs have the same number of reads?

4.1 years ago by swbarnes2 • 14k

Thanks for the quick answer! By default, STAR writes chimeric alignments to Chimeric.out.junction rather than to the main aligned BAM file (this is set by --chimOutType), and I cannot find the discard option you mention in STAR-2.6.1b.

As for the number of reads, I checked my FASTQ files:

29764216 B10-1.bbmap.1.fastq.gz
64723195 B10-1.bbmap.2.fastq.gz

I produced these two files with the commands below:

umi_tools extract --extract-method=regex --bc-pattern='(?P<discard_1>AAGCAGTGGTATCAACGCAGAGTGAAT){s<=1}(?P<umi_1>.{10}).*' --stdin QH016NA-B10-1_1_paired.fastq.gz --stdout B10-1.extract.1.fastq.gz --read2-in QH016NA-B10-1_2_paired.fastq.gz --read2-out B10-1.extract.2.fastq.gz -L extract.log
reformat.sh in1=B10-1.extract.1.fastq.gz in2=B10-1.extract.2.fastq.gz out1=B10-1.bbmap.1.fastq.gz out2=B10-1.bbmap.2.fastq.gz minlen=25

4.1 years ago by zhenzi7 • 0

It's umi_tools telling you to address chimeras, not STAR.

Something is really wrong if your two fastqs have different numbers of reads.
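
One way to check this is to count the records in each gzipped FASTQ and confirm that the read names line up pair by pair. The following is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequences) and read names that differ at most by a trailing `/1` or `/2`. The function names are made up for illustration.

```python
import gzip
import itertools


def read_names(path):
    """Yield the read name from every FASTQ header line in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 != 0:            # only the first line of each record
                continue
            tokens = line[1:].split()     # drop the leading '@' and any description
            if not tokens:                # tolerate a blank trailing line
                continue
            name = tokens[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]          # ignore mate suffixes when comparing
            yield name


def compare_pairs(path1, path2):
    """Return (records in file 1, records in file 2, positions where names differ)."""
    n1 = n2 = mismatched = 0
    for a, b in itertools.zip_longest(read_names(path1), read_names(path2)):
        if a is not None:
            n1 += 1
        if b is not None:
            n2 += 1
        if a is not None and b is not None and a != b:
            mismatched += 1
    return n1, n2, mismatched
```

For the files in this thread the call would be `compare_pairs("B10-1.bbmap.1.fastq.gz", "B10-1.bbmap.2.fastq.gz")`. Equal counts with zero mismatches would suggest the pairs are still in sync before mapping.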

4.1 years ago by swbarnes2 • 14k

I checked the files after every step. Before and after trimming, read1.fastq.gz and read2.fastq.gz have the same number of reads and almost equal file sizes. After umi_tools extract, the file sizes differ a lot, but the two files still contain the same number of reads (I confirmed this by counting the lines in each FASTQ file). The same is true after BBMap. So before mapping, read1.fastq.gz and read2.fastq.gz have the same number of reads. I've also tried --chimeric-pairs=discard, and it still reports the same error:

ValueError: reference_id -1 out of range 0<=tid<359

Is there something else I can try?

4.1 years ago by zhenzi7 • 0