Question

PARalyzer error: * is not found in the .2bit file

4.8 years ago

xiaoleiusc ▴ 140

Dear Biostars forum users,

I tried to process my PAR-CLIP dataset with PARalyzer version 1.5 ( https://ohlerlab.mdc-berlin.de/software/PARalyzer_85/ ). The program starts without any problem, but after several hours it always fails with the error below:

Running PARalyzer v1.5
Parsing SAM file(s)...Done
Creating Read Groups & Clusters...Exception in thread "main" java.lang.Exception: * is not found in the .2bit file
at MyTwoBitParser.loadChromosome(MyTwoBitParser.java:58)
at MyTwoBitParser.getSequence(MyTwoBitParser.java:118)
at PARalyze.main(PARalyze.java:204)

I generated my 2bit file with the faToTwoBit tool (e.g. faToTwoBit hg19.fasta hg19.2bit).

I would really appreciate any input to solve this issue.

Best,

Xiao

CLIP-Seq • 1.5k views

ADD COMMENT • link updated 4.8 years ago by Ram 43k • written 4.8 years ago by xiaoleiusc ▴ 140

In SAM/BAM files, typically * is used as chromosome name to indicate unmapped reads. Can you share the command line?
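
To check whether a SAM file actually contains such records before running PARalyzer, something like the following could be used. It is only a minimal sketch: the function name count_unmapped and the demo records are made up for illustration, and it assumes a plain-text, tab-separated SAM file where the second column is the FLAG and the third is the reference name.

```python
import io

def count_unmapped(sam_lines):
    """Return (mapped, unmapped) record counts for an iterable of SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        # FLAG bit 0x4 marks an unmapped read; such reads usually carry "*" as RNAME
        if flag & 0x4 or rname == "*":
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped

# Hypothetical two-read example: one mapped read and one unmapped read
demo = (
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\n"
)
print(count_unmapped(io.StringIO(demo)))  # (1, 1)
```

A non-zero second number would suggest the file still contains unmapped reads.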

ADD REPLY • link 4.8 years ago by ATpoint 81k

Hi, ATpoint,

Thanks a lot for your reply, and sorry for my late response (I am new here and somehow did not get email notifications for messages). The command line that leads to the error is below:

(base) bieniaszs-ipro:hg19_NL4_3 bieniaszlab$ PARAlyzer 128G hnRNPU_NL43.ini
Running PARalyzer v1.5
Parsing SAM file(s)...Done
Creating Read Groups & Clusters...Exception in thread "main" java.lang.Exception: * is not found in the .2bit file
at MyTwoBitParser.loadChromosome(MyTwoBitParser.java:58)
at MyTwoBitParser.getSequence(MyTwoBitParser.java:118)
at PARalyze.main(PARalyze.java:204)

My 2bit file was generated with the faToTwoBit tool from human hg19. I did not use a filter file, and my .ini file is below:

BANDWIDTH=3
CONVERSION=T>C
MINIMUM_READ_COUNT_PER_GROUP=5
MINIMUM_READ_COUNT_PER_CLUSTER=5
MINIMUM_READ_COUNT_FOR_KDE=5
MINIMUM_CLUSTER_SIZE=11
MINIMUM_CONVERSION_LOCATIONS_FOR_CLUSTER=1
MINIMUM_CONVERSION_COUNT_FOR_CLUSTER=1
MINIMUM_READ_COUNT_FOR_CLUSTER_INCLUSION=5
MINIMUM_READ_LENGTH=1
MAXIMUM_NUMBER_OF_NON_CONVERSION_MISMATCHES=0

EXTEND_BY_READ

#ADDITIONAL_NUCLEOTIDES_BEYOND_SIGNAL=20

SAM_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43.sort.sam=COLLAPSED
GENOME_2BIT_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/NL43.2bit

OUTPUT_DISTRIBUTIONS_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43_distribution.csv
OUTPUT_GROUPS_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43_groups.csv
OUTPUT_CLUSTERS_FILE=/Users/bieniaszlab/miniconda3/data/hnRNP_U/hg19_NL4_3/hnRNPU_NL43_clusters.csv
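
For anyone who wants to sanity-check a parameter file like the one above before a long run, here is a rough sketch of how it could be read in Python. This is not PARalyzer's own parser: the function name parse_paralyzer_ini is invented, the paths in the demo are placeholders, and treating lines that start with # as comments is an assumption based on how the file above is written.

```python
def parse_paralyzer_ini(text):
    """Split a PARalyzer-style parameter file into settings, bare flags and SAM inputs."""
    settings = {}   # KEY -> value, e.g. "BANDWIDTH" -> "3"
    flags = []      # bare keywords without "=", e.g. EXTEND_BY_READ
    sam_files = []  # (path, collapsed) pairs taken from SAM_FILE lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: "#" starts a commented-out line
        if "=" not in line:
            flags.append(line)
            continue
        key, value = line.split("=", 1)
        if key == "SAM_FILE":
            suffix = "=COLLAPSED"
            collapsed = value.endswith(suffix)
            path = value[: -len(suffix)] if collapsed else value
            sam_files.append((path, collapsed))
        else:
            settings[key] = value
    return settings, flags, sam_files

# Placeholder paths, not the real ones from this thread
demo_ini = """BANDWIDTH=3
CONVERSION=T>C
EXTEND_BY_READ
#ADDITIONAL_NUCLEOTIDES_BEYOND_SIGNAL=20
SAM_FILE=/path/to/reads.sam=COLLAPSED
GENOME_2BIT_FILE=/path/to/genome.2bit
"""
settings, flags, sam_files = parse_paralyzer_ini(demo_ini)
print(sam_files)  # [('/path/to/reads.sam', True)]
```

The returned paths could then be checked with os.path.exists before starting the run.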

Regards,
Xiao

ADD REPLY • link updated 3.3 years ago by Ram 43k • written 4.8 years ago by xiaoleiusc ▴ 140

Hi, ATpoint,

I really appreciate your stimulating input! I found that PARalyzer only works when the input SAM file contains only mapped reads. I first run

samtools view -b -F 4 input.bam > output_mapped.bam

to generate a BAM file with only mapped reads, and then convert it to SAM with

samtools view -h output_mapped.bam > output_mapped.sam

As you mentioned, SAM/BAM files typically use * as the chromosome name for unmapped reads, which is likely what caused the problem in my PARalyzer run. PARalyzer now runs well on SAM files that contain only mapped reads!

ADD REPLY • link 4.1 years ago by xiaoleiusc ▴ 140

I had the same issue and this solved it. Even though I ran Bowtie with --no-unal, I guess there were still some unmapped reads in there.
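
One way to see in advance which reference names would trigger the "not found in the .2bit file" error is to compare the names in the SAM file with the names stored in the .2bit index. The sketch below is only illustrative: the helper names are invented, it handles version 0 .2bit files only, and it relies on the documented .2bit layout (a 16-byte header followed by name and offset entries) rather than on anything PARalyzer itself does.

```python
import io
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number at the start of a .2bit file

def twobit_names(handle):
    """Read the sequence names from the index of a version 0 .2bit file."""
    header = handle.read(16)
    if len(header) != 16:
        raise ValueError("file too short to be a .2bit file")
    for order in ("<", ">"):  # try little-endian first, then big-endian
        signature, version, count, _reserved = struct.unpack(order + "4I", header)
        if signature == TWOBIT_SIGNATURE:
            break
    else:
        raise ValueError("signature mismatch: not a .2bit file")
    if version != 0:
        raise ValueError("this sketch only handles version 0 .2bit files")
    names = []
    for _ in range(count):
        name_size = handle.read(1)[0]          # 1-byte name length
        names.append(handle.read(name_size).decode("ascii"))
        handle.read(4)                         # skip the 32-bit sequence offset
    return names

def missing_references(sam_lines, names):
    """Return the set of SAM reference names that are absent from the .2bit index."""
    known = set(names)
    missing = set()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@") or len(fields) < 3:
            continue  # skip headers and malformed lines
        if fields[2] not in known:
            missing.add(fields[2])
    return missing

# Tiny in-memory .2bit index with two hypothetical sequences (no sequence data)
demo_2bit = struct.pack("<4I", TWOBIT_SIGNATURE, 0, 2, 0) + b"".join(
    bytes([len(n)]) + n + struct.pack("<I", 0) for n in (b"chr1", b"chr2")
)
demo_sam = [
    "r1\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\n",
]
names = twobit_names(io.BytesIO(demo_2bit))
print(sorted(missing_references(demo_sam, names)))  # ['*']
```

With real files, the handle would come from open(path, "rb") for the .2bit file and open(path) for the SAM file.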

ADD REPLY • link 4.0 years ago by chrishuges • 0

Did you get the same error with the pre-built filter files provided on this tool's website?

We also provide the filter files for human assembly hg19 here and mouse assembly mm9 here.

ADD REPLY • link 4.8 years ago by Sej Modha 5.3k

Hi, Sej,

I found that PARalyzer only works when the input SAM file contains only mapped reads. I run

samtools view -b -F 4 input.bam > output_mapped.bam

to generate a BAM file with only mapped reads, and then convert it to SAM with

samtools view -h output_mapped.bam > output_mapped.sam

to produce a SAM file that works with PARalyzer.
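
If samtools is not at hand, roughly the same filtering can be done on a plain-text SAM file directly. The following is a minimal sketch, not a replacement for samtools: keep_mapped is a made-up name, it keeps header lines, and besides testing FLAG bit 0x4 (what -F 4 does) it also drops any record whose reference name is *, which is an extra safety check I added.

```python
def keep_mapped(sam_lines):
    """Yield header lines and mapped alignment records from an iterable of SAM lines."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header, like samtools view -h
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM record has at least 11 mandatory columns
        if int(fields[1]) & 0x4 or fields[2] == "*":
            continue  # unmapped read: skip it
        yield line

# Hypothetical three-line example: header, one mapped read, one unmapped read
demo_lines = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\n",
]
filtered = list(keep_mapped(demo_lines))
print(len(filtered))  # 2

# Usage on real files (input.sam is a placeholder name):
# with open("input.sam") as src, open("output_mapped.sam", "w") as dst:
#     dst.writelines(keep_mapped(src))
```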

Thanks for your input.

Xiao

ADD REPLY • link updated 3.3 years ago by Ram 43k • written 4.8 years ago by xiaoleiusc ▴ 140

Hi, Xiao

When I ran PARalyzer, it did not work with "=COLLAPSED" at the end of the SAM_FILE lines; it only worked without "=COLLAPSED". I collapsed the fastq files with both fastx_toolkit and CIMS/fastq2collapse.pl, but neither worked.

Have you had any experience with this problem?

Best, Seokju

ADD REPLY • link 3.4 years ago by blastfulnace • 0

Hi, Seokju,

Sorry for the late reply; I only saw your message today. I did not include =COLLAPSED in my PARalyzer ini file. I use fastx_toolkit to collapse reads. Here is one of my ini files for PARalyzer:

BANDWIDTH=3
CONVERSION=T>C
MINIMUM_READ_COUNT_PER_GROUP=10
MINIMUM_READ_COUNT_PER_CLUSTER=5
MINIMUM_READ_COUNT_FOR_KDE=5
MINIMUM_CLUSTER_SIZE=15
MINIMUM_CONVERSION_LOCATIONS_FOR_CLUSTER=2
MINIMUM_CONVERSION_COUNT_FOR_CLUSTER=2
MINIMUM_READ_COUNT_FOR_CLUSTER_INCLUSION=5
MINIMUM_READ_LENGTH=1
MAXIMUM_NUMBER_OF_NON_CONVERSION_MISMATCHES=1

EXTEND_BY_READ

GENOME_2BIT_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/Index/hg19/hg19.2bit
SAM_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/Nextseq_Midoutput/CLIP17_synNCp15_NC43FSFS_hg19_m2.sorted.sam
OUTPUT_DISTRIBUTIONS_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/CLIP17_synNCp15_NC43FSFS_hg19_m2_distri.csv
OUTPUT_GROUPS_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/Nextseq_Midoutput/CLIP17_synNCp15_NC43FSFS_hg19_m2_group.csv
OUTPUT_CLUSTERS_FILE=/Users/bieniaszlab/Documents/Xiao/CLIP/CLIP17_IRCLIP4/Nextseq_Midoutput/CLIP17_synNCp15_NC43FSFS_hg19_m2_clusters.csv

ADD REPLY • link updated 3.3 years ago by Ram 43k • written 3.3 years ago by xiaoleiusc ▴ 140