I am trying to use the script MuMRescueLite.py from http://genome.gsc.riken.jp/osc/english/dataresource/. The software can be downloaded here: http://genome.gsc.riken.jp/osc/english/software/src/MuMRescueLite_090522.tar.gz
What it does, more or less: after mapping, it examines the mapping locations of all reads (single- and multi-mapping reads) and, using the information from the single-mapping reads, probabilistically assigns a location to each multi-mapping read.
The script needs input in a specific format:
#ID locations chromosome strand start end count
read_x 1 chr1 + 100 126 100
read_y 2 chr1 + 102 128 100
read_y 2 chrX + 102 128 100
The .bed output of an aligner (such as bowtie or bwa), on the other hand, looks like this:
chr6 135135525 135135550 HWI-ST897:159:C1ACKACXX:1:1101:1127:2068 255 +
chr5 140027416 140027441 HWI-ST897:159:C1ACKACXX:1:1101:1280:2087 255 +
chr16 57219907 57219932 HWI-ST897:159:C1ACKACXX:1:1101:1414:2089 255 -
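To make the mapping I am aiming for explicit, here is a minimal Python sketch of it. It rests on several assumptions of mine: the .bed file has the six columns shown above (chromosome, start, end, read ID, score, strand), "locations" is simply the number of .bed lines that share a read ID, start and end are passed through unchanged (I do not know which coordinate convention MuMRescueLite expects), and "count" is fixed at 1. The function name bed_to_mumrescue is made up for illustration and is not part of MuMRescueLite.

```python
from collections import Counter


def bed_to_mumrescue(bed_lines, count=1):
    """Turn 6-column BED lines into the 7-column layout shown above.

    Assumption: 'locations' is the number of BED lines sharing a read ID,
    and 'count' is the same fixed value for every read.
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected 6 BED columns, got {len(fields)}: {line!r}")
        chrom, start, end, read_id, _score, strand = fields[:6]
        records.append((read_id, chrom, strand, int(start), int(end)))

    # Number of mapping locations reported for each read ID.
    locations = Counter(record[0] for record in records)

    rows = []
    for read_id, chrom, strand, start, end in records:
        columns = (read_id, locations[read_id], chrom, strand, start, end, count)
        rows.append("\t".join(str(column) for column in columns))
    return rows


example = [
    "chr6\t135135525\t135135550\tHWI-ST897:159:C1ACKACXX:1:1101:1127:2068\t255\t+",
    "chr5\t140027416\t140027441\tHWI-ST897:159:C1ACKACXX:1:1101:1280:2087\t255\t+",
    "chr16\t57219907\t57219932\tHWI-ST897:159:C1ACKACXX:1:1101:1414:2089\t255\t-",
]
for row in bed_to_mumrescue(example):
    print(row)
```

The whole file is read before anything is written, because the number of locations for a read is only known once every line has been seen.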
I am using awk to turn the second format (.bed) into the first, as:
awk '/./ {print $1"\t"$7 + 1"\t"$3"\t"$2"\t"$4"\t"$4 + length($5)"\t1"}3333' < <mapping output> > <MuM Input>
or:
awk '{$7=$7+1;end=$4+length($5);}{OFS="\t";print$1,$7,$3,$2,$4,end,"1"}'
Nevertheless, MuMRescueLite.py keeps failing with the following message:
Traceback (most recent call last):
  File "../../tools/MuMRescueLite_090522/MuMRescueLite.py", line 194, in <module>
    remainSingleMapper(inFile, singleMappers, inputHeaderFlag)
  File "../../tools/MuMRescueLite_090522/MuMRescueLite.py", line 55, in remainSingleMapper
    chromosome, strand, position = retPosition(columns)
  File "../../tools/MuMRescueLite_090522/MuMRescueLite.py", line 39, in retPosition
    raise Exception
Exception
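Since the exception carries no message, a small check like the following could help narrow down which lines of the generated input deviate from the expected layout. It is only a sketch: it assumes the input is tab-separated with the seven columns listed above and that the strand must be + or -. I do not know what retPosition actually tests, and check_mumrescue_line is a name I made up, not part of MuMRescueLite.

```python
def check_mumrescue_line(line):
    """Return a list of problems found in one input line (empty if none).

    Assumed layout: ID, locations, chromosome, strand, start, end, count,
    separated by tabs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        return [f"expected 7 tab-separated columns, found {len(fields)}"]

    _read_id, locations, _chrom, strand, start, end, count = fields
    problems = []
    if strand not in ("+", "-"):
        problems.append(f"strand is {strand!r}, expected '+' or '-'")
    for name, value in (("locations", locations), ("start", start), ("end", end)):
        if not value.isdigit():
            problems.append(f"{name} is {value!r}, expected a non-negative integer")
    try:
        float(count)
    except ValueError:
        problems.append(f"count is {count!r}, expected a number")
    return problems


sample = [
    "read_x\t1\tchr1\t+\t100\t126\t100",            # well-formed line from the example above
    "chr6\t1\t135135550\t135135525\tread_z\t3\t1",  # made-up line with shuffled columns
]
for number, line in enumerate(sample, start=1):
    for problem in check_mumrescue_line(line):
        print(f"line {number}: {problem}")
```

Running the same loop over the first few lines of the real input file (skipping any header line that starts with #) should show quickly whether the columns ended up in the wrong order.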
I guess the issue is in my awk command, as I am pretty new to using awk for file manipulation. Could anyone who has used MuMRescueLite before, or any awk expert, help me out?
Thanks in advance