Question

.raw files too big?

8.1 years ago

Emma ▴ 10

I am trying to use msconvert to convert .raw files to mzML. The .raw files are ~1.5 GB and seem to be too big: msconvert only gets part of the way through each file and then stops, so my output mzML files are incomplete. I found this out when I tried to search them with X!Tandem and got 'syntax error parsing XML' back.

Is there a way to split the .raw file before or during the msconvert conversion, so that I get multiple smaller mzML output files that will work?

Thank you

msconvert proteomics • 2.4k views

updated 8.1 years ago by dariober 14k • written 8.1 years ago by Emma ▴ 10
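Before searching, it can help to confirm that each mzML file is actually complete. Below is a minimal sketch using only Python's standard library; the function name `check_mzml` is hypothetical. It stream-parses the file with `xml.etree.ElementTree.iterparse`, counts `<spectrum>` elements, and reports the parse error that a truncated file typically produces, instead of letting the failure show up later inside the search engine. It only catches truncation or broken XML: a file can pass this check and still be rejected by a search engine.

```python
import sys
import xml.etree.ElementTree as ET

def check_mzml(path):
    """Stream-parse an mzML file; return (ok, n_spectra, message).

    A file cut off part-way usually fails with an error such as
    "no element found" or "unclosed token" near the end of the file.
    """
    n_spectra = 0
    try:
        for _event, elem in ET.iterparse(path, events=("end",)):
            # mzML tags are namespaced, e.g. "{http://psi.hupo.org/ms/mzml}spectrum"
            if elem.tag.endswith("}spectrum"):
                n_spectra += 1
                elem.clear()  # drop the spectrum's children to keep memory low
    except ET.ParseError as err:
        return False, n_spectra, f"XML error: {err}"
    return True, n_spectra, "well-formed"

if __name__ == "__main__":
    for mzml_path in sys.argv[1:]:
        ok, n, msg = check_mzml(mzml_path)
        print(f"{mzml_path}: {msg} ({n} spectra parsed)")
```

Saved as, say, `check_mzml.py` (hypothetical name), it would be run as `python check_mzml.py run1.mzML run2.mzML`.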

score 0 · Answer 1 · 2016-04-05

8.1 years ago

dariober 14k

msconvert has a --filter option. Maybe you could extract one range of scan indexes at a time and then merge the pieces at the end? From the example here:

msconvert data.RAW --filter "index [5,10] [20,25]"

8.1 years ago by dariober 14k
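The "run it several times on different index ranges" idea can be scripted. Below is a minimal Python sketch with these assumptions. The total number of spectra is already known (80,000 here is hypothetical). Indexes are zero-based. `[start,end]` ranges are inclusive, as in the `index [5,10]` example. The msconvert options used (`--mzML`, `--filter`, `-o`, `--outfile`) come from msconvert's help text, so it is worth checking them against `msconvert --help` for your version. The function names and the `_partN` output naming are hypothetical.

```python
import os
import shlex
import subprocess

def index_chunks(n_spectra, chunk_size):
    """Split spectrum indexes 0..n_spectra-1 into inclusive (start, end) ranges."""
    return [(start, min(start + chunk_size, n_spectra) - 1)
            for start in range(0, n_spectra, chunk_size)]

def msconvert_commands(raw_file, n_spectra, chunk_size, out_dir="."):
    """Build one msconvert command (as an argument list) per index range."""
    stem = os.path.splitext(os.path.basename(raw_file))[0]
    commands = []
    for part, (start, end) in enumerate(index_chunks(n_spectra, chunk_size), 1):
        commands.append([
            "msconvert", raw_file,
            "--mzML",
            "--filter", f"index [{start},{end}]",  # inclusive range, as in "index [5,10]"
            "-o", out_dir,
            "--outfile", f"{stem}_part{part}.mzML",  # hypothetical naming scheme
        ])
    return commands

def run_all(commands):
    """Run each command in turn; raise CalledProcessError if msconvert fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical example: an ~80,000-spectrum file converted in 4 parts.
    cmds = msconvert_commands("data.RAW", 80_000, 20_000)
    for cmd in cmds:
        print(shlex.join(cmd))  # preview the commands (POSIX-style quoting)
    # run_all(cmds)  # uncomment to actually call msconvert
```

Passing each command to `subprocess.run` as a list avoids shell quoting problems with the space and brackets in the filter string.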

I thought about the filtering options (and tried them a bit) but wasn't sure, because I want to keep all of my data. Are you suggesting I run msconvert several times, each time filtering for a different subset of the data and writing to a different output file?

8.1 years ago by Emma ▴ 10

Yes, that is what I was thinking. You should then be able to concatenate the individual files. I'm not sure how easy or feasible this is, though, as I don't have much experience with .raw data.

8.1 years ago by dariober 14k

If the concatenated file gets too large, you can search the "decomposed" files separately: they should be schema-valid mzML files, and X!Tandem's scoring of a spectrum doesn't depend on the other spectra. You can then merge the results (the smaller XML/pepXML/mzIdentML files) before proceeding with statistical validation etc. This works at least as long as you only care about peptide IDs and do spectral counting.

I have the same problem with ~2 GB .raw files of ~80,000 spectra. With the latest msconvert I get a complete mzML file without any error or warning messages, but X!Tandem and COMET still cannot use the file... I will post a solution or workaround here if I find one.

8.1 years ago by magnus.palmblad ▴ 10
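Merging the results of separately searched chunks for spectral counting could look like the minimal Python sketch below. It assumes each chunk's search results have already been exported to a hypothetical list of dicts with `spectrum`, `peptide` and `protein` keys, with one top-ranked PSM per spectrum, and that statistical validation is run on the merged list before counting. Parsing real pepXML or mzIdentML output is not shown, and the function names and example data are made up for illustration.

```python
from collections import Counter

def merge_chunks(chunk_psms):
    """Concatenate per-chunk PSM lists, keeping the first PSM seen per spectrum.

    With non-overlapping index ranges no spectrum should appear twice;
    the check only guards against accidentally overlapping chunks.
    """
    merged, seen = [], set()
    for psms in chunk_psms:
        for psm in psms:
            if psm["spectrum"] in seen:
                continue
            seen.add(psm["spectrum"])
            merged.append(psm)
    return merged

def spectral_counts(psms):
    """Count identified spectra per protein (run on the validated, merged list)."""
    return Counter(psm["protein"] for psm in psms)

# Hypothetical results from two separately searched chunks.
part1 = [{"spectrum": "scan=101", "peptide": "PEPTIDEK", "protein": "PROT_A"},
         {"spectrum": "scan=102", "peptide": "SAMPLER", "protein": "PROT_A"}]
part2 = [{"spectrum": "scan=40007", "peptide": "EXAMPLEK", "protein": "PROT_A"},
         {"spectrum": "scan=40015", "peptide": "TESTPEPTIDER", "protein": "PROT_B"}]
print(spectral_counts(merge_chunks([part1, part2])))
# Counter({'PROT_A': 3, 'PROT_B': 1})
```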

Thank you. This is what I ended up doing, and my final protein list looks as I expected. I had trouble concatenating, though, because each smaller file has its own beginning and end sections. Opening the files in a text editor, removing those blocks of text and putting the right closing section at the end of the combined file would have been a lot of work and error-prone.

8.1 years ago by Emma ▴ 10
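The beginning/end-section problem can be handled programmatically instead of by hand. The approach is to take the header and footer from the first chunk, append the `<spectrum>` elements from the other chunks, renumber their `index` attributes and update `spectrumList count`. Below is a minimal Python sketch built on these assumptions:

- The chunks come from the same .raw file with non-overlapping index ranges, so spectrum `id`s stay unique.
- The first file's metadata and chromatograms are representative of the whole run.
- Everything fits in memory, which may not hold for 1.5 GB inputs.

It writes plain mzML rather than indexedmzML, because the byte offsets in the index would no longer be valid after merging. The function names are hypothetical. msconvert's own `--merge` option may also be worth checking as an alternative.

```python
import xml.etree.ElementTree as ET

NS = "http://psi.hupo.org/ms/mzml"
ET.register_namespace("", NS)  # write mzML tags without an "ns0:" prefix

def _mzml_root(tree):
    """Return the <mzML> element, unwrapping an <indexedmzML> wrapper if present."""
    root = tree.getroot()
    if root.tag == f"{{{NS}}}mzML":
        return root
    return root.find(f"{{{NS}}}mzML")

def concat_mzml(in_paths, out_path):
    """Merge chunked mzML files: header/footer from the first, spectra from all."""
    base = _mzml_root(ET.parse(in_paths[0]))
    spec_list = base.find(f".//{{{NS}}}spectrumList")
    for path in in_paths[1:]:
        other = _mzml_root(ET.parse(path)).find(f".//{{{NS}}}spectrumList")
        if other is not None:
            spec_list.extend(list(other))  # move this chunk's <spectrum> elements
    # Renumber so index attributes run 0..N-1, and fix the declared count.
    for i, spectrum in enumerate(spec_list):
        spectrum.set("index", str(i))
    spec_list.set("count", str(len(spec_list)))
    # Write plain (non-indexed) mzML; the old <indexList> offsets would be wrong.
    ET.ElementTree(base).write(out_path, encoding="utf-8", xml_declaration=True)
```

For example, `concat_mzml(["data_part1.mzML", "data_part2.mzML"], "data_all.mzML")` (hypothetical file names) would produce a single file that can be checked with any XML parser before searching.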