"Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in these materials. Except as expressly provided in the Microsoft Open Specification Promise and this notice, the furnishing of these materials does not give you any license to these patents, trademarks, copyrights, or other intellectual property."
It's better than nothing but it's still a dangerous format to use.
If you look at the Microsoft Open Specification Promise it says:
"Microsoft irrevocably promises not to assert any Microsoft Necessary Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification (“Covered Implementation”), subject to the following..."
[Doesn't cover non-MSoft patents. Not surprising.]
The Office binary formats are listed among the Covered Specifications.
I wouldn't use the format, but I'm not sure it's "dangerous" to write apps that read or generate it.
Awesome. I'm a junior trader at a major investment bank and I have to deal with Excel all the time... it would be great to be able to programmatically generate well-formed Excel files without having to deal with VBA, COM automation, or anything else. Now someone just has to write a nice Haskell library...
VB.NET (which now has static metaprogramming and closures) + COM automation isn't too bad in my experience. You could also use Python for COM.
(I've found that when you set the application to be visible and each command is carried out visually, it's very impressive to non-programmers, especially ones who don't use macros.)
Other than that, Apache POI (http://poi.apache.org/) plus a marginal language that targets the JVM is probably your best bet for now.
POI has worked well for our projects where we have to export customer data to XLS (a commonly requested feature that is almost a checklist item).
I find the file system within a file (the OLE2 compound document) fascinating. I wonder who at Microsoft came up with that idea, or whether it really was an idea by technical committee.
It's a wild guess, but I would have used COM access to word.dll to convert it to a more reasonable format.
Again, it's a wild guess, but that is how I would have done it: trying to reverse engineer formats as bloated as the Office formats is generally not a good idea if it can be avoided.
"Antiword is a free MS Word reader for Linux and RISC OS. There are ports to FreeBSD, BeOS, OS/2, Mac OS X, Amiga, VMS, NetWare, Plan9, EPOC, Zaurus PDA, MorphOS, Tru64/OSF and DOS. Antiword converts the binary files from Word 2, 6, 7, 97, 2000, 2002 and 2003 to plain text and to PostScript."