Sorry Topher...
Posted: Sun Aug 26, 2007 2:20 pm
by DCrazy
Posted: Sun Aug 26, 2007 2:58 pm
by Topher
Is the chart thing a bug? Probably.
Converting from binary to XML is an incredibly hard engineering problem.
Try taking one of the article's suggestions: using localized names for XML tags and values. Say a product ships in 50 languages. So now I can have 50 different ways of storing the same tag. I now have 50*50 ways of storing two tags side by side.
- It would lead to a huge test effort
- If we ever changed the localization of something, the file becomes corrupt
- The application's install becomes huge as in order to load something we need to know about all languages
- Adding new languages means we have to update all copies of the application, otherwise you can't open documents
- Performance is killed. "I don't know this tag, it's not English, check Spanish, check Greek, check..."
- How is this worse than binary "0120AF320F012FCD54..."
Aside from that, testing is usually focused on what the user can do with the UI. Will you find more bugs by editing the XML directly? Yes.
You are free to send any bugs you find to me:
http://blogs.msdn.com/coolbeans/contact.aspx
I can't say anything will happen, but I will take a look.
Posted: Sun Aug 26, 2007 6:34 pm
by fliptw
the localization argument is a bit nitpickish; show me an html page that uses native terms for the tags.
The rest seems endemic of office file format issues going back a decade or more. I once came across this Excel sheet that would crash Excel every time I tried to print it; I saved it in XP's XML format of the day and it printed. The error itself was simple: improper closure of a formatting style. Using XML ordered it correctly, but the fact it got into the original Excel sheet in the first place is a serious error. I shudder to think about Word documents.
Like many things MS, at some point Office needs to start from scratch, and that's never going to happen unless bankruptcy is looming on the horizon.
Posted: Sun Aug 26, 2007 7:03 pm
by Jeff250
Where does the article call for localized tags? This is my understanding: the article is talking about locale-specific kinds of data in the xml, like dates. All of the data is saved according to a US English locale in the xml but then made to appear correct to its own locale in Microsoft Office by going through gauntlets of Microsoft layers instead of using methods provided by xml to allow the data to be saved localized in the xml in the first place. The problem here is that one's localized data can appear foreign (literally) in the xml, which will make it hard to read and even harder to modify, without a keen understanding of the US English locale and how Microsoft Office translates this into your own locale, including what the article describes as two decades of Microsoft Office internationalization issues.
Posted: Sun Aug 26, 2007 8:12 pm
by Topher
Also, for Excel formulas, it means the formula names are US English formula names, which you'll never see in Excel if you are using a locale version such as French or Brazilian
What's a better solution then? How does someone who saves the document in French open it on a machine that only has Brazilian installed? Should an application save the formulas as numbers? "SUM" is 0, "ADD" is 1, etc. How would that solve the problem the article points out?
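To make the trade-off concrete, here is a minimal sketch of the "store one canonical name, translate at the UI" approach. The locale tables and the helper names (`DISPLAY_NAMES`, `to_display`, `to_canonical`) are hypothetical and deliberately tiny; a real implementation would also have to deal with locale-specific argument separators, which this ignores.

```python
import re

# Hypothetical, deliberately tiny locale tables:
# canonical (stored) function name -> name shown in that locale's UI.
DISPLAY_NAMES = {
    "fr": {"SUM": "SOMME", "AVERAGE": "MOYENNE"},
    "pt-BR": {"SUM": "SOMA", "AVERAGE": "MÉDIA"},
}

# A run of letters directly followed by "(" is treated as a function name.
_NAME = re.compile(r"[^\W\d_]+(?=\()")


def to_display(formula, locale):
    """Translate stored (canonical) function names into the locale's names."""
    table = DISPLAY_NAMES.get(locale, {})
    return _NAME.sub(lambda m: table.get(m.group(0), m.group(0)), formula)


def to_canonical(formula, locale):
    """Translate what the user typed back into the stored names."""
    reverse = {shown: stored for stored, shown in DISPLAY_NAMES.get(locale, {}).items()}
    return _NAME.sub(lambda m: reverse.get(m.group(0), m.group(0)), formula)


# A formula typed on a French install, saved, then opened on a Brazilian one.
stored = to_canonical("SOMME(A1:A3)", "fr")
print(stored)                       # SUM(A1:A3)
print(to_display(stored, "pt-BR"))  # SOMA(A1:A3)
```

With this layout each install only needs its own table plus the canonical names, but anyone reading the file directly sees the canonical names, which is the complaint being discussed.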
Posted: Sun Aug 26, 2007 9:18 pm
by fliptw
it stems from the misnomer that XML needs to be "human readable", which is true, but with a generally unspoken qualifier that the human is a programmer, or a specialist in some manner.
XML won't be easily readable to my parents, or the public in general, so the requirement for localized tags in this piece is just pretentious nitpicking.
Posted: Sun Aug 26, 2007 10:08 pm
by DCrazy
Well, I'll summarize what I consider to be valid points:
Critical flaw /
Things that should be fixed in the standard /
Invalid complaints
1) Self-exploding spreadsheets:
Horrible design, a mistake worthy of denying OOXML standards status
2) Entered versus stored values
An artifact of IEEE floating-point storage. The question I have is, why is the localized string version of the floating-point number being stored? Seems like either the sign+exp+mantissa should be stored, or preferably whatever the user entered. How the number is manipulated by the client program is an implementation detail that doesn't need to be archived in the data
3) Optimization artefacts become a feature instead of an embarrassment
Stupid design decision. But then again, so were many things in HTML that have since been weeded out
4) VML isn't XML
An obviously intentional flaw. SVG already exists and is widely-implemented (even native under KDE4 and GNOME2.6). Summed up nicely by this quote from billg himself: "We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities."
5) Open packaging parts minefield
Lack of communication and forethought that would absolutely have to be fixed before this could even be considered to be standardized. SGML got this right with fragment identifiers; blah.html#foo is conceptually no different from document.docx#partName, and XML can handle this quite nicely using ID attributes
6) International, but US English first and foremost
When you're writing C code, you don't have a separate name for malloc(). Excel functions are different in that they're not intended solely for programmers, but someone could always create a Spreadsheet Formula Markup Language of some sort to eliminate this problem.
7) Many ways to get in trouble
Are you serious? There is no reason for formatting to require different constructs in different circumstances; it's still just formatted text! Text inside a <div> is no different from text inside a <span> in (X)HTML, and why are these archaic syntaxes even present?
8) Windows dates
This is plain stupid. It makes no accommodations for different calendar formats. There is an ISO standard for storing dates, use it!
9) All roads lead to Office 2007
Not a bug in the standard, but another instance of Excel not treating this "open" standard as an open standard
10) A world of ZIP+OLE files
OLE?! Are you kidding? I understand that encryption may be outside the scope of the standard, but for Excel to lock up what are apparently open-standard files in OLE containers means that the topic of encryption must be explicitly dealt with in the standard.
11) BIFF is gone...not!
This doesn't make any sense whatsoever. Allowing such proprietary extensions runs counter to the concept of a standard.
12) Document backwards compatibility subject to neutrino radioactivity
This complaint has nothing to do with the standard
13) ECMA 376 documents just do not exist
If I handed in a proposal to a client that relied on information I withheld, by intention or by error, I would be denied. This is exactly what should happen to OOXML.
Total:
5 critical /
5 important /
3 non-issues
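To illustrate the "Windows dates" point: a minimal sketch of converting between a spreadsheet-style day serial and an ISO 8601 date string. It assumes the 1900 date system (serial 1 is 1900-01-01, and serial 60 is the non-existent 1900-02-29), whole days only, and serials from 61 onward. The function names are made up for this example.

```python
from datetime import date, timedelta

# Assumption: the 1900 date system, in which serial 1 is 1900-01-01 and
# serial 60 is the non-existent 1900-02-29. Counting from 1899-12-30
# absorbs that phantom day for every serial from 61 onward.
_EPOCH = date(1899, 12, 30)


def serial_to_iso(serial: int) -> str:
    """Convert a whole-day serial (61 or later) to an ISO 8601 date string."""
    if serial < 61:
        # Earlier serials are affected by the phantom leap day; not handled here.
        raise ValueError("serials before 61 are ambiguous in the 1900 date system")
    return (_EPOCH + timedelta(days=serial)).isoformat()


def iso_to_serial(text: str) -> int:
    """Convert an ISO 8601 date string (YYYY-MM-DD) to a whole-day serial."""
    serial = (date.fromisoformat(text) - _EPOCH).days
    if serial < 61:
        raise ValueError("dates before 1900-03-01 are not handled by this sketch")
    return serial


print(serial_to_iso(39320))         # 2007-08-26
print(iso_to_serial("2007-08-26"))  # 39320
```

The ISO string can be read without knowing which epoch the producing application used; the serial cannot.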
Posted: Sun Aug 26, 2007 10:25 pm
by Topher
I'm sorry, I can't address all of your points.
I can give you this though: backwards compatibility is a huge feature. If Office 2007 didn't support Office 2003 documents, people would look at it as if the product was crippled.
I am curious though, how would you suggest solving #1?
The problem is: you have cells with data. You have formulas which tie cells together.
The main complaint of the article seems to be that calcChain.xml isn't in the same file as the cell data. But I don't think this would work as formulas can work across sheets.
Re:
Posted: Sun Aug 26, 2007 10:46 pm
by DCrazy
Topher wrote:I'm sorry, I can't address all of your points.
Understandable.
Topher wrote:I can give you this though: backwards compatibility is a huge feature. If Office 2007 didn't support Office 2003 documents, people would look at it as if the product was crippled.
There's no reason backwards-compatible features can't be kept in the binary format while a new, completely different XML format is created going forward. The Access team doesn't seem to have a problem doing this every two years, and the Mac BU certainly didn't determine that VBA support was critical enough to attempt to implement it on x86.
Topher wrote:I am curious though, how would you suggest solving #1?
The problem is: you have cells with data. You have formulas which tie cells together.
The main complaint of the article seems to be that calcChain.xml isn't in the same file as the cell data. But I don't think this would work as formulas can work across sheets.
No, you have cells with data and cells with formulas. This is sufficient data to construct any dependency graph, or determine that such a graph is inconstructible. I'm sure you remember data normalization. Storing the processing tree is an optimization that is ambiguous (it is possible to create isomeric dependency graphs) and prone to breakage. There is no reason to store the calc chain at all, just resolve references at runtime. Optimize the client, not the persistent storage.
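As a rough sketch of what "resolve references at runtime" could look like: derive the evaluation order from the formulas alone and report a cycle when no order exists. This is an illustration, not how Excel does it. It assumes a single sheet and plain references like A1 (no ranges, no sheet prefixes, no function names ending in digits), and `calc_order` is a name made up for this example.

```python
import re
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Simplifying assumption: a reference is capital letters followed by digits.
_REF = re.compile(r"\b[A-Z]+[0-9]+\b")


def calc_order(formulas):
    """Return the formula cells in an order where every cell comes after the
    formula cells it refers to. `formulas` maps a cell name to its formula text.
    Raises graphlib.CycleError if the references are circular."""
    # Each cell maps to the set of cells it depends on.
    graph = {cell: set(_REF.findall(text)) for cell, text in formulas.items()}
    order = TopologicalSorter(graph).static_order()
    # Plain data cells appear in the order too; keep only the formula cells.
    return [cell for cell in order if cell in formulas]


print(calc_order({"C1": "A1+B1", "D1": "C1*2", "B1": "A1+1"}))
# ['B1', 'C1', 'D1']

try:
    calc_order({"A1": "B1+1", "B1": "A1+1"})
except CycleError:
    print("circular reference")
```

Nothing here needs a stored calc chain: the order is recomputed from the cells each time the file is loaded.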
I think OOXML should not be granted standards status. I have nothing against an MS-created standard; I have everything against an MS-created standard that is not worth implementing by anyone but MS. I firmly believe that OOXML exists only for the same reason that I believe Apple went through the trouble of getting Leopard UNIX-certified: meeting the letter requirements of lucrative contracts that demand standards compliance.
Posted: Sun Aug 26, 2007 11:07 pm
by Topher
Well I'll give you my (personal) perspective:
Compare reading data from an XLSX file and XLS file. It's many times easier to generate your own graphs and tables based on the spreadsheet data now. XML is easy to read and documented. It's much easier to write your data to Excel now as well. All you're doing is writing XML in a ZIP file.
The scenario listed in the article is very specific: take an existing document with formulas and modify the formula.
Is this harder than reading or writing just the raw data? Yes.
Is it as common as reading and writing just raw data? Is it made easier or harder than editing the same thing in the binary format?
You have to remember: people who are editing the XML directly are already "power users". It's not going to be as easy as doing it through the UI.
I'm not sure of a better way to do this when data in the XML has dependencies like it can in XLSX
Re:
Posted: Mon Aug 27, 2007 1:33 am
by DCrazy
Topher wrote:Compare reading data from an XLSX file and XLS file. It's many times easier to generate your own graphs and tables based on the spreadsheet data now. XML is easy to read and documented.
XML is easy to read and documented, OOXML is not. That's the whole point of the article.
It's much easier to write your data to Excel now as well. All you're doing is writing XML in a ZIP file.
Except, as illustrated above, when you have to create an OLE container. But that's an edge case, I hope.
You have to remember: people who are editing the XML directly are already "power users". It's not going to be as easy as doing it through the UI.
I'm not sure of a better way to do this when data in the XML has dependencies like it can in XLSX
Or maybe it's a PHP script that automatically fills in cells to generate a purchase order for a corporate intranet. Or maybe it's a Martian-locale version of NeoOffice running on a PowerMac G4. The point is that the whole point of an interchange standard is that you cannot limit the domain of the standard. Sure, nobody is probably going to dig into the XML and modify cell contents, but I might want to write a quick sed script on my Linux box to merge a blank spreadsheet with some data from a CS project. The architecture of the format is sufficiently obtuse to prohibit that without writing an entire spreadsheet engine.
Posted: Mon Aug 27, 2007 11:38 am
by Topher
I'll give you that I don't know enough about how reading and writing formulas works in XLSX.
But just raw data is easy to import and export. And my point is that's a huge win over the binary format. It's easy to write a PHP script to read and write XLSX data now, it's much harder to do that with the binary format.
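For what it's worth, a minimal sketch of reading raw numeric cells with nothing but a ZIP reader and an XML parser (shown in Python rather than PHP). The part name `xl/worksheets/sheet1.xml` is an assumption; a robust reader would locate the sheet through the package's relationship parts. String cells, which point into a separate shared-strings part, are skipped here. `read_numeric_cells` is a made-up name.

```python
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_numeric_cells(xlsx, part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: float} for the numeric cells in one sheet part.

    `xlsx` is a path or a binary file-like object. The default part name is an
    assumption made for this sketch.
    """
    with zipfile.ZipFile(xlsx) as package:
        root = ET.fromstring(package.read(part))

    cells = {}
    for c in root.iterfind(".//m:c", NS):
        value = c.find("m:v", NS)
        # A missing type attribute means a number; other types are skipped.
        if value is not None and value.text and c.get("t", "n") == "n":
            cells[c.get("r")] = float(value.text)
    return cells
```

Usage would be along the lines of `read_numeric_cells("report.xlsx")`, where `report.xlsx` is a placeholder file name. Formulas and their cached results are a separate question, which is where the disagreement in this thread lies.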
Re:
Posted: Mon Aug 27, 2007 3:10 pm
by fliptw
DCrazy wrote:The point is that the whole point of an interchange standard is that you cannot limit the domain of the standard.
Isn't the standard a domain limited to office products?
Remember, this standard IS an application of XML to a class of specific tasks.
Re:
Posted: Mon Aug 27, 2007 5:31 pm
by DCrazy
fliptw wrote:DCrazy wrote:The point is that the whole point of an interchange standard is that you cannot limit the domain of the standard.
Isn't the standard a domain limited to office products?
Remember, this standard IS an application of XML to a class of specific tasks.
When you make it impossible to perform semantic transformations using XSLT, then I think that's a red flag.
Posted: Mon Aug 27, 2007 9:40 pm
by fliptw
Yes that is a red flag, but let's put blame where it's due, and that's not the fault of OOXML, it's the fault of over 15 years of MS Office.
That being said, XSLT wouldn't be properly capable of processing Excel's formulas. Don't underestimate the degree of complexity we are dealing with.
Re:
Posted: Tue Aug 28, 2007 8:50 am
by DCrazy
fliptw wrote:Yes that is a red flag, but let's put blame where it's due, and that's not the fault of OOXML, it's the fault of over 15 years of MS Office.
Which is why I suggested that the new file format not be loaded with all of the old cruft. Keep that in the binary format, and engineer a clean, logical XML-based format.
fliptw wrote:That being said, XSLT wouldn't be properly capable of processing Excel's formulas. Don't underestimate the degree of complexity we are dealing with.
I wasn't talking about formulas, I was referring to markup. For example, you can't write an XSLT stylesheet that converts a custom document into an Excel spreadsheet, even if your custom document uses the same Excel function syntax. You basically have to rewrite Excel to get even minimum mileage out of the brain-dead file format. But MS keeps buying votes on standards committees for a file format that only they will ever be able to claim 100% compliance.
Re:
Posted: Tue Aug 28, 2007 10:30 am
by Topher
DCrazy wrote:Which is why I suggested that the new file format not be loaded with all of the old cruft. Keep that in the binary format, and engineer a clean, logical XML-based format.
No one would adopt it if it didn't have at least the majority of the features of binary.