When a protected Word document is saved as an HTML file, Word adds a "checksum" of the password (enclosed in a proprietary tag) to the code. The checksum format looks somewhat like CRC32, but no further details are currently available. The same checksum can be found in the original Word document (hexadecimal view). If this "checksum" is replaced by 0x00000000, the password becomes an empty string.
Example:
1.) Open a protected document in MS Word
2.) Save as "Web Page (.htm; .html)", close Word
3.) Open the HTML document in any text editor
4.) Search for the "" tag; the line reads something like this: ABCDEF01
5.) Keep the "password" in mind
6.) Open the original document (.doc) with any hex editor
7.) Search for the hex values of the "password" in reverse byte order (e.g. ABCDEF01 is stored as 01 EF CD AB)
8.) Overwrite all 4 bytes (each a pair of hex digits) with 0x00, Save, Close
9.) Open document with MS Word, Select "Tools / Unprotect Document" (password is blank)
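Steps 7 and 8 can also be scripted instead of done by hand in a hex editor. The following is only a minimal Python sketch under a few assumptions: the function names are made up for illustration, the checksum is taken to be exactly 8 hex digits, and the script refuses to patch unless the reversed byte sequence occurs exactly once in the file (a safety check that is not part of the manual steps). Work on a copy of the document.

```python
def checksum_to_file_bytes(checksum_hex: str) -> bytes:
    """Convert the hex string seen in the HTML (e.g. 'ABCDEF01') into the
    reversed byte order that is searched for in the .doc file."""
    raw = bytes.fromhex(checksum_hex)
    if len(raw) != 4:
        raise ValueError("expected 8 hex digits (4 bytes)")
    return raw[::-1]


def blank_checksum(data: bytes, checksum_hex: str) -> bytes:
    """Return a copy of data with the 4 checksum bytes overwritten by 0x00."""
    needle = checksum_to_file_bytes(checksum_hex)
    count = data.count(needle)
    # Assumption: only patch when the match is unambiguous.
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    offset = data.find(needle)
    return data[:offset] + b"\x00" * 4 + data[offset + 4:]


def patch_file(src_path: str, dst_path: str, checksum_hex: str) -> None:
    """Read src_path, blank the checksum, and write the result to dst_path."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(blank_checksum(data, checksum_hex))
```

For example, patch_file("protected.doc", "patched.doc", "ABCDEF01") would write a patched copy, where both file names and the checksum value are placeholders. If more than one match is found, the script stops and the right offset has to be picked by hand.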
Variation:
If the 4 checksum bytes (8 hex digits) are replaced with the checksum of a known password, it should be fairly easy to unprotect the document, make any necessary changes, save, close, and reset the password to the original (unknown!) password by simply restoring the original values. The document is thus changed without even knowing the password.
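The swap-and-restore idea might be sketched in Python as follows. This is an illustration only: swap_checksum is a made-up helper, and "12345678" stands in for the checksum of a known password (it is a placeholder, not the real checksum of any password). The sketch also assumes the checksum may end up at a different offset after Word saves the file, so the restore step searches for the known checksum again rather than reusing the old position.

```python
def swap_checksum(data: bytes, old_hex: str, new_hex: str) -> bytes:
    """Replace the single occurrence of one checksum with another.

    Both values are given as they appear in the HTML (8 hex digits) and
    are reversed before being matched against the file contents.
    """
    old = bytes.fromhex(old_hex)[::-1]
    new = bytes.fromhex(new_hex)[::-1]
    if len(old) != 4 or len(new) != 4:
        raise ValueError("checksums must be 8 hex digits (4 bytes)")
    # Assumption: refuse to patch unless the match is unambiguous.
    if data.count(old) != 1:
        raise ValueError("checksum not found exactly once")
    return data.replace(old, new, 1)


ORIGINAL = "ABCDEF01"  # checksum of the unknown password (from the HTML)
KNOWN = "12345678"     # placeholder for the checksum of a known password

# Stand-in for the bytes of a .doc file containing the original checksum.
doc = b"...header..." + bytes.fromhex(ORIGINAL)[::-1] + b"...rest..."

unlocked = swap_checksum(doc, ORIGINAL, KNOWN)       # now opens with the known password
restored = swap_checksum(unlocked, KNOWN, ORIGINAL)  # original protection is back
```

In practice the document would be edited and saved in Word between the two calls, and the second call would run on the newly saved file.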
Best of luck.
Sundaram