ColdFusion 9: Search and Replace Text in a .DOCX File with OpenXML SDK 2.0

I saw a recent question on one of the ColdFusion forums about using the .NET OpenXML SDK 2.0 to search and replace text within a *.docx file.  I have only used that SDK tangentially. But since it is a .NET assembly, in theory it can be used from ColdFusion.

While the OpenXML specifications are pretty complex, the example for searching and replacing text seemed straightforward enough: just your basic regular expression replace. So I decided to give it a whirl to see if the theory held up. It did, so I am posting the example here in case it helps someone else.

Pre-Requisites:

The SDK requires the .NET Framework 3.5 SP1. Having already installed that, I went ahead and installed the OpenXML SDK 2.0, then set about converting the .NET example. It was mostly smooth sailing, with only a few changes required.

Differences:

First, the C# code uses a construct called a using  statement. It is basically just another way to write a try/finally statement to ensure proper disposal of the created objects.  Since CF does not support that construct, I switched those to a try/finally clause instead.
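To make that mapping concrete, here is a minimal sketch of the pattern applied to a single object. It is not taken from the SDK documentation: the file path is hypothetical, and the C# form appears only in a comment for comparison.

```cfm
<cfscript>
// C# form, shown only for comparison:
//   using (StreamReader reader = new StreamReader(path)) { fileText = reader.ReadToEnd(); }

// Hypothetical path, used only for illustration
path = "c:/test/sample.txt";

try {
    // StreamReader is part of the core assembly, so no assembly path is passed here
    StreamReader = createObject(".net", "System.IO.StreamReader");
    reader = StreamReader.init(path);
    fileText = reader.ReadToEnd();
}
finally {
    // Runs whether or not the read succeeded, like the end of a using block
    if (structKeyExists(variables, "reader")) {
        reader.Dispose();
    }
}
</cfscript>
```

The structKeyExists() check covers the case where the constructor itself fails, so the finally block does not try to dispose of an object that was never created.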

Second, some of the classes, like StreamReader, are part of the core assembly, meaning you can instantiate them in CF without specifying an assembly path. However, when I tried to mix those objects with ones from the OpenXML assembly, ColdFusion threw a weird instantiation error:

Unable to find a constructor for class System.IO.StreamReader that accepts parameters of type ( MS.Internal.IO.Zip.ZipIOModeEnforcingStream ).

Passing in my assembly list to all of the createObject() calls seemed to resolve the error. Note, the SDK also seems to use classes from WindowsBase.dll so it is included in my assembly list as well.

Code  (Warning, overwrites the existing file!)

<cfscript>
sourceFile = "c:/test/MyFile.docx";
assembly = "C:/Program Files/Open XML SDK/V2.0/lib/DocumentFormat.OpenXml.dll,C:/Program Files/Reference Assemblies/Microsoft/Framework/v3.0/WindowsBase.dll";
WordprocessingDocument = createObject(".net", "DocumentFormat.OpenXml.Packaging.WordprocessingDocument", assembly);

try {
    // read in the document and extract the text
    wordDoc = WordprocessingDocument.Open(sourceFile, javacast("boolean", true));
    mainPart = wordDoc.Get_MainDocumentPart();
    StreamReader = createObject(".net", "System.IO.StreamReader", assembly);
    reader = StreamReader.init( mainPart.GetStream() );
    docText = reader.ReadToEnd();

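    // use a REGEX to replace the text
    // (CF variable names are case-insensitive, so the class proxy and
    // the instance get distinct names instead of "Regex" and "regex")
    RegexClass = createObject(".net", "System.Text.RegularExpressions.Regex");
    pattern = RegexClass.init("my regular expression");
    docText = pattern.Replace(docText, "replace with this text");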

    // save file to disk
    FileMode = createObject(".net", "System.IO.FileMode", assembly);
    StreamWriter = createObject(".net", "System.IO.StreamWriter", assembly);
    writer = StreamWriter.init( mainPart.GetStream(FileMode.Create) );
    writer.Write(docText);
    writer.Flush();
    writer.close();
}
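finally {
    // always clean up objects, in reverse order of creation, so the
    // streams are released before the document that owns them
    if (structKeyExists(variables, "writer")) {
        writer.Dispose();
    }
    if (structKeyExists(variables, "reader")) {
        reader.Dispose();
    }
    if (structKeyExists(variables, "wordDoc")) {
        wordDoc.Dispose();
    }
}
// only reached when no exception was thrown
WriteDump("DONE");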
</cfscript>
Categories: .NET, ColdFusion
  1. February 12, 2012 at 4:37 pm

    Thanks very much for this post.

    You are probably one of the very few who has ever tried to do this. I wonder how hard it would be to do more advanced stuff like working with checkboxes, tables, headers/footers, etc? I tried figuring it out, but I do not know C#/VB.Net so it makes it kind of difficult.

    ColdFusion X should try and make a wrapper around this SDK… that would be one awesome feature. Probably a good custom tag to develop…

    • February 12, 2012 at 6:25 pm

      I am guessing it is possible, but I do not know how involved it is. I would suggest starting with the existing C# examples in the documentation. I see at least one for replacing headers (and probably others you could adapt using this entry as a base).

      Obviously some familiarity with C# helps. But as you start working with it you will find the overall syntax is very similar to cfscript, which makes it easier to understand what is going on in the code than with, say, PHP.

      Whatever tool you use, you will have to learn its API (Java, CF, .NET, ...). However, I think the bigger challenge will be learning the OpenXML structure, which is extremely complicated IMO. Hence the popularity of existing wrappers like Aspose, et cetera 😉

  2. February 13, 2012 at 10:43 am

    After looking at trying to use CFZip/CFFile to extract and read/write to the XML… it looks like this OOXML schema is very complicated (kept getting XML errors when trying to read/write to placeholders).

    The Find/Replace with the OOXML SDK in your example works though (even though the syntax confuses me a little… I guess dispose(), flush(), etc are all methods in the various classes that make up the System namespace in .Net).

    I found this link regarding the checkbox class: http://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.checkbox.aspx which doesn’t really show much. What I’m trying to do is map checkboxes on an HTML form to checkboxes in a Word template (around 20 of them). Doing that with the CFZip/CFFile method would probably be a total fiasco. Any ideas?

    And yeah… I’m starting to see why Aspose.Words costs so much… it’s probably well worth it in heavy OOXML manipulation. I was looking at their ColdFusion page on word manipulation and it looked like a breeze.

  3. February 13, 2012 at 1:02 pm

    > I guess dispose(), flush(), etc are all methods in the various classes
    > that makeup the System namespace in .Net

    Yep, just do a search on StreamWriter.Dispose or StreamWriter.Close which should turn up the documentation for those methods. I added flush() and close() out of habit … but it is possible dispose() does that already, making them unnecessary anyway. I would have to double check the docs.

    > Using the CFZip/CFFile method doing that would probably be a total fiasco.

    Well even that should work – if you understand the schema and are replacing the elements properly. The openxml specifications are wordy, but it is the best reference. To better understand it and troubleshoot the issue, start by creating a very simple document with a single unchecked checkbox. Create a copy but with a checked control. Then unzip the file and compare the xml.

    I wish I could say there was a shortcut to learning all this … but there really is not.
