ColdFusion 9: Search and Replace Text in .DOCX Files with OpenXML SDK 2.0

February 12, 2012

I saw a recent question on one of the ColdFusion forums about using the .NET OpenXML SDK 2.0 to search and replace text within a *.docx file.  I have only used that SDK tangentially. But since it is a .NET assembly, in theory it can be used from ColdFusion.

While the OpenXML specifications are pretty complex, the example for searching and replacing text seemed straightforward enough: just your basic regular expression replace. So I decided to give it a whirl to see if the theory held up. It did, so I am posting the example here in case it helps someone else.

Pre-Requisites:

The SDK requires the .NET Framework 3.5 SP1. Having already installed that, I went ahead and installed the OpenXML SDK 2.0, then set about converting the .NET example. It was mostly smooth sailing, with only a few changes required.

Differences:

First, the C# code uses a construct called a using statement. It is basically just another way to write a try/finally statement that ensures the created objects are properly disposed. Since CF does not support that construct, I switched those to try/finally clauses instead.
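
As a rough sketch of that translation (not taken from the SDK sample), a C# block such as using (StreamReader sr = new StreamReader(path)) { ... } could be written in cfscript along these lines. The file path is a made-up placeholder, and the example assumes the .NET integration is installed.

<cfscript>
// hypothetical path, for illustration only
path = "c:/test/sample.txt";
try {
    // StreamReader is a core class, so no assembly path is passed here
    reader = createObject(".net", "System.IO.StreamReader").init(path);
    text = reader.ReadToEnd();
}
finally {
    // runs whether or not an error occurred, like the end of a C# using block
    if (structKeyExists(variables, "reader")) {
        reader.Dispose();
    }
}
</cfscript>

The structKeyExists() check matters because the constructor itself may fail, in which case there is nothing to dispose.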

Second, some of the classes, like StreamReader, are part of the core assembly, meaning you can instantiate them in CF without specifying an assembly path. However, when I tried to mix those objects with ones from the OpenXML assembly, ColdFusion threw a weird instantiation error:

Unable to find a constructor for class System.IO.StreamReader that accepts parameters of type ( MS.Internal.IO.Zip.ZipIOModeEnforcingStream ).

Passing in my assembly list to all of the createObject() calls seemed to resolve the error. Note, the SDK also seems to use classes from WindowsBase.dll so it is included in my assembly list as well.

Code  (Warning, overwrites the existing file!)

<cfscript>
sourceFile = "c:/test/MyFile.docx";
assembly = "C:/Program Files/Open XML SDK/V2.0/lib/DocumentFormat.OpenXml.dll,C:/Program Files/Reference Assemblies/Microsoft/Framework/v3.0/WindowsBase.dll";
WordprocessingDocument = createObject(".net", "DocumentFormat.OpenXml.Packaging.WordprocessingDocument", assembly);

try {
    // read in the document and extract the text
    wordDoc = WordprocessingDocument.Open(sourceFile, javacast("boolean", true));
    mainPart = wordDoc.Get_MainDocumentPart();
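    StreamReader = createObject(".net", "System.IO.StreamReader", assembly);
    reader = StreamReader.init( mainPart.GetStream() );
    docText = reader.ReadToEnd();
    // release the read stream before the part is reopened for writing
    reader.Close();

    // use a REGEX to replace the text
    // (CF variable names are case-insensitive, so the class proxy and the
    // instance get distinct names)
    RegexClass = createObject(".net", "System.Text.RegularExpressions.Regex");
    pattern = RegexClass.init("my regular expression");
    docText = pattern.Replace(docText, "replace with this text");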

    // save file to disk
    FileMode = createObject(".net", "System.IO.FileMode", assembly);
    StreamWriter = createObject(".net", "System.IO.StreamWriter", assembly);
    writer = StreamWriter.init( mainPart.GetStream(FileMode.Create) );
    writer.Write(docText);
    writer.Flush();
    writer.close();
}
finally {
    // always clean up: release the streams first, then the document that owns them
    if (structKeyExists(variables, "writer")) {
        writer.Dispose();
    }
    if (structKeyExists(variables, "reader")) {
        reader.Dispose();
    }
    if (structKeyExists(variables, "wordDoc")) {
        wordDoc.Dispose();
    }
}
// only reached when no error was thrown
WriteDump("DONE");
</cfscript>
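
Because the code above overwrites the source file in place, one cautious option is to copy the original first and run the replacement against the copy. A minimal sketch using ColdFusion's built-in fileCopy() function; the name of the working copy is a hypothetical placeholder.

<cfscript>
sourceFile = "c:/test/MyFile.docx";
// hypothetical name for the working copy
workingFile = "c:/test/MyFile_replaced.docx";
// copy the original, then pass workingFile (not sourceFile) to WordprocessingDocument.Open()
fileCopy(sourceFile, workingFile);
</cfscript>

That way the original document stays untouched if the regular expression turns out to match more than intended.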
Categories: .NET, ColdFusion