Saturday, November 28, 2009

Word 2007 Document Processing Using OpenXML

An interesting topic is how to process Word documents in code. As a test, I exported the content of a page in a publishing site's Pages library to a Word 2007 document and saved that document to a separate document library, and it worked.

Code:
using System;
using System.IO;
using System.Linq;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;
using Microsoft.SharePoint;

// Containing class added only so the method compiles; the post shows the method alone.
public static class PageExporter
{
    /// <summary>
    /// Exports a publishing page's content into the content controls of a Word 2007 document.
    /// The exported documents are stored in a separate document library.
    /// </summary>
    /// <param name="sourceItem">A list item from the Pages library</param>
    /// <param name="targetList">The document library that stores the exported Word 2007 documents</param>
    public static void ExportPubPageContentToWordDoc(SPListItem sourceItem, SPList targetList)
    {
        SPDocumentLibrary lib = targetList as SPDocumentLibrary;
        if (lib == null)
        {
            throw new ArgumentException("Target list is not a document library.", "targetList");
        }

        foreach (SPContentType ctype in lib.ContentTypes)
        {
            // Skip the built-in Document and Folder content types; the custom content
            // type's document template contains the bound content controls.
            // Note: every custom content type writes to the same file name below.
            if (string.Equals(ctype.Name, "Document", StringComparison.OrdinalIgnoreCase) ||
                string.Equals(ctype.Name, "Folder", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            // Copy the template into an expandable MemoryStream so the package can be edited in memory.
            SPFile tempFile = ctype.ResourceFolder.Files[ctype.DocumentTemplate];
            byte[] templateBytes = tempFile.OpenBinary();
            using (MemoryStream memStream = new MemoryStream())
            {
                memStream.Write(templateBytes, 0, templateBytes.Length);

                using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(memStream, true))
                {
                    MainDocumentPart mainPart = wordDoc.MainDocumentPart;

                    // The template is expected to contain one custom XML part bound to its content controls.
                    CustomXmlPart xmlPart = mainPart.CustomXmlParts.FirstOrDefault();
                    if (xmlPart == null)
                    {
                        throw new InvalidOperationException("The document template has no custom XML part.");
                    }

                    // Build an XML document that matches the part's structure:
                    // <propertydata><title/><body/></propertydata>
                    // GetFieldValueString is the author's helper (not shown) that returns a field value as a string.
                    XmlDocument doc = new XmlDocument();
                    XmlElement rootNode = doc.CreateElement("propertydata");
                    XmlElement titleNode = doc.CreateElement("title");
                    XmlElement bodyNode = doc.CreateElement("body");
                    titleNode.InnerText = GetFieldValueString(sourceItem, "Title");
                    bodyNode.InnerText = GetFieldValueString(sourceItem, "Article Body");
                    rootNode.AppendChild(titleNode);
                    rootNode.AppendChild(bodyNode);
                    doc.AppendChild(rootNode);

                    // Replace the custom XML part's content with the new data.
                    using (MemoryStream resultStream = new MemoryStream())
                    {
                        doc.Save(resultStream);
                        resultStream.Position = 0;
                        xmlPart.FeedData(resultStream);
                    }
                } // Disposing wordDoc writes the changes back into memStream.

                // Name the document after the page, e.g. "Article.aspx" becomes "Article.docx".
                string fileName = sourceItem.File.Name;
                if (fileName.IndexOf('.') > 0)
                {
                    fileName = fileName.Substring(0, fileName.LastIndexOf('.'));
                }
                fileName += ".docx";

                // Upload only after the package has been saved; Files.Add commits the file.
                string docUrl = lib.RootFolder.Url + "/" + fileName;
                lib.RootFolder.Files.Add(docUrl, memStream.ToArray(), true);
            }
        }
    }
}
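The steps of this routine that don't depend on SharePoint (building the propertydata payload and turning the page name into a .docx name) can be tried on their own. Below is a minimal sketch, not the post's code: PropertyDataSketch and its methods are hypothetical names, it swaps XmlDocument for LINQ to XML, and it assumes the template's custom XML part expects exactly <propertydata><title/><body/></propertydata>.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper: mirrors the SharePoint-independent steps of
// ExportPubPageContentToWordDoc so they can run without a SharePoint farm.
public static class PropertyDataSketch
{
    // Builds <propertydata><title/><body/></propertydata>, the shape the
    // template's custom XML part is assumed to expect.
    public static XDocument BuildPropertyData(string title, string body)
    {
        return new XDocument(
            new XElement("propertydata",
                new XElement("title", title ?? string.Empty),
                new XElement("body", body ?? string.Empty)));
    }

    // Serializes the payload into a stream rewound to position 0,
    // the state CustomXmlPart.FeedData reads from.
    public static MemoryStream ToStream(XDocument doc)
    {
        var stream = new MemoryStream();
        doc.Save(stream);
        stream.Position = 0;
        return stream;
    }

    // Replaces the last extension of the page name with ".docx".
    public static string ToDocxFileName(string pageFileName)
    {
        return Path.GetFileNameWithoutExtension(pageFileName) + ".docx";
    }
}
```

For example, ToDocxFileName("Article.aspx") returns "Article.docx", and ToStream(BuildPropertyData("Hello", "World")) gives a rewound stream that could be handed to FeedData.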
The Open XML SDK 2.0 (http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en) is required to run the code above. The Word 2007 Content Control Toolkit (http://dbe.codeplex.com/) is handy for manipulating the XML behind Word 2007 documents; I used it to create the content type's document template and bind its content controls to the custom XML part.
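Why does feeding new XML into the custom XML part change what the document shows? Each bound content control records an XPath into that part (the w:xpath attribute of its w:dataBinding element), and Word displays the text of the node that XPath selects. The hypothetical sketch below (BindingSketch.ResolveBinding is an illustrative name) evaluates such an XPath with System.Xml. It is not Word's binding engine, and /propertydata[1]/title[1] is only an assumed example of what the toolkit might generate.

```csharp
using System;
using System.Xml;

// Hypothetical illustration of XPath data binding: returns the text a content
// control bound to `xpath` would display for the given custom XML, or null
// when the XPath selects nothing.
public static class BindingSketch
{
    public static string ResolveBinding(string customXml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(customXml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

For example, with "<propertydata><title>Hello</title><body>World</body></propertydata>" as the part's content, ResolveBinding(..., "/propertydata[1]/title[1]") returns "Hello".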

Good references on this topic:
http://blogs.msdn.com/mikeormond/archive/2008/06/20/word-2007-content-controls-databinding-and-schema-validation.aspx
http://www.craigmurphy.com/blog/?p=913
http://www.microsoft.com/uk/msdn/screencasts/screencast/236/Word-2007-Content-Controls-and-Schema-Validation.aspx