Chapter 9. Accessing Data in Lists

Microsoft SharePoint lists are intended to be viewed and edited by users. In many SharePoint solutions, however, they also act as a data source supporting create, read, update, delete, and query (CRUDQ) operations. Support for direct editing by users and programmatic access for CRUDQ operations makes SharePoint lists a unique type of data source; they act like a bound data grid and a database table all at the same time.

The dual nature of lists means that careful consideration must be given to how they are used in a SharePoint solution. While lists support CRUDQ operations, they really aren’t database tables and should not be used when the power of a database is required. Furthermore, the out-of-the-box interface does not support the same level of customization as an ASP.NET grid control and shouldn’t be used in situations where the flexibility of a custom control is required. Ultimately, list-based solutions are best when they are designed so that user editing is supported by custom code for viewing, reporting, rules, and automation.

This chapter is focused on the technologies available in SharePoint to support CRUDQ operations against lists in server-side code. These technologies include object model support for lists and items, query support through the Collaborative Application Markup Language (CAML) and Language Integrated Query (LINQ), and Open XML support for the creation of documents in libraries. Client-side support for CRUDQ operations is covered in Chapter 10.

Using the Basic List Object Model

SharePoint provides a complete server-side object model for accessing lists and items. Using the objects provided, you can perform create, read, update, and delete (CRUD) operations on any list (querying is done using other techniques covered later in this chapter). The process begins by getting a reference to the list of interest in the form of a Microsoft.SharePoint.SPList object. List references can be retrieved from the Microsoft.SharePoint.SPListCollection, which is available from the Lists property of the SPWeb object, as shown in the following console application code.

using (SPSite siteCollection = new SPSite("http://intranet.wingtip.com"))
{
  using (SPWeb site = siteCollection.OpenWeb)
  {
    foreach(SPList list in site.Lists)
    {
      Console.WriteLine(list.Title);
    }
  }
}

Typically, you’ll be more interested in referencing a specific list than the SPListCollection object. In these cases, you may retrieve a list directly by its name or by using the Uniform Resource Locator (URL) that refers to the location of the list in the current site. When retrieving a list by name, you can use the TryGetList method of the SPListCollection, which returns null if the list does not exist. When retrieving the list by the URL, you can use the GetList method of the SPWeb object, which throws a System.IO.FileNotFoundException if the list does not exist. The following code shows both approaches using a console application.

using (SPSite siteCollection = new SPSite("http://intranet.wingtip.com))
{
  using (SPWeb site = siteCollection.OpenWeb)
  {
    SPList toyList = site.Lists.TryGetList("Toys");
    if(toyList != null)
      Console.WriteLine("Success!");
    else
      Console.WriteLine("List does not exist");

    try
    {
      SPList taskList = site.GetList("/Lists/Tasks");
      Console.WriteLine("Success!");
    }
    catch(FileNotFoundException)
    {
      Console.WriteLine("List does not exist");
    }
  }
}

Once you have a reference to a list, you may use it to access items. Lists maintain items in a Microsoft.SharePoint.SPListItemCollection object, which is accessible via the Items property of SPList. Items in the collection are returned as Microsoft.SharePoint.SPListItem objects. The collection can be bound directly to a grid for display, iterated over using a foreach statement, or directly accessed using either the item’s ID or index. Each individual SPListItem maintains a Hashtable of values representing the properties of the item. Example 9-1 shows snippets of code for a simple Web Part that performs CRUD operations on a specified SharePoint list.

Example 9-1. Basic CRUD operations

//Set up an SPGirdView
itemGrid = new SPGridView;
itemGrid.AutoGenerateColumns = false;
this.Controls.Add(itemGrid);
BoundField titleField = new BoundField;
titleField.DataField="Title";
titleField.HeaderText="Title";
itemGrid.Columns.Add(titleField);

//Get the list
SPList list = SPContext.Current.Web.Lists.TryGetList("Toys");
if (list == null)
    messages.Text = "List does not exist.";

//Create a new item
//NOTE: Update must be called
SPListItem newItem = list.Items.Add;
newItem["Title"] = "Created Item " + DateTime.Now.ToLongDateString;
newItem.Update;

//Read items by binding to grid
//NOTE: This loads all items, folders, and fields!
SPListItemCollection items = list.Items;
itemGrid.DataSource = items;
itemGrid.DataBind;

//Delete the first item
list.Items[0].Delete;

//Update the first item
//Note the use of a temporary SPListItem
SPListItem updateItem = list.Items[0];
updateItem["Title"] = "Updated Item " + DateTime.Now.ToLongDateString;
updateItem.Update;

Several aspects of the code in Example 9-1 are worth noting. First, when creating new items in a list, the Update method must be called to save the new item. If the Update method is not called, then the list item will be lost. Second, when updating a list item, be sure to use a temporary SPListItem object, as shown in the code, or your changes will not be saved. In addition, if the item was edited by another user before the update is saved, the operation will fail. Finally, when reading items from a list, you should be careful about accessing the Items property of the list. When the Items property is accessed, it attempts to load all items, folders, and fields, which can cause performance issues. SharePoint does provide some list throttling, as discussed in the section entitled Throttling Queries later in the chapter, to minimize performance issues, but the best way to ensure good performance is to write a query against the list and return only what you need.

Querying Lists with CAML

Querying a list for specific items that meet a certain criteria can be done using the Microsoft.SharePoint.SPQuery object. The SPQuery object exposes a Query property that accepts a CAML fragment, which defines the query to be performed. A ViewFields property defines the fields to return. The following code shows a simple query run against a list.

SPQuery query = new SPQuery;
query.Viewfields = @"<FieldRef Name='Title'/><FieldRef Name='Expires'/>";
query.Query =
@"<Where>
  <Lt>
    <FieldRef Name='Expires'/>
    <Value Type='DateTime'><Today/></Value>
  </Lt>
</Where>";
SPList list = SPContext.Current.Web.Lists.TryGetList("Announcements");
SPListItemCollections items = list.GetItems(query);

The ViewFields property accepts a CAML fragment containing a series of FieldRef elements. Each FieldRef element has a Name attribute that specifies the name of the list field to return from the query. Note that the Name attribute must contain the name of the field as it is defined in the schema.xml file for the list definition and not simply the display name of the field.

To create a query, you must construct a CAML fragment properly, defining the items to return from the list. At the highest level, the CAML fragment may contain Where, OrderBy, and GroupBy elements. Inside each of these elements, you can use additional CAML elements to specify conditions. Table 9-1 contains a complete list of CAML elements that may be used to create a query, and Example 9-2 shows the basic form of the CAML query.

Example 9-2. CAML query form

<Where>
  <Lt>,<Gt>,<Eq>,<Leq>,<Geq>,<Neq>,<BeginsWith>,<Contains>,<IsNotNull>,<IsNull>
    <FieldRef/>
    <Value>[Test Value], Today</Value>

  </Lt>,</Gt>,</Eq>,</Leq>,</Geq>,</Neq>,</BeginsWith>,</Contains>,</IsNotNull>,
    </IsNull>
  <And>,<Or>
  <Lt>,<Gt>,<Eq>,<Leq>,<Geq>,<Neq>,<BeginsWith>,<Contains>,<IsNotNull>,<IsNull>
    <FieldRef/>
    <Value>[Test Value], Today</Value>

  </Lt>,</Gt>,</Eq>,</Leq>,</Geq>,</Neq>,</BeginsWith>,</Contains>,</IsNotNull>,
    </IsNull>
  </And>,</Or>
</Where>
<OrderBy>
  <FieldRef/>
</OrderBy>
<GroupBy>
  <FieldRef/>
<GroupBy>

Table 9-1. CAML Elements for Querying

Element

Description

And

Groups multiple conditions

BeginsWith

Searches for a string at the beginning of the text field

Contains

Searches for a string within the text field

Eq

Equal to

FieldRef

A reference to a field (useful for GroupBy elements)

Geq

Greater than or equal to

GroupBy

Groups results by these fields

Gt

Greater than

IsNotNull

Is not null (not empty)

IsNull

Is null (empty)

Join

Used to query across two lists that are joined through a Lookup field

Leq

Less than or equal to

Lt

Less than

Neq

Not equal to

Now

The current date and time

Or

Boolean or operator

OrderBy

Orders the results of the query

Today

Today’s date

TodayIso

Today’s date in International Organization for Standardization (ISO) format

Where

Used to specify the Where clause of the query

Querying Joined Lists

In addition to querying single lists, the SPQuery object can be used to query across two lists that are joined by a Lookup field and surfacing projected fields. The basics of list joins, lookups, and projected fields are covered in Chapter 8. This chapter simply presents the necessary information to query these lists.

For example, consider two lists named Instructors and Modules. The Instructors list is a simple list that contains contact information for classroom instructors. The Modules list is a custom list that contains information about training modules that will be taught in the classroom. The Modules list is joined to the Instructors list via a Lookup field that shows the FullName of the instructor. In addition, the instructor’s E-mail Address is available as a projected field. In this way, an instructor may be assigned a module to teach. Using an SPQuery object and CAML, you may create a query that returns fields from both of these lists, as shown in Example 9-3.

Example 9-3. Querying joined lists

SPWeb site = SPContext.Current.Web;
SPList listInstructors = site.Lists["Instructors"];
SPList listModules = site.Lists["Modules"];

SPQuery query = new SPQuery;
query.Query = "<Where><Eq><FieldRef Name="Audience"/>" +
              "<Value Type="Text">Developer</Value></Eq></Where>";
query.Joins = "<Join Type="Inner" ListAlias="classInstructors">" +
              "<Eq><FieldRef Name="Instructor" RefType="Id" />" +
              "<FieldRef List="classInstructors" Name="Id" /></Eq></Join>";
query.ProjectedFields =
"<Field Name='Email' Type='Lookup' List='classInstructors' ShowField='Email'/>";
query.ViewFields = "<FieldRef Name="Title" /><FieldRef Name="Instructor" />" +
                   "<FieldRef Name="Email" />";

SPListItemCollection items = listModules.GetItems(query);

In Example 9-3, the Where clause is created to return training modules that are intended for a developer audience; this is similar to the simple example shown earlier. The Join property is new and defines the join between the two lists through the Lookup field. Remember that the query is being run on the Modules list, so it must be joined to the Instructors list. The ListAlias attribute defines an alias for the Instructors list that may be used in the join clause. The first FieldRef element refers to the name of the Lookup field in the Modules list and will always have a RefType equal to Id. The second FieldRef in the join clause uses the alias name for the Instructors list and will always have a Name equal to Id. The ProjectedFields property also uses the alias name for the Instructors list and refers to additional fields in the Instructors list that should be returned with the query.

Querying Multiple Lists

While the SPQuery object is good for querying a single list or joined lists, if you want to query multiple lists within a site collection simultaneously, then you can make use of the Microsoft.SharePoint.SPSiteDataQuery object. Like the SPQuery object, the SPSiteDataQuery object has Query and ViewFields properties. In addition to these fields, the SPSiteDataQuery object also has Lists and Webs properties. The Lists property is used to specify the lists within the site collection that should be included in the query. The Webs property is used to determine the scope of the query. Example 9-4 shows a query that returns events from all calendars in the current site collection where the end date is later than today.

Example 9-4. Querying multiple lists

//Creates the query
SPSiteDataQuery query = new SPSiteDataQuery;

//Builds the query
query.Query = "<Where><Gt><FieldRef Name='EndDate'/>" +
              "<Value Type='DateTime'><Today OffsetDays="-1"/></Value></Gt>
                 </Where>";

//Sets the list types to search
query.Lists = "<Lists ServerTemplate='106' />";

//Sets the Fields to include in results
query.ViewFields = "<FieldRef Name='fAllDayEvent' />" +
                   "<FieldRef Name='Title' />" +
                   "<FieldRef Name='Location' />" +
                   "<FieldRef Name='EventDate' />" +
                   "<FieldRef Name='EndDate' />";

//Sets the scope of the query
query.Webs = @"<Webs Scope='SiteCollection' />";

//Execute the query
DataTable table = SPContext.Current.Site.RootWeb.GetSiteData(query);

The Lists property in Example 9-4 is a CAML fragment that can take several forms to specify the lists to include in the query. Setting the property to <Lists ServerTemplate=[value]/> limits the query to lists of a certain server template. For example, type 106 is a calendar. Table 9-2 shows all the possible values for the ServerTemplate attribute. Setting the property to <Lists BaseType=[value]/> limits the query to lists of a certain BaseType. Table 9-3 lists the possible values for the BaseType attribute. Setting the property to <Lists Hidden=’true’/> includes hidden lists in the query. Setting the property to <Lists MaxListLimit=[value]/> limits the query to considering no more than the specified number of lists.

The Webs property is a CAML fragment that must either be <Webs Scope=’SiteCollection’/> or <Webs Scope=’Recursive’/>. SiteCollection includes all lists in the site collection while Recursive includes only those lists in the current site or subsites beneath the current site.

Table 9-2. Server Templates

Server Template

ID

Description

GenericList

100

Custom list

DocumentLibrary

101

Document library

Survey

102

Survey

Links

103

Links list

Announcements

104

Announcements list

Contacts

105

Contacts list

Events

106

Calendar

Tasks

107

Tasks list

DiscussionBoard

108

Discussion lists

PictureLibrary

109

Picture library.

DataSources

110

Data sources library

WebTemplateCatalog

111

Site template gallery

UserInformation

112

User list

WebPartCatalog

113

Web Part gallery

ListTemplateCatalog

114

List template gallery

XMLForm

115

InfoPath form library

MasterPageCatalog

116

Master Page gallery

WebPageLibrary

119

Wiki Page library

DataConnectionLibrary

130

Data connection library

WorkflowHistory

140

Workflow History list

GanttTasks

150

Project Tasks list

Meetings

200

Meetings

Agenda

201

Meeting agenda

MeetingUser

202

Meeting attendees

Decision

204

Meeting decisions

MeetingObjective

207

Meeting objectives

Posts

301

Blog posts

Comments

302

Blog comments

Categories

303

Blog categories

IssueTracking

1100

Issue tracking list

AdminTasks

1200

Central Administration tasks

Table 9-3. BaseType Values

Value

Description

0

Generic list

1

Document library

3

Discussion forum

4

Vote or Survey

5

Issues list

Throttling Queries

Chapter 8 began to discuss the support that SharePoint offers for large lists. In particular, you learned that SharePoint allows administrators to throttle the number of items returned in a list view to prevent performance degradation caused by returning an excessive number of items. While these throttle settings apply to views created by users, they also apply to queries executed in custom code.

When executing queries, the number of results returned will be determined by the throttle settings for the given list and the rights of the current user. Throttle settings are set for the Web application in Central Administration. Rights for the current user that affect throttling include administration rights on the Web front-end server, administration rights in the Web application, and auditor rights in the Web application.

In the context of list throttling, users who have server administration rights on the Web front end where the query is run are known as server administrators. Users granted Full Read (auditors) or Full Control (administrators) permissions through the Web application policy in Central Administration are considered super users. Everyone else is termed a normal user.

The List View Threshold is set at the Web application level and specifies the maximum number of items that a database operation can involve at a single time. The default value for this setting is 5,000, which means that results returned from an SPQuery or SPSiteDataQuery object will be generally limited to 5,000 items for both super users and normal users. In addition, the List View Lookup Threshold specifies the maximum number of lookup, person/group, or workflow status fields that can be involved in the query. This value defaults to 6. Server administrators are normally not affected by the List View Threshold or List View Lookup Threshold settings.

Both the SPQuery and SPSiteDataQuery objects have a QueryThrottleMode property that can be set to one of the values in the Microsoft.SharePoint.SPQueryThrottleOption enumeration. The possible values for the property are Default, Override, and Strict. Setting the QueryThrottleMode property to Default causes query throttling to be implemented for both super users and normal users based on the List View Threshold and List View Lookup Threshold settings. Server administrators are not affected.

Setting the QueryThrottleMode property to Override allows super users to return the number of items up to the limit specified in the List View Threshold for the Auditors and Administrators setting so long as the Object Model Override setting is set to “Yes” for the current Web application. Normal users are still limited to returning the number of items specified in the List View Threshold and List View Lookup Threshold settings. Server administrators remain unaffected.

Setting the QueryThrottleMode property to Strict causes the limits specified by the List View Threshold and List View Lookup Threshold settings to apply to all users. In this case, it makes no difference what rights you have in the Web application or server. Example 9-5 shows the RenderContents method from a Web Part with configurable throttling returning query results from a list and demonstrating the concepts that have just been discussed.

Example 9-5. Throttling query results

protected override void RenderContents(HtmlTextWriter writer)
{
    SPWeb site = SPContext.Current.Web;
    SPList list = site.Lists[listName];
    SPUser user = SPContext.Current.Web.CurrentUser;
    SPQuery query = new SPQuery;

    //Throttle settings
    if (overrideThrottling)
        query.QueryThrottleMode = SPQueryThrottleOption.Override;
    else
        query.QueryThrottleMode = SPQueryThrottleOption.Strict;

    //Execute query
    query.Query = "</OrderBy>";
    query.ViewFields = "<FieldRef Name="Title" />";
    SPListItemCollection items = list.GetItems(query);

    //Show user role
    if(user.IsSiteAdmin || user.IsSiteAuditor)
        writer.Write("<p>You are a 'Super User'</p>");
    else
        writer.Write("<p>You are a regular user</p>");

    //Is throttling enabled?
    if(list.EnableThrottling)
        writer.Write("<p>Throttling is enabled</p>");
    else
        writer.Write("<p>Throttling is not enabled</p>");

    //Show count of items returned
    writer.Write("<p>" + items.Count + " items returned.</p>");

}

Regardless of the value set for the QueryThrottleMode property, no results will be throttled if the query is run within the time specified in the Daily Time Window For Large Queries. During this time period, all queries are allowed to run to completion. In addition, the EnableThrottling property of the list may be set to False to remove the list from any and all throttling restrictions. The EnableThrottling property can be set only by someone with Farm Administrator rights using a Microsoft Windows PowerShell script similar to the following.

$site = Get-SPWeb -Identity "http://intranet.wingtip.com/products"
$list = $site.Lists["Toys"]
$list.EnableThrottling = $false

Introducing LINQ

LINQ is a new technology in SharePoint 2010 that acts as an additional layer on top of the existing CAML technology for performing CRUDQ operations. Between the basic object model operations of the SPList and the query operations supported by SPQuery and SPSiteDataQuery, SharePoint seems to have the full range of CRUDQ operations already covered. Why, then, would SharePoint provide a completely different technology for performing CRUDQ operations on lists? The answer is that coding LINQ operations is much easier than coding CAML operations.

To understand why LINQ is easier to use than CAML, consider the CAML query in Example 9-3, in the previous section. While this query accomplishes the task of joining two lists across a Lookup field, it has several weaknesses because the query is text-based. First, you do not know until run time if the query is written correctly. If the query is not correct, then it will simply fail at run time with an obscure error that provides little diagnostic help. Second, the text-based query provides no IntelliSense support at design time. When writing the query, you have no idea what CAML elements are legal in the syntax without having a reference open. Third, the query is difficult to read. You cannot determine easily what the query is doing and what lists are being joined. Finally, the data returned from the query is placed in a SPListItem collection, which does not provide strongly typed business entities. Access to items in the collection and subsequently fields in the items is all performed through text. Once again, you don’t know if it’s correct until run time.

LINQ addresses the problems posed by CAML because it is an object-oriented query language. Because it provides strongly typed objects at design time, you can create queries in code and know that they are correct because the code compiles. In addition, the strongly typed objects provide IntelliSense at design time, which makes it much easier to construct a correct query. Finally, the results are returned from queries as strongly typed objects, so the items and fields also provide IntelliSense and compile-time checking.

The following code shows a simple LINQ query. Even without knowing anything about LINQ, you can see that the query is easier to read. Along with the design experience provided by the strongly typed objects, this makes LINQ a powerful technology for interacting with lists.

var q = from m in dc.Modules
        orderby m.ModuleID
        select new m;

LINQ Overview

The development of LINQ technology began in 2003. The goal of LINQ was to make it easier for developers to interact with Structured Query Language (SQL) and Extensible Markup Language (XML) because the programming models were so different. The result today is a query language that can work across a wide variety of sources, including SQL, XML files, arrays, and SharePoint lists, to name just a few.

While the vision for LINQ is to create a single query language that works regardless of the data source, there are several versions of LINQ today. The System.Linq namespace provides the core LINQ functionality, but it may be supplemented by other namespaces, such as System.Data.Linq for SQL access, System.Data.Entity for the Entity Framework, System.Xml.Linq for XML manipulation, or Microsoft.SharePoint.Linq for SharePoint list operations. In all cases, however, the objective is to write similar syntax regardless of the data source.

Each data source must have a LINQ provider. The LINQ provider for a data source implements the System.Linq.IQueryable interface. This interface is the foundation for the LINQ syntax. If a LINQ provider is not available for a data source, then it cannot be queried with LINQ.

As an example of how different data sources can be accessed with similar syntax, look at the code in Example 9-6. This listing shows access to processes, arrays, XML elements, a database, and a collection of objects. In each case, the LINQ query is very similar and straightforward using a syntax that is similar to a SQL statement.

Example 9-6. LINQ queries against different data sources

//Processes
var q = from p in Process.GetProcesses
        orderby p.WorkingSet64 descending
        select new { p.ProcessName };

//Arrays
string[] names = { "Ted", "Scot", "Andrew"};

var q = from n in names
        orderby n
        select n;

//XML
string xml = "<Contacts>" +
             "<Contact FirstName="Scot" LastName="Hillier" />" +
             "<Contact FirstName="Ted" LastName="Pattison" />" +
             "<Contact FirstName="Andrew" LastName="Connell" />" +
             "</Contacts>";

XDocument d = XDocument.Parse(xml);

var q = from c in d.Descendants("Contact")
        select new { c.Attribute("FirstName").Value };

//SQL
using (MiniCRM dc =
       new MiniCRM("Data Source=(local);Initial Catalog=MiniCRM;Integrated
Security=True;"))
{

    MiniCRM_Names[] q = (from n in dc.MiniCRM_Names
                         select n).ToArray;

}

//Objects
List<Instructor> contacts = new List<Instructor>;
contacts.Add(new Instructor("Scot", "Hillier"));
contacts.Add(new Instructor("Andrew", "Connell"));
contacts.Add(new Instructor("Ted", "Pattison"));

var q = from c in contacts
        select new { c.FirstName };

Understanding LINQ Language Elements

To be able to write the LINQ queries shown in Example 9-6, several enhancements were made to the C# language. These enhancements were all targeted at simplifying and unifying how queries are written. Understanding these enhancements will help you create LINQ queries. The following sections cover the key enhancements made to C# and how they are used in LINQ.

Understanding Generic Collections

Generic collections are critical for managing data in LINQ and provide support for strongly typed object collections. Generic collections were introduced in the Microsoft .NET Framework 2.0 and are contained in the System.Collection.Generics namespace. Using generic collections, you can create lists, dictionaries, stacks, queues, linked lists, and others based on a particular type. For example, the following code creates a list of Customer objects.

List<Customer> customers = new List<Customer>;

Generic collections are vastly superior to traditional collections such as arrays because they are type safe. This means that they can accept only certain object types. Type safety means that the code related to the collection can be checked at compile time instead of run time.

In addition to type safety, generic collections also support enumeration, which is done via the System.Collections.IEnumerable interface. This means that generic collections have support for the foreach syntax, which is critical for enumerating the results returned from a LINQ query. All .NET Framework arrays also implement IEnumerable, which is why LINQ syntax works with standard arrays as well.

Understanding Extension Methods

Extension methods are a language enhancement to C# that allows you to create new methods for a class without subclassing. For example, suppose that you wanted to create a new method for the string class called IsValidEmailAddress that can be used to see if the string contains an email address. To create the extension method, you must create a new static class that contains the extension method as a static method. The following code shows how this is done.

static void Main(string[] args)
{
    //Extension method
    string email = "[email protected]";
    Console.WriteLine(email.IsValidEmailAddress);
}

public static class ExtensionMethod
{
    public static bool IsValidEmailAddress(this string s)
    {
        Regex r = new Regex(@"^[w-.]+@([w-]+.)+[w-]{2,4}$");
        return r.IsMatch(s);
    }
}

Once the extension method is created, it becomes available as a method for the class. This method shows up in IntelliSense as if it were a standard method for the class. Figure 9-1 shows the extension method in Microsoft Visual Studio.

An extension method

Figure 9-1. An extension method

Extension methods are critical to LINQ queries because they enable functionality in the query. The System.Linq namespace defines several extension methods for the IEnumerable interface that you can use right away. Perhaps the most important extension method is the Where method, which enables you to specify the items to return from a query. This method takes as an argument a delegate function to execute. Example 9-7 shows how the delegate function is defined and used in the Where method to find files that were created on the first day of the month.

Example 9-7. The Where extension method

Func<FileInfo, bool> whereMethod = CreatedOnFirstDay;

static void Main(string[] args)
{
    DirectoryInfo di = new DirectoryInfo("C:\");
    var q1 = di.GetFiles.Where(CreatedOnFirstDay);
}

static bool CreatedOnFirstDay(FileInfo info)
{
    return info.CreationTime.Day == 1;
}

Perhaps the most interesting part of the code in Example 9-7 is the syntax used by the Where method. Note that this syntax looks decidedly different from the LINQ queries shown earlier in the chapter. This is because the syntax in Example 9-7 uses method syntax, while the earlier code uses query syntax. Method syntax looks more like traditional code, while query syntax looks more like a SQL statement. Both method syntax and query syntax are equivalent, but query syntax is preferred because it is more readable. The following code shows the equivalent query syntax for finding files created on the first day of the month.

var q4 = from f in di.GetFiles
             where f.CreationTime.Day == 1
             select f;

Understanding Lambda Expressions

When using method syntax in LINQ, you must often create a delegate function. The delegate functions are a bit messy because they must be declared, defined, and then passed as an argument. In C#, you can create a shorthand version of a delegate function in line with the extension method. This shorthand method is known as an anonymous function because the in-line declaration does not have a specific name. The following code shows the equivalent anonymous function code for the method syntax shown in Example 9-7.

var q2 = di.GetFiles.Where(delegate(FileInfo info){
                return info.CreationTime.Day == 1;
                });

While the anonymous function simplifies the method syntax code, C# will let you go even further by using a lambda expression. A lambda expression is a shorthand representation of an anonymous function. The lambda expression uses the lambda operator =>, which is read as “goes to,” to relate the input parameters to the statement block. The following code shows the equivalent lambda expression code for the method syntax shown in Example 9-7. The expression is read as “info goes to info.CreationTime.Day equals 1.”

var q3 = di.GetFiles.Where(info => info.CreationTime.Day == 1);

Understanding Projections

C# introduced several language enhancements that support shorthand ways of creating properties, objects, and collections. Specifically, C# supports automatic properties, shorthand initializers, and anonymous types. Each of these enhancements has a role to play in LINQ.

Automatic properties are a way to define properties in a class without coding member variables. The compiler emits the getter and setter for you. Automatic properties are useful when you do not need to code any logic into the getter and setter methods. The following code shows a simple example of automatic properties in a class.

public class Person {
  public string FirstName { get; set; }
  public string LastName  { get; set; }
  public int    Age       { get; set; }
}

Along with automatic properties, C# also provides shortcuts for object and collection initialization. These shortcuts complement automatic properties and allow you to create whole collections with compact code. The following code shows object and collection initialization using the shorthand code.

//Object Initializer
Contact c= new Contact{ FirstName="Gustavo", LastName="Achong" };
//Collection Initializer
List<Contact> contacts = new List<Contact>{
Contact c= new Contact{ FirstName="Gustavo", LastName="Achong" },
Contact c= new Contact{ FirstName="Cathy", LastName="Abel" },
Contact c= new Contact{ FirstName="Kim", LastName="Aber" },
};

Closely related to automatic properties and initialization is the idea of anonymous types. Anonymous types give a simple way of creating an object with read-only properties without actually having to define the type. The compiler creates the type name, which is not available to the source code. Instead, the new type is referenced using the var keyword. The object is still strongly typed, but the type name is not exposed to the code. The following code shows a simple example of an anonymous type that uses shorthand initialization and the var keyword.

var v= new { FirstName="Gustavo", LastName="Achong" };

The object coding enhancements in C# allow LINQ to perform projections of results into anonymous types. When using projections, LINQ creates a new anonymous type and initializes it with read-only data from the query results. The key to creating the projection is to use the new keyword in the select statement to create an anonymous type with automatic properties. The following code shows a projection creating a new collection of process names.

var q = from p in Process.GetProcesses
        orderby p.WorkingSet64 descending
        select new { Name = p.ProcessName };

Working with LINQ to SharePoint

The LINQ to SharePoint provider is part of the Microsoft.SharePoint.Linq namespace and is used as an additional layer on top of CAML. LINQ queries created with the LINQ to SharePoint provider are translated into CAML queries for execution. While LINQ to SharePoint is not a complete replacement for CAML, it does provide CRUDQ operations for lists. Because of its full support for CRUDQ operations and the inherent advantages of LINQ development over CAML, you will generally use LINQ as your primary interface for working with lists. You will fall back to CAML only when required, such as when overriding throttles or aggregating multiple lists with the SPSiteDataQuery object.

Generating Entities with SPMetal

SharePoint list data is maintained in the content database. This means that the structure of the list and item data is based on relational tables. As a SharePoint developer, however, you do not need to understand the structure of these tables because the object model abstracts the structure into SPList and SPListItem objects. When you write a LINQ to SharePoint query, you should expect the same experience as when using the object model. List and item data should be abstracted so that you do not have to understand the content database schema.

LINQ to SharePoint provides an object layer abstraction on top of the content database through the use of entity classes. Entity classes are lightweight, object-relational interfaces to the list and item data in the content database. In addition, entity classes are used to track changes and provide optimistic concurrency during updates.

Entity classes are created using a command-line utility called SPMetal. SPMetal is located in the SharePoint system directory at C:Program FilesCommon FilesMicrosoft SharedWeb server extensions14Bin. As a best practice, you should update the PATH variable in your environment to include the path to SPMetal. This way, you can simply run the utility immediately after opening a command window.

Generating entity classes with SPMetal can be very simple. At a minimum, you must specify the site for which you want to generate entities and the name of the code file to create. Once the code file is created, you may immediately add it to a project in Visual Studio and start writing LINQ queries. The following code shows an example that will generate entity classes for all the lists and content types in a site.

SPMetal /web:http://intranet.wingtip.com /code:Entities.cs

While generating entity classes can be quite easy, you will likely want more control over which entities are created and how they are structured. SPMetal provides a number of additional arguments that you can use to alter code generation. Table 9-4 lists and describes all of the possible arguments for SPMetal.

Table 9-4. SPMetal Arguments

Argument

Description

/code:<filename>

Specifies the name of the generated file.

/language:<language>

Specifies the language for the generated code. Can be either csharp or vb.

/namespace:<namespace>

Specifies the namespace for the generated code.

/parameters:<file>

Specifies an XML file with detailed code-generation parameters.

/password:<password>

Specifies credentials to use for data access during the code-generation process.

/serialization:<type>

Specifies the serialization type. Can be either none or unidirectional.

/user:<username>

Specifies credentials to use for data access during the code-generation process.

/useremoteapi

Specifies that the generation of entity classes is to be done for a remote SharePoint site, such as SharePoint Online.

/web:<url>

The URL of the SharePoint site for which entities will be generated.

If you examine the code file generated by SPMetal, you will see that there are two kinds of classes are created. First, a single class is created that inherits from Microsoft.SharePoint.Linq.DataContext. The DataContext class provides a connection to lists and change tracking for operations. You can think of the DataContext class as serving a purpose similar to the SqlConnection class in data access code. Second, multiple classes are generated that represent the various content types used by the lists on the site. Using the DataContextclass together with the entity classes allows you to write LINQ queries. Example 9-8 shows a simple LINQ query written to return all training modules contained in the Modules list using a DataContext class named Entities.

Example 9-8. A simple LINQ to SharePoint query

using (Entities dc = new Entities(SPContext.Current.Web.Url))
{
    var q = from m in dc.Modules
            orderby m.Title
            select m;

    foreach (var module in q)
    {
        moduleList.Items.Add(module.Title);
    }
}

Understanding the DataContext Class

Before performing any LINQ operations, you must connect to a site using the DataContext object. The DataContext accepts a URL in the constructor so that you can specify the site where it should connect, which is useful as you move your code from development to production. Of course, the site you specify must actually have the lists and content types for which entities have been generated. Otherwise, your operations will fail. The DataContext class also implements IDisposable so that it can be coded with a using block.

The DataContext class provides a GetList<T> method that provides access to each list for which an entity has been generated. You can use this method in LINQ query syntax to specify easily the list against which the query should be run. Along with the method, the DataContext also has a property of EntityList<T> for each list. Example 9-8 showed an example using dc.Modules in the LINQ query.

The Log property can be used for viewing the underlying CAML created from the LINQ query. This is useful for not only monitoring and debugging, but it also can be used to help create CAML queries for the SPQuery and SPSiteDataQuery objects. The Log property accepts a System.IO.TextWriter object so you can write the log to a file or output it in a Web Part easily.

The DataContext class will track changes made to the entity objects so that they can be written back to the content database. The ObjectTrackingEnabled property determines whether the DataContext will track changes. The property defaults to True, but setting it to False will improve performance for read-only operations. If the DataContext is tracking changes, then the content database may be updated by calling the SubmitChanges method. A detailed discussion appears in the section entitled Adding, Deleting, and Updating with LINQ to SharePoint later in this chapter.

Using Parameters.xml to Control Code Generation

The arguments accepted by SPMetal provide a fair amount of control over the entity-generation process, but in practice, you will likely want even more control. The highest level of control over entity generation is given by passing a parameters.xml file to SPMetal with detailed information about the entities to generate.

The parameters.xml file contains elements that give SPMetal specific details about code generation. In particular, it specifies what lists, content types, and fields should be generated in code. The parameters.xml file is passed to SPMetal through the /parameters argument. The following code shows a sample parameters.xml file.

<?xml version="1.0" encoding="utf-8"?>
<Web Class="Entities" AccessModifier="Public"
  xmlns="http://schemas.microsoft.com/SharePoint/2009/spmetal" >
  <List Name="Instructors" Member="Instructors">
    <ContentType Name="Contact" Class="Instructor">
      <Column Name="FullName" Member="FullName"/>
      <ExcludeOtherColumns/>
    </ContentType>
  </List>
  <List Name="Modules" Member="Modules" />
  <ExcludeOtherLists/>
</Web>

The Web element is the root of the schema. The Class attribute specifies the name of the DataContext class to generate, and the AccessModifier attribute specifies the access level to the class. The List element is a child of the Web element and specifies the name of a list for which entities should be generated. The Member attribute specifies the name of the property in DataContext that will represent this list. The ContentType element is a child of the List element and specifies a content type for which an entity should be generated. The Class attribute specifies the name of the generated class. The Column element is a child of the ContentType element and specifies a column that should be included in the generated entity. The ExcludeOtherColumns, ExcludeOtherContentTypes, and ExcludeOtherLists elements are used to stop looking for items to include in entity generation. In this way, you can specify the exact set of lists, content types, and columns to include in the generated entities. This is very useful for excluding list, content types, and columns that are present in the development environment, but will not be present in the production environment. Table 9-5 shows the complete schema for the parameters.xml file.

Table 9-5. Parameters.xml Schema

Element

Child Elements

Attribute

Description

Web

List

ExcudeList

ExcludeOtherLists

IncludeHiddenLists

ContentType

ExcludeContentType

ExcludeOtherContentTypes

IncludeHiddenContentType

Class (optional)

Name of DataContext class

AccessModifier (optional)

Specifies accessibility of DataContext and entity classes. May be Internal or Public.

List

ContentType

ExcludeContentType

Name

Name of the list in SharePoint

Member (optional)

Name of the DataContext property representing the list

Type (optional)

Type of the DataContext property representing the list

ContentType

Column

ExcludeColumn

ExcludeOtherColumns

IncludeHiddenColumns

Name

Name of the content type

Class (optional)

Name of the generated class

AccessModifier (optional)

Accessibility of the generated class

Column

N/A

Name

Name of the column

Member (optional)

Name of the generated property for the column

Type (optional)

Type of the generated property for the column

ExcludeColumn

N/A

Name

Name of the column to exclude from entity generation

ExcludeOtherColumns

N/A

N/A

Excludes all columns not explicitly included

IncludeHiddenColumns

N/A

N/A

Includes hidden columns in entity generation

ExcludeList

N/A

Name

Name of the list to exclude from entity generation

ExcludeOtherLists

N/A

N/A

Excludes all lists not explicitly included

IncludeHiddenLists

N/A

N/A

Includes hidden lists in entity generation

ExcludeContentType

N/A

Name

Name of content type to exclude from entity generation

ExcludeOtherContentTypes

N/A

N/A

Excludes all content types not explicitly included

IncludeHiddenContentTypes

N/A

N/A

Includes hidden content types in entity generation

Querying with LINQ to SharePoint

Once you have entities generated, then you can begin to write LINQ to SharePoint queries. Writing LINQ to SharePoint queries is very similar to writing LINQ queries for other data sources. You formulate a query using query syntax, receive the results into an anonymous type and then use the IEnumerable interface to iterate over the results. Example 9-4 earlier in the chapter shows a simple example.

LINQ to SharePoint also supports querying across lists that are joined by a Lookup field. Example 9-3, earlier in this chapter, showed how this is possible with CAML, but LINQ to SharePoint makes the syntax much simpler. The following code shows the equivalent LINQ query for the CAML shown in Example 9-3. Note how the join is done simply by using the dot operator to move easily from the Modules list to the joined Instructors list.

var q = from m in dc.Modules
        orderby m.ModuleID
        select new { m.Title, Presenter = m.Instructor.FullName, Email = m.Instructor.Email};

Not only does this code join two lists together, but it is also using a projection. The new keyword is creating a new set of anonymous objects whose field names have been set to Title, Presenter, and Email.

LINQ to SharePoint also allows you to perform query composition. Query composition is the ability to run a LINQ query on the results of a LINQ query. For example, the following code shows how to run a new query specifically looking for a training module named “Visual Studio 2010.”

var q1 = from m1 in dc.Modules
        orderby m1.ModuleID
        select new {
          m1.Title, Presenter = m1.Instructor.FullName,
          Email = m1.Instructor.Email
        };
var q2 = from m2 in q1
        where m2.Title.Equals("Visual Studio 2010")
        select m2;

Finally, LINQ to SharePoint supports a number of extension methods that you can use for aggregation, grouping, and returning specific entities. The methods are often used on the results of the query. The following code, for example, shows how to return the total number of training modules in the query results. The most commonly used extension methods are listed in Table 9-6.

var t = (from m in dc.Modules
         select m).Count;

Table 9-6. Commonly Used Extension Methods

Method

Description

Any

Returns true if there are any items in the query results.

Average

Returns the aggregated average value.

Count

Returns the count of items in the query result.

First

Returns the first item in the results. This is useful if you are expecting a single result.

FirstOrDefault

Returns the first item in the results. If there is no first item, it returns the default for the object type.

Max, Min

Return the item with the maximum or minimum value.

Skip

Skips a certain number of items in the results. This is useful when used with Take for paging.

Sum

Returns the aggregated sum.

Take

Allows you to return only a specified number of results. Useful when used with Skip for paging.

ToList

Returns the query results into a generic List<T>.

Adding, Deleting, and Updating with LINQ to SharePoint

Along with queries, you can also add, delete and update lists with LINQ to SharePoint. Adding and deleting items are accomplished using methods associated with the EntityList<T> property of the DataContext. The InsertOnSubmit method adds a single new item to a list; the InsertAllOnSubmit method adds a collection of new items to a list; the DeleteOnSubmit method deletes a single item from a list; and the DeleteAllOnSubmit method deletes a collection of items from a list. The RecycleOnSubmit method puts a single item into the Recycle Bin, and the RecycleAllOnSubmit method puts a collection of items in the Recycle Bin. The following code shows an example of adding a new item to the Modules list.

using (Entities dc = new Entities(SPContext.Current.Web.Url))
{
    ModulesItem mi = new ModulesItem;
    mi.Title = "LINQ to SharePoint";
    mi.Id = 301;
    dc.Modules.InsertonSubmit(mi);
    dc.SubmitChanges();
}

Updating items in lists is done by simply changing the property values in the item and then calling the SubmitChanges method of the DataContext. The following code shows a simple example of an update operation.

using (Entities dc = new Entities(SPContext.Current.Web.Url))
{
    var q = (from m in dc.Modules
            where m.Id==1
            select m).First;

    q.Title = "Revised Title for Module 1";
    dc.SubmitChanges();
}

When updating items, LINQ to SharePoint uses optimistic concurrency. The provider will check to see whether the items in the list have been changed since your LINQ query was run before it will attempt to update them. If a discrepancy is found for any of the submitted entities, then no changes are committed. All discrepancies must be resolved before any change in the current batch can be committed.

When discrepancies are found during the update process, LINQ to SharePoint throws a Microsoft.SharePoint.Linq.ChangeConflictException. In addition, the ChangeConflicts collection of the DataContext is populated with ObjectChangeConflict objects that contain data about fields in the item that are causing conflicts. The SubmitChanges method supports overloads that allow you to specify whether update attempts should continue after the first conflict or whether update attempts should stop. The ChangeConflicts collection will be populated only with information about failed attempts, so electing to stop after the first failure will not provide complete data on all conflicts. Regardless of whether or not you continue update attempts, remember that no changes will be saved if any conflict occurs. The purpose of continuing update attempts is to populate the ChangeConflicts collection completely.

The ChangeConflicts collection contains a MemberConflicts collection, which has detailed information about the actual values causing the conflict. In particular, the MemberConflicts collection is populated with MemberChangeConflict objects, each of which has OriginalValue, CurrentValue, and DatabaseValue properties. OriginalValue is the value of the column when the LINQ query was run. CurrentValue is the value that SubmitChanges is attempting to write to the content database. DatabaseValue is the current value of the column in the database. Trapping ChangeConflictException and using the MemberChangeConflict objects allows you to display the conflicts to the user. The code in Example 9-9 shows how to iterate the collection, build a list, and bind the list to a grid for display.

Example 9-9. A simple LINQ to SharePoint query

Try
{
    //Update code
}
catch (Microsoft.SharePoint.Linq.ChangeConflictException x)
{
    conflicts = new List<Conflict>;
    foreach (ObjectChangeConflict cc in dc.ChangeConflicts)
    {
        foreach (MemberChangeConflict mc in cc.MemberConflicts)
        {
            Conflict conflict = new Conflict;
            conflict.OriginalValue = mc.OriginalValue.ToString;
            conflict.CurrentValue = mc.CurrentValue.ToString;
            conflict.DatabaseValue = mc.DatabaseValue.ToString;
            conflicts.Add(conflict);
        }

    }
conflictGrid.DataSource = conflicts;
conflictGrid.DataBind;
}

Along with displaying the results, you can also resolve conflicts in code. After displaying the results to users in a grid, you can allow them to select whether the pending changes should be forced or lost. The Resolve method of the MemberChangeConflict class accepts a Microsoft.SharePoint.Linq.RefreshMode enumeration, which can have a value of KeepChanges, KeepCurrentValues, or OverwriteCurrentValues. KeepChanges accepts every pending change but gives the highest priority to the current user. KeepCurrentValues keeps only the changes made by the current user and loses all other changes. OverwriteCurrentValues loses the current changes and sets the values to what is in the database. After calling the Resolve method, you must call the SubmitChanges method again to complete the operation. The following code shows an example of keeping the current changes and losing all other changes.

foreach (ObjectChangeConflict cc in dc.ChangeConflicts)
{
    foreach (MemberChangeConflict mc in cc.MemberConflicts)
    {
        mc.Resolve(RefreshMode.KeepCurrentValues);
    }
}
dc.SubmitChanges;

Working with Document Libraries

So far in this chapter, the focus has been on list items that do not have associated files. However, document libraries represent a significant part of the SharePoint infrastructure, and you will often want to perform operations on libraries, files, and folders as part of your custom solutions. The good news is that a document library is really just a specialized type of list so everything you know about CAML and LINQ applies to document libraries. You only need to supplement your knowledge to deal specifically with the files that these libraries contain.

Using the Basic Library Object Model

Because a document library is a specialized list, you will find that the SPList object can be used to access document libraries as well as lists. This means that document libraries appear in the Lists collection of SPWeb objects just like any other list. The server-side object model supplements the functionality offered by the SPList through the Microsoft.SharePoint.SPDocumentLibrary object, which inherits from SPList and provides the additional functionality necessary to work with document libraries. If you have a reference to an SPList object, you can use the following code to see if the list is actually a document library.

public bool IsListAlsoDocumentLibrary(SPList list) {
  if (list is SPDocumentLibrary)
    return true;
  else
    return false;
}

Once you have a reference to a document library, you may access it through either the SPList object or the SPDocumentLibrary object. In many cases, you will start by working with the SPList object to access the individual SPListItem objects in the library and their associated metadata. You can subsequently use the SPListItem object to access the document associated with the item through the File property. The File property returns a Microsoft.SharePoint.SPFile object. The SPFile object may be used to manage the document and read and write content. The following code shows how to access the SPFile object from the SPListItem object and manipulate the file using the object model.

foreach (SPListItem item in docLib.Items)
{
  if (item.FileSystemObjectType == SPFileSystemObjectType.File) {
    SPFile file = item.File;
    //Check on number of versions
    int versionCount = file.Versions.Count;
    //Determine when document was checked out
    DateTime checkedOutDate = file.CheckedOutDate;
    //Open document for stream-based access
    using(Stream fileContents = file.OpenBinaryStream) {
      // program against stream to access document content
    }
  }
}

Along with documents, many libraries will also contain folders. Folders in a document library are stored as SPListItem objects, but they do not return files from the File property. This structure can be confusing because iterating over the items in a collection will simply return all items and folders in a flat table. If you instead want to access the documents in a particular folder, then you must use the Microsoft.SharePoint.SPFolder object.

The SPDocumentLibrary object has a RootFolder property, which in turn has a Files collection. The RootFolder property returns a SPFolder object, and the Files collection returns all the files that are in the root of the library. A SubFolders property on the SPFolder object allows you to work through the complete hierarchy of folders in the library. The code in Example 9-10 shows how to build a tree view of a document library recursively based on the underlying folder structure.

Example 9-10. Building a hierarchy of documents from a library

protected override void OnLoad(EventArgs e) {
  SPWeb site = SPContext.Current.Web;
  foreach (SPList list in site.Lists) {
    if (list is SPDocumentLibrary && !list.Hidden) {
      SPDocumentLibrary docLib = (SPDocumentLibrary)list;
      SPFolder folder = docLib.RootFolder;
      TreeNode docLibNode = new TreeNode(docLib.Title,
                                               docLib.DefaultViewUrl,
                                               @"\_layoutsimagesITDLSM.GIF");
      LoadFolderNodes(folder, docLibNode);
      treeSitesFiles.Nodes.Add(docLibNode);
    }
  }
  treeSitesFiles.ExpandDepth = 1;
}

protected void LoadFolderNodes(SPFolder folder, TreeNode folderNode) {
  foreach (SPFolder childFolder in folder.SubFolders) {
    if (childFolder.Name != "Forms") {
      TreeNode childFolderNode = new TreeNode(childFolder.Name,
                                                    childFolder.Name,
                                                    @"\_layoutsimagesFOLDER.GIF");

      LoadFolderNodes(childFolder, childFolderNode);
      folderNode.ChildNodes.Add(childFolderNode);
    }
  }
  foreach (SPFile file in folder.Files) {
    TreeNode fileNode;
    fileNode = new TreeNode(file.Name, file.Name, @"\_layoutsimagesICGEN.GIF");
    folderNode.ChildNodes.Add(fileNode);
  }
}

Along with managing metadata and viewing files, you will also want to add new files to document libraries. When adding new files to document libraries, it is important to recognize that the library is really just an abstraction on top of the SharePoint website. What this means is that when you add files to a document library, you are really just adding documents to a specific location on the SharePoint site. Therefore, the code for adding documents is found in the SPWeb object, not the SPDocumentLibrary object.

The SPWeb object has a Files property, which returns the SPFileCollection of files in the root of the site. The SPFileCollection object provides Add and Delete methods, which can be used to manage files on the site. The Add method has several overloads, but generally you will need the URL for the library and a Stream object for the file. Example 9-11 shows how to upload a simple text file to the root folder of the Shared Documents document library.

Example 9-11. Uploading a file

//Create stream and add document content
Stream documentStream = new MemoryStream;
StreamWriter writer = new StreamWriter(documentStream);
writer.Write("Some content for my simple text document");
writer.Flush;

//Add document into document library
string fileName = "MyFile.txt";
string url = "http://intranet.wingtip.com/Shared%20Documents/";
site.Files.Add(url + filename, documentStream, true);

A separate overload of the Add method can be used to add new metadata for a document when it is uploaded. When you use this method, you must create a Hashtable containing the metadata values assigned to properties of the library. Example 9-12 shows a complete file upload operation that sets metadata values.

Example 9-12. Uploading a file with metadata

//Create stream and add document content
Stream documentStream = new MemoryStream;
StreamWriter writer = new StreamWriter(documentStream);
writer.Write("Some content for my simple text document");
writer.Flush;

string fileName = "MyFile.txt";
string url = "http://intranet.wingtip.com/";

//Set metadata
Hashtable docProperties = new Hashtable;
docProperties["vti_title"] = "My Text File";

//Add document into document library
site.Files.Add(url + filename, documentStream, docProperties, true);

Note that in Example 9-12, the names of the document properties do not correspond to the display names of the metadata for the library. The names of the file properties must be the actual names maintained by the document itself. You can list all the metadata properties for a document using the following code.

SPList list = site.Lists["Shared Documents"];
foreach (SPListItem item in list.Items) {
  SPFile file = item.File;
  foreach (DictionaryEntry entry in file.Properties) {
    Console.WriteLine(entry.Key + ": " + entry.Value);
  }
  break;
}

Working with Open XML

When you are working with documents in SharePoint 2010, you will quite often be dealing with Microsoft Office documents. Therefore, it is important to understand how to create, modify, and manage these document types in your custom solutions. With Office 2007 and later, the file formats for all Office documents are based on an XML standard designed to be more extensible and interoperable than previous formats.

The idea behind the XML document specification is to create an open standard for document definitions so that they may be shared between tools and transformed between systems easily. For SharePoint developers, the Open XML formats are a real advantage because they allow the creation and manipulation of documents through standard XML programming techniques. Prior to Office 2007, the document formats were based on a proprietary binary format that was not documented. From a development perspective, the only way to manipulate a document was to use the object model through automation. This meant starting Microsoft Word, Microsoft Excel, or Microsoft PowerPoint programmatically and then executing operations against it to create documents or slideshows, import data, or print reports.

The problem with automating Word, Excel, and PowerPoint is that they were never intended to function as server products, but they often were deployed on servers to centralize the automation. Because the products were not server-ready, instances of Word, Excel, and PowerPoint often did not shut down properly. In many cases, these solutions required periodic rebooting of the server just to clear the stalled instances of the software. Fortunately, this kind of deployment is no longer necessary.

Understanding Document Packages

Before writing custom solutions against the open XML formats in Office 2010, you should understand the basic structure of an Office document. For example, consider the structure of a simple Word 2010 document. Office Open XML file formats are based on standard .zip file technology. Each top-level file is saved as a .zip archive, which means you can rename the file with a .zip extension and examine the contents.

The Word, Excel, or PowerPoint file is known as a package in Open XML parlance. Because a package is implemented as a standard .zip archive, it automatically provides compression and makes its contents instantly accessible to many existing utilities and application programming interfaces (APIs) on Windows platforms and non-Windows platforms alike.

Inside a package are two kinds of internal components: parts and items. In general, parts contain content and items contain metadata describing the parts. Items can be subdivided further into relationship items and content-type items.

A part is an internal component containing content that is persisted inside the package. The majority of parts are simple text files serialized as XML with an associated XML schema. However, parts can also be serialized as binary data when necessary, such as when a Word document contains a graphic image or media file.

A part is named by using a Uniform Resource Identifier (URI) that contains its relative path within the package file combined with the part file name. For example, the main part within the package for a Word document is named /word/document.xml.

The Open XML file formats use relationships to define associations between a source and a target part. A package relationship defines an association between the top-level package and a part. A part relationship defines an association between a parent part and a child part. Relationships are important because they make these associations discoverable without examining the content within the parts in question. Relationships are independent of content-specific schemas and are, therefore, faster to resolve. An additional benefit is that you can establish a relationship between two parts without modifying either of them.

Relationships are defined in internal components known as relationship items. A relationship item is stored inside the package just like a part, although a relationship item is not actually considered a part. For consistency, relationship items are always created inside folders named _rels.

For example, a package contains exactly one package relationship item named /_rels/.rels. The package relationship item contains XML elements to define package relationships, such as the one between the top-level package for a .docx file and the internal part /word/document.xml, as shown in the following code.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="../package/2006/relationships ">
  <Relationship Id="rId1"
                Type="../officeDocument/2006/relationships/officeDocument"
                Target="word/document.xml"/>
</Relationships>

A Relationship element defines a name, type, and target part. Furthermore, the type name for a relationship is defined by using the same conventions used to create XML namespaces. In addition to a single package relationship item, a package can also contain one or more part relationship items. For example, you can define relationships between /word/document.xml and child parts inside a package relationship item located at the URI /word/_rels/document.xml.rels. Note that the Target attribute for a relationship in a part relationship item is a URI relative to the parent part, not the top-level package.

Every part inside a package is defined in terms of a specific content type. Don’t confuse these content types with a Content Type in SharePoint because they are completely different. A content type within a package is metadata that defines a part’s media type, a subtype, and a set of optional parameters. Any content type used within a package must be explicitly defined inside a component known as a content type item. Each package has exactly one content type item, which is named /[Content_Types].xml. The following is an example of content type definitions inside the /[Content_Types].xml item of a typical Word document.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default
     Extension="rels"
     ContentType="application/vnd.openxmlformats-
                  package.relationships+xml"/>
  <Default
     Extension="xml"
     ContentType="application/xml"/>
  <Override
   PartName="/word/document.xml"
   ContentType="application/vnd.openxmlformats-
                officedocument.wordprocessingml.document.main+xml "/>
</Types>

Content types are used by the consumer of a package to interpret how to read and render the content within its parts. A default content type is typically associated with a file extension, such as .rels or .xml. Override content types are used to define a specific part in terms of a content type that differs from the default content type associated with its file extension. For example, /word/document.xml is associated with an Override content type that differs from the default content type used for files with an .xml extension.

Using the Open XML SDK 2.0

While understanding the package structure is important, creating solutions that read and write XML directly to the Open XML file formats is tedious and prone to error. When Open XML first emerged, this was exactly how solutions were created. Fortunately, a new set of libraries that provides strongly typed objects for manipulating documents is now available for use with Open XML. To use these documents, you must download and install the Open XML software development kit (SDK), version 2.0.

Once you have installed the Open XML 2.0 SDK, you will be able to set a reference to the assembly DocumentFormat.OpenXML.dll file. This assembly contains more than 50 different namespaces that represent a tremendous amount of functionality crossing all types of Office documents. A complete presentation of Open XML is well beyond the scope of this chapter; the goal here is just to present the basics so that you can see how it fits into your SharePoint solutions.

As a simple example, Example 9-13 shows how to create a simple Word document using the DocumentFormat.OpenXML namespace. Within this listing, there are several things to note. First, along with setting a reference to DocumentFormat.OpenXML.dll, you must also set a reference to WindowsBase.dll, which is part of Windows Presentation Foundation (WPF) and contains supporting functionalities. Second, notice how the strongly typed objects provided by the Open XML SDK allow you to easily create a document and know at compile time that it is valid. Third, note how the document content is created and then packaged into a compound document at the end, which results in the actual file.

Example 9-13. Creating a document with the Open XML SDK

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace SimpleDocumentCreation
{
    class Program
    {
        static void Main(string[] args)
        {
            using (WordprocessingDocument package =
                WordprocessingDocument.Create(
                "C:\HelloOpenXML.docx",
                WordprocessingDocumentType.Document))
            {
                //Create content
                Body body = new Body(
                    new Paragraph(
                        new Run(
                            new Text("Hello, Open XML SDK!"))));

                //Create package
                package.AddMainDocumentPart;
                package.MainDocumentPart.Document = new Document(body);
                package.MainDocumentPart.Document.Save;
                package.Close;
            }
        }
    }
}

While strongly typed objects make document creation easy, they also make reading documents easy because they support LINQ. LINQ support means that you can write queries against the contents of a document and operate on the results for reading and updating. As an example, the code in Example 9-14 shows how to open a Word document, find all the text items, and print them through a console application.

Example 9-14. Reading a document with LINQ

using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace SimpleDocumentRead
{
    class Program
    {
        static void Main(string[] args)
        {
            using (WordprocessingDocument package =
                WordprocessingDocument.Open(args[0], false))
            {
                var q = from t in package.MainDocumentPart.Document.Descendants<Text>
                        select t;
                foreach (var i in q)
                {
                    Console.WriteLine(i.Text);
                }
            }
        }
    }
}

Conclusion

SharePoint supports several technologies for performing CRUDQ operations against lists in server-side code. For simple, low-volume operations, the object model support for lists and items is appropriate. For more complete operations, LINQ is an excellent choice and will likely be the preferred method to use. In some advanced scenarios, however, CAML is still required to achieve the desired results.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset