Tuesday 5 January 2010

Strip and manipulate a URL by breaking it into segments (.NET 2.0)

I recently needed a method that takes a string containing a URL (href) and another string containing a root section of that URL (root), and returns a string containing the remainder of the URL (i.e. the section of href that is left once the root has been removed).

To make matters more complicated, the href parameter could have some unusual features. Because the URL pointed to content created by users in a Content Management System (CMS), some segments of the URL contained leading or trailing whitespace (segments being the bits of the URL between the slashes). This whitespace is fine in the CMS, but my method must strip it to return a canonical URL.

Fortunately, .NET 2.0 onwards provides us with the Uri class. This has lots of fabulous methods and properties, but in this example I shall use it to:
  1. turn the "root" and "href" parameter strings into canonical URIs,
  2. break down the "href" parameter into segments,
  3. ignore the segments that exist in the "root" parameter,
  4. strip leading and trailing whitespace from the remaining segments,
  5. return the canonicalised remainder of the URL.
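
The steps above depend on how Uri.Segments splits a path, so here is a minimal sketch of that behaviour. The class name SegmentsDemo, the helper GetSegments and the example.com URL are made up for illustration. It shows two details that matter when cleaning segments: each folder segment keeps its trailing slash, and the segments are returned in escaped form, so a space shows up as %20.

```csharp
using System;

static class SegmentsDemo
{
    // Hypothetical helper: returns the path segments of an absolute URL
    public static string[] GetSegments(string url)
    {
        return new Uri(url).Segments;
    }

    static void Main()
    {
        // example.com and the path are made-up values
        string url = "http://example.com/docs/my%20folder/file.txt";
        foreach (string segment in GetSegments(url))
        {
            Console.WriteLine("[" + segment + "]");
        }
        // Prints:
        // [/]
        // [docs/]
        // [my%20folder/]
        // [file.txt]
    }
}
```

Because of the escaping, calling Trim() directly on a segment will not remove a leading or trailing %20; the segment has to be unescaped first.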


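string GetRootStrippedURI(string root, string href)
{
    Uri fileUri = new Uri(Uri.UnescapeDataString(href));
    Uri rootUri = new Uri(Uri.UnescapeDataString(root));

    // Build the return string from the segments of href that follow the root
    string strippedExtension = "";

    // Loop through the segments not in the root and clean them up.
    // Note: the "- 1" bound skips the final segment of href (typically the
    // file name); use fileUri.Segments.Length instead to include it.
    for (int i = rootUri.Segments.Length; i < fileUri.Segments.Length - 1; i++)
    {
        // Segments come back escaped (a space is "%20") and keep their
        // trailing slash, so unescape and drop the slash before trimming
        string segment = Uri.UnescapeDataString(fileUri.Segments[i]);
        bool hasSlash = segment.EndsWith("/");
        string cleaned = segment.TrimEnd('/').Trim();

        // Re-escape the cleaned segment and put the slash back
        strippedExtension += Uri.EscapeDataString(cleaned) + (hasSlash ? "/" : "");
    }
    return strippedExtension;
}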
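
For comparison, when no whitespace clean-up is needed, the built-in Uri.MakeRelativeUri method can strip a root from a URL on its own. This is a different technique from the segment loop above, not a replacement for it, since it does not trim anything. The class name RelativeDemo, the helper GetRelative and the example.com URLs are hypothetical.

```csharp
using System;

static class RelativeDemo
{
    // Hypothetical helper: the part of href that follows root,
    // with no whitespace clean-up applied
    public static string GetRelative(string root, string href)
    {
        Uri rootUri = new Uri(root);
        Uri fileUri = new Uri(href);
        return rootUri.MakeRelativeUri(fileUri).ToString();
    }

    static void Main()
    {
        // The root ends with a slash so that it is treated as a folder
        Console.WriteLine(GetRelative(
            "http://example.com/cms/",
            "http://example.com/cms/docs/file.txt"));
        // Prints: docs/file.txt
    }
}
```

It can serve as a quick cross-check of the root-stripping step on URLs that are already clean.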
