PowerShell Script to Remove SharePoint 2010 or 2013 List Duplicates

<Updated 2013-05-11>

   Recently Derek, a fellow Premier Field Engineer (PFE), had a customer ask how to remove duplicates from a SharePoint 2010 list.  I have written PowerShell scripts that remove list items, but never one that checks for duplicates.  My main concern was performance, as looping through a list searching for duplicates could be an expensive operation.

 

Problem

   As it turns out the customer had a list with more than 85,000 items and many of the list items were duplicates.  The original solution the customer attempted took 5 hours to run and threw an out-of-memory exception most times it was run.  My assumptions about the resource-intensive list queries and looping were correct.

 

Solution

   I first wrote my own version of a looping structure to iterate over the items and remove duplicates while trying to be as efficient with memory and CPU usage as possible.  After about 10 minutes of scripting I wasn’t making good progress so I decided to switch up my approach.

   A quick Bing search pulled up the following article on checking for duplicate items in a SharePoint list.  The key piece of info from that article was not to loop through the items one at a time, but instead to convert the list into a DataTable using the method SPListItemCollection.GetDataTable() and then group the items on the column to compare for duplicates.  In this case the Title column was used for grouping.  With a little trickery I was then able to find the IDs of the duplicate items, which are then used to delete the individual duplicates.
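   To illustrate the grouping idea without needing a SharePoint farm, here is a minimal sketch that runs against an in-memory DataTable.  The table and its six rows are made-up stand-ins for what GetDataTable() would return; only the ID and Title column names match the real list.

```powershell
# Build a small in-memory table that stands in for the result of GetDataTable()
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('ID', [int])
[void]$table.Columns.Add('Title', [string])
[void]$table.Rows.Add(1, 'A')
[void]$table.Rows.Add(2, 'B')
[void]$table.Rows.Add(3, 'A')
[void]$table.Rows.Add(4, 'C')
[void]$table.Rows.Add(5, 'B')
[void]$table.Rows.Add(6, 'A')

# Group rows by Title, keep only groups with more than one row,
# skip the first row of each group (the one to keep) and collect the remaining IDs
$duplicateIds = $table.Rows |
    Group-Object -Property Title |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object -Skip 1 } |
    ForEach-Object { $_.ID }

# IDs of the rows that would be deleted
$duplicateIds | Sort-Object
```

   The first row in each group survives, so every Title ends up with exactly one remaining item.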

   Two added bonuses are included in the below script.  The first is that the script writes out the progress of the deletion process using Write-Progress.  The second is that the commented portion at the bottom will generate list items with random single letter titles to test out the duplicate deletions.

<Update 2013-05-11>

   Thank you to reader Santosh for pointing out an error in the script I published.  I had mistakenly published a version that included an invalid reference.  On line 10 in the foreach block (%) the call to $list.DeleteByItemID() will fail because that method doesn’t exist on that object.  I’ve updated the downloadable script and below sample to the original I used for $list.GetItemById($_.ID).Delete().  Santosh also got it to work by calling $list.Items.DeleteItemById().  I prefer not to call the Items member on an SPList object because of the performance impact which you can read about in my previous post.

</Update 2013-05-11>

Add-PSSnapin Microsoft.SharePoint.PowerShell
$web = Get-SPWeb -Identity "<URL of Site>"
$list = $web.Lists["DuplicatesList"]

$AllDuplicates = @($list.Items.GetDataTable() | Group-Object Title | Where-Object {$_.Count -gt 1})   # groups of items sharing a Title
$count = 1
$max = $AllDuplicates.Count
foreach($duplicate in $AllDuplicates)
{
    $duplicate.Group | Select-Object -Skip 1 | % {$list.GetItemById($_.ID).Delete()}   # keep the first item, delete the rest
    Write-Progress -PercentComplete ($count / $max * 100) -Activity "$count of $max duplicate groups processed" -Status "In Progress"
    $count++
}
$web.Dispose()

   Be sure to modify the site URL and list name to work in your environment.  In my example the list is named DuplicatesList and must already exist.

 

Conclusion

   With my current solution I was able to remove 75,000 duplicates in a list with a single column (Title) in roughly 45 minutes.  Derek made some tweaks and was able to remove 45,000 duplicates in about 3.5 hours on a list with dozens of columns.  It appears that a list with additional columns takes longer to process.  If I had additional time I would investigate the SPList.GetDataTable() method, which allows passing in a query to retrieve only specific columns (i.e. the column we are using to determine duplicates).  I believe this might speed up the process but don’t know for sure.
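   For anyone who wants to try that idea, here is an untested sketch of what the query might look like.  It assumes the snap-in is loaded and that $list is the same SPList object used in the script above.  It restricts the returned columns through the SPQuery.ViewFields and ViewFieldsOnly properties and uses SPList.GetItems() followed by GetDataTable(); whether this is actually faster has not been measured.

```powershell
# Assumption: $list is the SPList retrieved earlier ($web.Lists["DuplicatesList"])
$query = New-Object Microsoft.SharePoint.SPQuery

# Ask only for the columns needed to find and delete duplicates
$query.ViewFields = "<FieldRef Name='ID' /><FieldRef Name='Title' />"
$query.ViewFieldsOnly = $true

# Same grouping as before, but against the narrower table
$table = $list.GetItems($query).GetDataTable()
$AllDuplicates = @($table.Rows | Group-Object Title | Where-Object { $_.Count -gt 1 })
```

   The rest of the deletion loop would stay the same, since it only relies on the ID and Title values.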

   As it stands I was very happy with the solution I came up with even if it did take a long time to run.  The important part was that it didn’t throw out-of-memory exceptions while running and did have increased performance vs. the customer’s original solution.  Thanks to Derek for collaborating on this item with me.  It really is great to work with other PFEs who are passionate about helping customers solve real problems in their environments.  I hope this script and / or post helps someone else in their environment.

 

      -Frog Out

 

Links

Check for duplicate items in Sharepoint lists with Powershell

http://jespermchristensen.wordpress.com/2008/09/22/check-for-duplicate-items-in-sharepoint-lists-with-powershell/

 

SPListItemCollection.GetDataTable() method

http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.splistitemcollection.getdatatable.aspx

 

SPList.GetDataTable() method

http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.splist.getdatatable(v=office.14).aspx

PowerShell Script to Create a Large SharePoint List with Random Data

<Updated 2013-02-06>

   Recently a fellow SQL PFE Lisa Gardner asked me for a PowerShell script that would create a SharePoint list with numerous (100+) columns.  She was doing some research for an upcoming internal presentation on SharePoint databases for SQL database administrators and wanted samples of really nasty SQL queries that are generated when querying a large SharePoint list.  The script that I came up with is below in the Solution section.

<Update 2013-02-06>

Lisa posted her findings on the SQL query generated from the script I provided.  You can read about it on her blog post here.

</Update>

   Before I get to my script though I did want to call out sample scripts that two other fellow SharePoint PFEs have created that are helpful for working with SharePoint lists.  Kirk Evans wrote a post on creating a SharePoint list with multiple folders and subfolders.  Josh Gavant has written a custom module with commandlets that return SPList objects along with column metadata.  Feel free to take a look at these for additional insight in working with SharePoint lists and items.

 

Problem

   Create a SharePoint 2010 (or 2013) custom list that has a configurable number of columns with various datatypes.  Also populate these columns with random sample data.

 

Solution

   The only assumption for this script is that you already have a SharePoint site to work with and have modified the variables at the header to suit your needs.  If you do not have a site created there is a New-SPWeb commandlet call that you can uncomment early in the script to create a site if needed.  The list, columns, and sample data will be generated by the script.  Be sure to fill in the URL of the “big list” site to be used.

 

Note: When working with SharePoint lists it is important to be aware of SharePoint column limits and what is known as SQL row wrapping.  See the following TechNet article for more information about these: http://technet.microsoft.com/en-us/library/cc262787(v=office.14).aspx#Column
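   As a small sanity check before running the script, the requested column counts can be compared against the per-type maximums.  The sketch below is only an illustration: the limits are copied from the comments in the script that follows, the function name Test-ColumnCounts is made up for this example, and it does not account for row wrapping across column types.

```powershell
# Per-type column maximums, copied from the comments in the script
$columnLimits = @{
    Int = 96; Bool = 96; Choice = 276; SingleLineText = 276
    MultiLineText = 192; DateTime = 48; Currency = 72; Lookup = 96
}

# Requested counts (the same defaults the script uses)
$requested = @{
    Int = 30; Bool = 30; Choice = 0; SingleLineText = 70
    MultiLineText = 42; DateTime = 10; Currency = 40; Lookup = 20
}

# Hypothetical helper: returns one message per column type that exceeds its limit
function Test-ColumnCounts {
    param([hashtable]$Requested, [hashtable]$Limits)
    foreach ($type in $Requested.Keys) {
        if ($Requested[$type] -gt $Limits[$type]) {
            "$type : requested $($Requested[$type]) exceeds the limit of $($Limits[$type])"
        }
    }
}

$problems = @(Test-ColumnCounts -Requested $requested -Limits $columnLimits)
if ($problems.Count -eq 0) {
    'All column counts are within the per-type limits.'
} else {
    $problems
}
```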

 

   Here is the script that I came up with.  You can also download the script from my SkyDrive folder.

########################## 
# Date: Feb 5, 2013 
# Author: Brian T. Jackett 
########################## 

# set number of columns to create for various data types 
# column limits courtesy of http://technet.microsoft.com/en-us/library/cc262787(v=office.14).aspx#Column 
$IntColumnsToCreate = 30               # max of 96 
$BoolColumnsToCreate = 30              # max of 96 
$ChoiceColumnsToCreate = 0             # max of 276 
$SingleLineTextColumnsToCreate = 70    # max of 276 
$MultiLineTextColumnsToCreate = 42     # max of 192 
$DateTimeColumnsToCreate = 10          # max of 48 
$CurrencyColumnsToCreate = 40          # max of 72 
$LookupColumnsToCreate = 20            # max of 96 
# set number of items to create 
$ItemsToCreate = 10 
# URL of site that will contain big list 
$BigListSiteURL = "<set a value for this>"
# load the SharePoint snap-in if it is not already loaded
if((Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue) -eq $null)
{
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}
Start-SPAssignment -Global
# uncomment the New-SPWeb command to create a new site if necessary
#New-SPWeb -Url http://sps2013app/sites/demo/BigList -Template "STS#0"
# get reference to the "big list" site
$web = Get-SPWeb -Identity $BigListSiteURL
# if creating lookup columns and the lookup list already exists then delete it so it can be recreated
if($LookupColumnsToCreate -gt 0 -and $web.Lists["MyLookupList"] -ne $null)
{
    $web.Lists["MyLookupList"].Delete()
}
# if big list exists delete it
if($web.Lists["MyBigList"] -ne $null)
{
    $web.Lists["MyBigList"].Delete()
}
# if creating lookup columns create list and get reference to list
if($LookupColumnsToCreate -gt 0)
{
    $web.Lists.Add("MyLookupList", "My Lookup List", [microsoft.sharepoint.splisttemplatetype]::GenericList)
    $lookuplist = $web.Lists["MyLookupList"]
    $lookupItem = $lookuplist.Items.Add()
    $lookupItem["Title"] = "The one lookup item to rule them all"
    $lookupItem.Update()
}
# create list and get reference to list
$web.Lists.Add("MyBigList", "My Big List", [microsoft.sharepoint.splisttemplatetype]::GenericList)
$list = $web.Lists["MyBigList"]
# add integer columns
for($count = 1; $count -le $IntColumnsToCreate; $count++)
{$list.Fields.Add("Int$count", [microsoft.sharepoint.SPFieldType]::Integer, $false)}
# add boolean columns
for($count = 1; $count -le $BoolColumnsToCreate; $count++)
{$list.Fields.Add("Bool$count", [microsoft.sharepoint.SPFieldType]::Boolean, $false)}
# add choice columns
for($count = 1; $count -le $ChoiceColumnsToCreate; $count++)
{$list.Fields.Add("Choice$count", [microsoft.sharepoint.SPFieldType]::Choice, $false)}
# add single line text columns
for($count = 1; $count -le $SingleLineTextColumnsToCreate; $count++)
{$list.Fields.Add("SingleLineText$count", [microsoft.sharepoint.SPFieldType]::Text, $false)}
# add multi line text columns
for($count = 1; $count -le $MultiLineTextColumnsToCreate; $count++)
{$list.Fields.Add("MultiLineText$count", [microsoft.sharepoint.SPFieldType]::Note, $false)}
# add date time columns
for($count = 1; $count -le $DateTimeColumnsToCreate; $count++)
{$list.Fields.Add("DateTime$count", [microsoft.sharepoint.SPFieldType]::DateTime, $false)}
# add currency columns
for($count = 1; $count -le $CurrencyColumnsToCreate; $count++)
{$list.Fields.Add("Currency$count", [microsoft.sharepoint.SPFieldType]::Currency, $false)}
# add lookup columns
for($count = 1; $count -le $LookupColumnsToCreate; $count++)
{$list.Fields.AddLookup("Lookup$count", $lookuplist.ID, $false)}
# populate list with items
foreach($x in 1..$ItemsToCreate)
{
    $item = $list.AddItem()
    # random lowercase letter a-z (-Maximum is exclusive, so 26 yields 0..25)
    $item["Title"] = [char](97 + (Get-Random -Maximum 26))
    # assign values to integer columns
    for($count = 1; $count -le $IntColumnsToCreate; $count++)
    {$item["Int$count"] = Get-Random -Minimum -100000 -Maximum 100000}
    # assign values to boolean columns
    for($count = 1; $count -le $BoolColumnsToCreate; $count++)
    {$item["Bool$count"] = [bool](Get-Random -Minimum 0 -Maximum 2)}
    # assign values to choice columns
    # not implemented
    # assign values to single line text columns
    for($count = 1; $count -le $SingleLineTextColumnsToCreate; $count++)
    {$item["SingleLineText$count"] = "lorem ipsum " * (Get-Random -Minimum 1 -Maximum 8)}
    # assign values to multi line text columns
    for($count = 1; $count -le $MultiLineTextColumnsToCreate; $count++)
    {$item["MultiLineText$count"] = "lorem ipsum " * (Get-Random -Minimum 1 -Maximum 200)}
    # assign values to date time columns
    for($count = 1; $count -le $DateTimeColumnsToCreate; $count++)
    {$item["DateTime$count"] = (Get-Date).AddDays((Get-Random -Minimum -1000 -Maximum 1000))}
    # assign values to currency columns
    for($count = 1; $count -le $CurrencyColumnsToCreate; $count++)
    {$item["Currency$count"] = [system.decimal](Get-Random -Minimum -10000 -Maximum 10000)}
    # assign values to lookup columns
    # not implemented

    $item.Update()
}
Stop-SPAssignment -Global

   Here is a screenshot of the list that this script creates.

CreateGiantSPList1

 

Conclusion

   Lisa was very happy with the results of this script and I learned a bit about generating SharePoint columns and random data.  This script is not very polished but it gets the job done.  If you have a need to generate a lot of SharePoint list columns or random data hopefully this script will be helpful.  If you have any feedback on it feel free to leave a comment or email me.

 

      -Frog Out

Goals for 2013

   Another new year, another set of goals.  Over the past few years I have set goals (2010, 2011, 2012) and done retrospectives at the end of the year (2010, 2011, 2012).  This year already has a number of big changes in the pipeline but I’m trying to introduce a few other changes with the goals I am setting.

 

Professional

  • Work local more – If you’ve been reading my blog the past year and a half you’ve probably heard me write that my new role as Premier Field Engineer with Microsoft has me traveling quite a bit (as evidenced by me hitting platinum status on Delta and Marriott last year).  There are some changes at work which have me traveling less but I’m still away from home at least a few weeks a month.  I am working towards doing more local and remote (from home) work.  With my upcoming wedding I would like to be home more as spending time with friends and family is an important part of my life.
  • Mentor / Mentee – Over the past 5-10 years I have had a few official and unofficial mentors whether through work, social groups, or otherwise.  I currently have two mentors through work.  I would like to build upon those relationships and also look for opportunities of becoming a mentor (not necessarily through work) for someone else.  The latter may not happen this year but I would like to start laying the foundation for that.
  • Blogging – I plan to continue blogging as I have been for the past almost 4 years now.  I intend to blog at least 20 posts throughout the year.

Personal

  • Get married – This Oct my fiancée Sarah and I will be tying the knot.  I’m very happy and excited to be getting married as it opens a completely new chapter of our lives together.
  • Stay Fit – Yes I included this last year and if you read my last retrospective I did lose 10+ pounds last year.  With the upcoming wedding in Oct Sarah and I both have goals to lose a little more for this year.  For me I am aiming to lose another 13 pounds by Oct and 15 total by the end of year.  I’m planning to join a gym and have already bought a FitBit to track steps / sleep patterns / weight.  I’m also continuing to eat healthier (most times) and practice portion control.
  • Read Books – Purchasing a Kindle over a year ago got me reading more frequently.  Now with my Surface RT I’m able to read easily on a device I travel with all the time.  Win-win situation.  This year I plan to read at least 5 books.  I was very happy to find my local library lends e-books and I’ve already started with Daemon by Daniel Suarez which is hard to put down.  I’m also planning to track my books and post those as recommendations later, similar to what others have done (Todd Klindt’s book recommendations).

 

Conclusion

   I like the idea of managing things in groups of threes (which I picked up from 30 Days of Results) hence 3 personal and 3 work goals for the year.  There are a few smaller goals I have that are targeted for shorter term timeframes as well.  This year is looking to be another good year and I’m eager to tackle the challenges and opportunities that come.

   On a side note, when I first started posting my goals in 2010 I called out a close friend Sean McDonough to also post his goals online.  As it turns out this year Sean did finally post his goals and join the bandwagon.  Thanks for carrying on the torch Sean.  Since Sean posted his I’m going to call out a few of my coworkers in hopes they will also post their goals for the year: Josh Gavant and Ashley McGlone.

 

      -Frog Out

Goals for 2012 Retrospective

   Now in my third year of an annual tradition I have set goals (2010, 2011, 2012) and gone through retrospectives (2010, 2011).  So here goes a recap on the year that was 2012 and how I stacked up against my goals.

Year in Review

   Similar to 2011, 2012 was a big year for me.  In 2012 I proposed to my now fiancée Sarah (pictures), I completed my first year with Microsoft as a Premier Field Engineer, my middle brother got married, my oldest brother and his wife had a baby, Sarah’s sister and husband found out they were pregnant, my mom had a total hip surgery, I wrote two chapters for a new book (more details about this once it is closer to publishing), and I helped out with various conferences and projects.  As you may have noticed a number of these items are family related which has been a bigger focus for me this past year.

Professional Goals

  • Blog – I set a goal to blog twice a month but did not meet that goal.  I only had 18 blog posts throughout 2012 but they were fairly well spaced out through the months.  It has been more difficult to keep up with blogging given my travel schedule with work, increased family commitment, book writing I was involved with, and other factors.  I still plan to continue blogging as long as I have content / ideas to write about.
  • Speaking – I had a goal of speaking at 3 user groups and / or conferences in 2012.  I spoke at 4 events in 2012 (PowerShell Saturday Columbus, SharePoint Cincy, SharePoint Saturday Twin Cities, and SPTechCon Boston).  As intended I reduced the number of events I would speak at compared to 2011.  Due to frequently traveling for work it is difficult to spend a weekend or part of a week away when my time at home with friends and family is already limited.  Going forward my speaking commitments will greatly depend on family commitments and travel schedules.
  • Open Source – I intended to work on two separate open source projects.  I did not complete any additional work on SavePSToSP for a variety of reasons but mostly because my time was better spent on other projects and goals.  I have worked on the SharePoint diagnostics project with my coworker Eric Harlan but have not gotten to publishing the project or blogging on it.  This may see the light of day in 2013 but we shall see.
  • Volunteering – I assisted with planning Stir Trek again in 2012 as well as a new conference, PowerShell Saturday, the first of which we held in Columbus.  I’ve also been involved in the local Buckeye SharePoint User Group (BuckeyeSPUG).

Personal Goals

  • Stay Fit – My goal for 2012 was to lose 15 pounds.  I ended up losing a total of 16 pounds by November but with the holiday food fests ended the year 11 pounds lighter than I started the year.  With our upcoming wedding this year Sarah and I have continued goals to lose a few more pounds.
  • Read Books – I set out to read at least 2 books.  With the Song of Ice and Fire (Game of Thrones series for you TV-only folks) I read the first 4 books.  Additionally I read a couple marriage prep books.  I wish I had kept a list of when I read certain books as they are a blur looking back now.  I like that Todd Klindt posted the books he read throughout the year.  I may do the same for next year.

 

Conclusion

   Overall I met (or came close to) many of my goals for 2012.  Aside from these goals I also set a midyear goal for improving my personal productivity which I blogged about.  So far I have seen an increase in my personal productivity mainly by how many things I have cut out of my life that are / were unnecessary.

   Despite reading some articles about not setting goals I still find the process very worthwhile and will continue into 2013.  I even found a series called 30 Days of Getting Results which very much builds on the idea of setting 3 attainable goals for the day / week / month / year and then doing a retrospective at the end of the respective time period.  I have not fully adopted this system but have incorporated a number of the concepts into my daily life.

   I hope 2012 was good to you (as it was to me) and you are refreshed coming into 2013 ready to tackle new challenges and continue growing.

 

      -Frog Out

Accessing Host Machine Files From Hyper-V Guest Virtual Machine

   Similar to a previous post I wrote on How To Configure Remote Desktop to Hyper-V Guest Virtual Machine I commonly get questions for “how do I access Hyper-V host machine files from inside a guest virtual machine?”  The solution I use is a combination of software and configuration but there are many other options as well.

Problem

   When connecting to a Hyper-V guest virtual machine you cannot easily transfer files into or out of the virtual machine.

Solution

   For almost all of my remote desktop needs I use a program called mRemote (Multi Remote).  The original mRemote developer has joined a new company and folded mRemote into a new product but you can still download a stable build of mRemote from CNET here.  There is also a forked version called mRemoteNG (Multi Remote Next Generation) that I have not tried out personally.

   One of the nice features of mRemote is that you can configure an RDP connection to map local host drives so that they are available inside the RDP session.  By RDP’ing to a virtual machine with the host drives mapped you now have access to transfer files into or out of the virtual machine.  Open mRemote and configure your RDP session as normal.  Under the Redirect heading change the Disk Drives setting to Yes as in the screenshot below.

AccessHyperVHostFilesInGuest1

   As an added bonus mRemote also allows you to store credentials to use for an RDP connection.  I find this very helpful to avoid having to retype credentials every time I bring up a virtual machine, especially with how many fake demo domains I deal with in various virtual machine farms I build out.

   When you connect to this RDP session you may be prompted to allow the remote desktop connection to access local resources (I do on Windows 8 at least, I didn’t previously on Windows 7).  Make sure to allow the Drives resources to be accessed.  If anyone knows a way around this prompt I would be interested to hear as I have not found any yet.

AccessHyperVHostFilesInGuest2

   After connecting to the virtual machine you now have access to the host machine drives.  See the example below.

AccessHyperVHostFilesInGuest3
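   If you prefer a script over Explorer, redirected drives are also reachable from inside the RDP session through the \\tsclient UNC path.  The snippet below is a small sketch to run inside the guest; the drive letter C and the file and folder names are placeholders made up for this example.

```powershell
# Run inside the guest's RDP session. "C" is the redirected host drive;
# the file and folder names below are hypothetical.
$hostDrive = '\\tsclient\C'

if (Test-Path $hostDrive) {
    # Copy a file from the host into the guest
    Copy-Item -Path (Join-Path $hostDrive 'Temp\Setup.zip') -Destination 'C:\Temp'
} else {
    Write-Warning "Host drive is not redirected. Check the Disk Drives setting on the RDP connection."
}
```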

 

Conclusion

   As I stated earlier there are a number of ways you can solve the issue of not being able to access host machine files inside a Hyper-V guest virtual machine.  My solution is a combination of RDP session with the mRemote application and configuring the RDP connection to map local host drives into the RDP session.  If you have other solutions or tips feel free to leave them in the comments below.

 

      -Frog Out

 

Links

How To Configure Remote Desktop to Hyper-V Guest Virtual Machine

https://briantjackett.com/archive/2010/06/06/how-to-configure-remote-desktop-to-hyper-v-guest-virtual-machines.aspx

 

mRemote download from CNET

http://download.cnet.com/mRemote/3000-2648_4-10793919.html

 

mRemoteNG project

http://www.mremoteng.org/