How To Analyze SharePoint 2013 Migration Assessment Tool (SMAT) Data in Power BI – Part 1

   In this set of posts I will walk through a process for consuming SMAT output files into a Power BI report.

  • Part 1 – importing the data and then querying / modeling the data
  • Part 2 (not yet published) – designing the Power BI report (and providing sample files to use)

   In a previous post I talked about the SharePoint Migration Assessment Tool for SharePoint 2013 being released to web.  Please read that post for context on working with SMAT.

Disclaimer: I am by no means a Power BI expert.  In fact this report is the first time I’ve spent any significant time developing a report.  Most of the technical credit goes to my coworkers Ken Kilty and Gene Livshin.

Note: Power BI Desktop is being updated regularly.  For the purposes of this article I am using the January 2017 build (version 2.42.4611.701 64-bit).

Overview

   The SMAT tool outputs a summary CSV file along with 1 detailed CSV per risk being analyzed.  Reading and interpreting hundreds, thousands, or tens of thousands of rows in a CSV file is not optimal.  Instead it will be helpful to model and report on that data in an easier-to-consume fashion.  At a high level our process will involve the following:

  1. Import Data – Import SMAT output files into Power BI
  2. Query Data – Create multiple queries on SMAT output files to filter, aggregate, and enhance into different perspectives
  3. Design Reports – Design dashboard pages to consume the query outputs

 

Import Data

   Importing data into Power BI is a fairly simple process.  After launching Power BI Desktop click the Get Data link from the welcome screen (or click the Get Data button in top ribbon menu).

SMATPowerBIReport1

 

   Select the CSV data type.

SMATPowerBIReport2

 

   Navigate to the folder where the SMAT output files are located.  In this walkthrough use the SiteAssessmentReport.csv file.

SMATPowerBIReport3

 

   A preview of the data should appear.  I have run a PowerShell script to clean the data of any customer-sensitive information (ex. server names, database names, URLs, users, etc.)

Note: At a later date I may blog out that PowerShell script and link it back to this article.  If this data scrubbing script is of interest please let me know in the comments or contact me to prioritize it.
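The actual script is not published yet, but as a rough sketch of the idea (the column names, the hashing scheme, and the function names here are my own illustration, not the real script, and I used Python rather than PowerShell so the logic is easy to follow), a CSV sanitizer could look like this:

```python
import csv
import hashlib

def scrub_value(value: str) -> str:
    """Replace a sensitive value with a stable, anonymous token."""
    if not value:
        return value
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"scrubbed-{digest}"

def scrub_csv(in_path: str, out_path: str, sensitive_columns: set) -> None:
    """Copy a CSV, replacing values in the named columns with anonymous tokens.

    Hashing (instead of blanking) keeps the values stable, so rows that
    referenced the same server or database still group together in reports.
    """
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in sensitive_columns & set(row):
                row[col] = scrub_value(row[col])
            writer.writerow(row)
```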

SMATPowerBIReport4

 

   If everything was successful a list of fields should appear on the right-hand side of the screen.  These fields will correspond to the comma-separated fields in the CSV file.

SMATPowerBIReport5

 

Query data

   As mentioned earlier the raw CSV data by itself is not easily readable to gain insights about the farm.  The data needs to be filtered and queried differently to support the report that will be built.  I cannot stress enough that having a set of questions you are trying to answer is extremely important.  It is not sufficient to start designing reports without some initial planning.  Think of this step as modeling the data in different ways to get different perspectives on the data.  Initially my personal goals with this process were to identify the following items:

  • Content database statistics
    • Number of sites per database?
    • Size (storage) of databases?
  • Web application statistics
    • Number of site collections and subsites?
    • Size (storage) of site collections?
  • Web application risks
    • Breakdown view of risks per web application?
    • Slicer by web application and risk type
  • Site collection risks
    • Number of risks per site collection?
    • Slicer by risk type

   Having these goals / questions in mind I diagramed how to split the data into different queries that could then answer those questions.

SMATPowerBIReport6

 

SiteAssessmentReport Query

   Power BI has a nice set of features that allows taking a set of data, making some changes, and then creating a reference (shallow copy) or duplicate (deep copy) of that altered set of data.  As such I loaded the CSV data and then referenced it into 3 separate queries, 1 of which was further split into multiple queries for risk data by web application and by site collection.  The following are example steps to accomplish this.

   After the data has been loaded from previous step click on Edit Queries from the ribbon menu.

SMATPowerBIReport7

 

   Notice that the first query for the loaded CSV file includes a few modifications to promote the header row and change datatype of columns that should be numbers.  Leave those steps in place.

SMATPowerBIReport8

 

   Create three references from the source data load query (SiteAssessmentSummary) and rename the referenced queries appropriately.  In this example I named them RiskFilteredSource, DatabaseMetrics, and WebApplicationMetrics (respectively the 3 boxes in the 2nd row on my hierarchy diagram above).

SMATPowerBIReport9

SMATPowerBIReport10

 

RiskFilteredSource Query

   The next step is to start adding modifications to the queries to shape the data into the different results which will be reported on.  Start with RiskFilteredSource.  Only the SiteUrl and individual risk columns (Alerts, Apps, etc.) are necessary.  Rather than removing the columns that aren’t needed it is recommended to select the columns that are needed (see reference for more information on why).  Click Choose Columns from the ribbon menu.  On the following dialog box select the SiteUrl column and all of the individual risk columns (Alerts, Apps, etc.) ending with WorkflowRunning2013.

Note: the set of individual risks in the CSV will depend on which ones were enabled when the SMAT tool was run.  This example uses the out-of-the-box settings as of the SMAT v1.1 release.

SMATPowerBIReport11

 

   The query result should look like the following with only the columns selected still showing.  That is all for the RiskFilteredSource query.

SMATPowerBIReport12

 

DatabaseMetrics Query

   Next will be modifications to the DatabaseMetrics query.  Similar to the previous query, start by selecting only the columns relating to databases (e.g. ContentDbName, ContentDbServerName, and ContentDBSizeInMB) using the Choose Columns button from the ribbon menu.

SMATPowerBIReport13

 

   There will likely be duplicate rows in the result because each row in the original CSV corresponds to a single site collection and each content database can contain 1 or more site collections.  In order to clean up the duplicate rows select all three columns and then (with columns still selected) click Remove Rows –> Remove Duplicates from the ribbon menu.

SMATPowerBIReport14

 

   The DatabaseMetrics query result will contain only a single record per content database with the associated server name and database size.

SMATPowerBIReport15
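Outside of Power BI, the Choose Columns plus Remove Duplicates logic is easy to express in plain Python.  This sketch is only an illustration of what the query does (the column names match the SMAT CSV, but the function itself is my own):

```python
# Columns kept by the DatabaseMetrics query.
DB_COLUMNS = ("ContentDbName", "ContentDbServerName", "ContentDBSizeInMB")

def database_metrics(rows):
    """Reduce site-collection rows to one record per content database.

    `rows` is an iterable of dicts (e.g. from csv.DictReader) containing
    at least the DB_COLUMNS keys.  Because each CSV row is a site
    collection, the same database appears once per site collection;
    deduplicating on all three kept columns leaves one row per database.
    """
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in DB_COLUMNS)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(DB_COLUMNS, key)))
    return result
```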

 

WebApplicationMetrics Query

   The last of the original four queries, WebApplicationMetrics, will take extra steps to shape the data as intended.  The SiteUrl column is not in the format needed since it contains full site collection URLs.  For the purposes of this example assume that the authority (ex. subdomain.domain.com) portion of URL represents an individual web application.  Continuing with that assumption the SiteUrl column can be manipulated to show data per web application.  Click Choose Columns from the ribbon menu and select only the columns for SiteUrl, SiteCollectionSizeInMB, SiteCollectionCount, and SubWebCount.

Note: if the farm contains host named site collections (HNSCs) then the assumption regarding “URL authority = web application” may be invalid.

SMATPowerBIReport16

 

   Next remove the scheme (most likely http:// or https://) from the SiteUrl column by clicking the Replace Values button from the ribbon menu.  For Replace With specify an empty string.  In my example all sites are using https://, but a farm may have a mix of both http:// and https://, so an additional Replace Values step may be needed.  I’m not aware of a way to combine both into one operation but if a reader knows please do share in the comments.

SMATPowerBIReport17

 

   In order to separate the authority portion of the URL from the rest of the site collection URL (e.g. the path) click the Split Column –> By Delimiter button from the ribbon menu.  Choose a custom delimiter of forward slash (/) and choose to split only at the left-most delimiter.

SMATPowerBIReport18

SMATPowerBIReport19

 

   To make these newly split columns easier to read, click Rename Column from the ribbon menu and rename the first column to SiteUrlDomain and the second column to SiteUrlPath.  Alternatively, the rename functionality can be found by right-clicking the column and choosing Rename from the list of actions.

SMATPowerBIReport20
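The two Replace Values steps plus the Split Column step amount to a small string transformation.  As an illustration only (this is plain Python, not the Power Query steps themselves), the logic looks like this:

```python
def split_site_url(site_url: str):
    """Split a site collection URL into (SiteUrlDomain, SiteUrlPath).

    Removes the http:// or https:// scheme (mirroring the Replace Values
    steps) and splits at the left-most "/" (mirroring Split Column ->
    By Delimiter with a custom "/" delimiter).
    """
    for scheme in ("https://", "http://"):
        if site_url.startswith(scheme):
            site_url = site_url[len(scheme):]
            break
    # partition splits only at the first "/"; a root site collection
    # with no path yields an empty SiteUrlPath.
    domain, _, path = site_url.partition("/")
    return domain, path
```

Note that in code both schemes are handled in one pass, whereas in the Power BI UI each Replace Values operation handles one literal string.  The same HNSC caveat from above applies: the "domain = web application" assumption is baked into this split.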

 

   The final modification for the WebApplicationMetrics query will be to group the data by multiple aggregations.  Select the SiteUrlDomain column and click Group By in the ribbon menu.  Make sure that SiteUrlDomain is selected for the top Group By box.  Proceed to add additional aggregations for three columns:

  New Column Name         Operation    Column
  SiteCollectionSizeInMB  Sum          SiteSizeInMB
  SiteCollectionCount     Count Rows
  SubWebsSum              Sum          NumOfWebs
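The Group By step above can be sketched in plain Python as well.  This is an illustration of the aggregation logic, not the actual Power Query operation (the input column names follow the table above):

```python
from collections import defaultdict

def web_application_metrics(rows):
    """Group site-collection rows by SiteUrlDomain.

    For each web application (domain) this produces the three
    aggregations from the Group By table: summed site size, a row
    count, and summed sub-web count.
    """
    totals = defaultdict(lambda: {"SiteCollectionSizeInMB": 0.0,
                                  "SiteCollectionCount": 0,
                                  "SubWebsSum": 0})
    for row in rows:
        agg = totals[row["SiteUrlDomain"]]
        agg["SiteCollectionSizeInMB"] += float(row["SiteSizeInMB"])  # Sum
        agg["SiteCollectionCount"] += 1                              # Count Rows
        agg["SubWebsSum"] += int(row["NumOfWebs"])                   # Sum
    return dict(totals)
```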

 

SMATPowerBIReport21

 

   The results for the WebApplicationMetrics query should look similar to the following.

SMATPowerBIReport22

 

RiskByWebApp Query

   Referring back to the original planning hierarchy there are two final queries to be created: “Risks by Web Application” and “Risks by Site Collection”.  These queries are children of the “Filter for Risk Related Columns” box.  This translates to creating 2 references from the RiskFilteredSource query.  Rename these referenced queries to RiskByWebApp and RiskBySiteCollection.

SMATPowerBIReport23

 

   Starting with RiskByWebApp, perform steps similar to those used with WebApplicationMetrics.

  1. Replace the https:// or http:// scheme
  2. Split SiteUrl column on leftmost “/” delimiter

   Since this query is focused on web application based information the “path” portion of URL (ex. “/sites/HRMarketing”) is not needed.  Click Choose Columns and select all columns except “SiteUrl.2”.

SMATPowerBIReport24

 

   After the columns have been chosen rename the “SiteUrl.1” column to “SiteUrlDomain”.  The data should look similar to the following at this point.

SMATPowerBIReport25

 

   Now is where some of the magic happens.  Credit for this specifically goes to Ken Kilty and Gene Livshin.  I want to aggregate the individual risk counts per web application while also maintaining the individual risk name.  Welcome to Unpivot Columns (<insert fireworks and fanfare>).  Think of the unpivot columns operation as a transpose for key-value pairings.  Select all of the columns for the individual risks (Alerts, Apps, etc.) but be sure not to select SiteUrlDomain.  Then click Unpivot Columns.

SMATPowerBIReport26

 

   The resulting query will have SiteUrlDomain, Attribute, and Value for columns.  Rename Attribute and Value to RiskName and RiskCount respectively.

SMATPowerBIReport27
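To make the unpivot operation concrete, here is the same transformation sketched in plain Python (again, just an illustration of what Unpivot Columns does, not the Power Query step itself):

```python
def unpivot_risks(rows, key_column="SiteUrlDomain"):
    """Unpivot per-risk columns into (key, RiskName, RiskCount) rows.

    Every column other than `key_column` is treated as an individual
    risk column.  Each wide input row becomes one narrow output row
    per risk, mirroring the Unpivot Columns operation (Attribute ->
    RiskName, Value -> RiskCount).
    """
    result = []
    for row in rows:
        for column, value in row.items():
            if column != key_column:
                result.append({key_column: row[key_column],
                               "RiskName": column,
                               "RiskCount": int(value)})
    return result
```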

 

   Notice that RiskName values will repeat if there are multiple site collections in that web application.  Click Group By from the ribbon menu.  Specify SiteUrlDomain and RiskName for the top grouping dropdowns (add a new grouping for RiskName since the default is one grouping).  For the New Column Name, Operation, and Column specify RiskCountAggregated, Sum, and RiskCount respectively.

SMATPowerBIReport28

 

   The RiskByWebApp query result is finally in a usable format now.  See the following for an example of the output.

SMATPowerBIReport29

 

RiskBySiteCollection Query

   The RiskBySiteCollection query will mirror the RiskByWebApp query in every step except that there will be no Split Column step or the column clean up immediately afterwards.  The order of steps will be as follows:

  • Source
  • Replaced value – remove scheme from SiteUrl
  • Unpivoted Columns – select all columns except for SiteUrl for the unpivot
  • Rename Columns – Attribute –> RiskName, Value –> RiskCount
  • Grouped Rows – group by SiteUrl and RiskName, sum RiskCount into RiskCountAggregated.
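The steps above can be sketched end to end in plain Python.  As with the earlier snippets this is an illustration of the query logic only, not the actual Power Query operations:

```python
from collections import defaultdict

def risk_by_site_collection(rows):
    """Sketch of the RiskBySiteCollection steps.

    `rows` is an iterable of dicts: a SiteUrl key plus one key per
    individual risk column.  The steps are: strip the URL scheme
    (Replaced Value), unpivot the risk columns (Unpivoted Columns),
    then sum counts per (SiteUrl, RiskName) pair (Grouped Rows).
    """
    totals = defaultdict(int)
    for row in rows:
        url = row["SiteUrl"]
        for scheme in ("https://", "http://"):      # Replaced Value
            if url.startswith(scheme):
                url = url[len(scheme):]
                break
        for column, value in row.items():           # Unpivoted Columns
            if column != "SiteUrl":
                totals[(url, column)] += int(value) # Grouped Rows (Sum)
    return [{"SiteUrl": u, "RiskName": r, "RiskCountAggregated": c}
            for (u, r), c in totals.items()]
```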

 

   The RiskBySiteCollection query should end up looking like the following:

SMATPowerBIReport30

 

Conclusion

   Quite a bit of work goes into transforming and modeling the initial CSV data into more usable formats.  Hopefully this post has been helpful in illustrating steps that can be taken to plan out the end goal as well as model the queries of the SMAT output data.  In Part 2 of this post I will walk through creating a number of Power BI report pages that leverage what has been created thus far.  I’ll also attempt to share out sample source CSV data and a Power BI PBIX file but no guarantees.  Please share any feedback or questions in the comments.

 

      -Frog Out

SharePoint 2013 Migration Assessment Tool Released To Web

   Microsoft quietly released the Release to Web (RTW) version of the SharePoint Migration Assessment Tool (SMAT) for SharePoint 2013 on Jan 20, 2017.  This is an update from the Release Candidate (RC) version that was released in Fall 2016.  I haven’t seen any announcements regarding this upgrade so I wanted to share here.

 

Background

  For those who are not familiar with SMAT, it is a command line tool that checks a SharePoint farm for risks in migrating to SharePoint Online.  Aside from that primary purpose it is also useful for an overview audit of the farm (name and size of content databases, name and size of site collections, site collection admins and owners, and more).  There is a backing configuration file (scandef.json) which can enable or disable the various checks to be scanned against a SharePoint farm.  At the time of writing there is only a version compatible with SharePoint 2013, but there is potential for a SharePoint 2010 and / or 2016 compatible version in the future.

   In terms of the checks scanned for they include:

  • Large lists
  • Checked out files
  • Full trust solutions
  • Non-default master pages
  • Unsupported web templates
  • … and more

 

   SMAT will need to run on a server within the SharePoint farm and be run as the farm administrator.  Run time will vary depending on the size and configuration of the farm.  In a tiny lab farm I had created it took minutes to run, but in a sizable lab farm (multiple TBs of data and tens of thousands of site collections) it took over 24 hrs.  The output from the tool will be 1 summary CSV along with 1 detail CSV per check scanned (~30).

 

What’s New

  Comparing the documentation from the RC version to the RTW version I am seeing a few new risks that are scanned, but I didn’t see them output in the CSV files.  I may need to make changes to the config file in order to include the new checks.  Other than that the execution of the tool and general process is the same as in the RC version.

 

Next Steps

   So what can you do with these CSV files?  By themselves the output CSV files can be difficult to read to get a good picture of the overall migration risks of a farm.  One option is to use these output files in conjunction with the Microsoft FastTrack Center and their SharePoint 2013 to SharePoint Online Migration Offer.  See this link for details on that offer and more.

   Another option is to analyze the CSV files on your own with tools such as Excel, SQL Server, or Power BI.  Personally I am very new to Power BI but with a little research and consulting with fellow PFEs I was able to generate a useful report that aggregated the risk data and filtered it into very usable visualizations.  I will write a follow up post about the process I followed to model and visualize these output files.

 

Conclusion

   If you are hosting a SharePoint 2013 farm and looking to migrate to SharePoint Online, or simply want to audit your farm for risks and general data, I encourage you to take a look at the SharePoint Migration Assessment Tool.  Look for my next blog post where I’ll dig into interpreting the data with Power BI to find useful insights and more.

 

      -Frog Out

Retrospective for 2016

2016 was a busy year for me, especially from a personal perspective.  Looking back here are some of the things that went on during 2016 as well as things I’m looking forward to in 2017.

 

Retrospective

2016 was filled with many events both personal and work related.  The biggest event of all was the birth of our son Samuel.  I am very grateful that my employer Microsoft changed their already generous infant care leave policy from 1 month of paid time off to now 3 months.  That is almost unheard of in any industry or organization.  During that time off (two separate chunks) I got to spend lots of quality time with my family taking outings to the zoo, making meals together (my most cherished memory), reading books, and many other bonding activities.  After seeing how quickly Clara is growing up I know this time for both our kids is going to go fast so I’m extremely thankful to have had this extra time with them.

Aside from my paternity leave I also got back to speaking at conferences including SharePoint Saturday Twin Cities, Dog Food Conference, SharePoint Saturday Detroit, and SPTechCon Austin as well as a new local user group Central Ohio Azure User Group.  Many of the topics I covered in these presentations go outside SharePoint into Azure, PowerApps, and Microsoft Flow.  I’m a fan of the numerous possibilities for integrating all of these products / services together to provide solutions for my customers (and myself on some side projects.)  Now that PowerApps and Microsoft Flow are generally available (GA) I’m looking forward to diving into those two in more depth in 2017.

One other major project throughout 2016 was moving my blog to a new hosting provider.  As it turns out I ended up moving twice, first from GeeksWithBlogs.net to self-hosted WordPress on Azure and then again from self-hosted WordPress to the official hosted WordPress.  It was quite a learning experience exporting data, cleaning up the exports, configuring themes, and all of the other work to get WordPress to a point I was satisfied with.  As I am not familiar enough with the hosting of WordPress and was getting many 500 errors after I “tweaked” the databases I thought it best to migrate to the official hosted platform.  So far it has been working well but I am still in the process of cleaning up a few items (mostly CSS or styling) that didn’t migrate 100%.  If you find any issues please contact me or leave a comment.

On a separate note, last year I started using Trello for tracking weekly progress on personal productivity items.  After a few months of trying that out I read a few articles that turned me off of that practice.  Eventually I was searching for things to put on my weekly lists and the process became just that: a process.  I still keep track of yearly accomplishments for my annual review at work.

Sharing Interests

Similar to years past I read a number of books and listened to a number of podcasts.  You can read about 2015 book and podcast recommendations here.  Here are some additional ones from 2016.

Books

I read quite a few Star Wars books at the end of 2016.  I’ve always been a fan of Star Wars but I saw our local library had a number of books in the Journey to Star Wars: The Force Awakens series.  I decided to read one and got hooked.  I find it very interesting to explore the “new” expanded universe for Star Wars especially backstory on characters, motivations for various factions, and more.

Games

I haven’t had a real gaming PC in quite a few years but this past year I have gotten into PC gaming a little bit again.  One of the highlights was an indie game called Stardew Valley.  For anyone familiar with farming RPGs (yes it is a niche but one that I was heavily into years ago) like Harvest Moon or Rune Factory this is a love letter to that style of game.  I’ve already invested a few dozen hours into this game and have not even finished year 2.  I like the way farming, mining, fishing, foraging, social interactions, and more are all interconnected yet you are free to go at your own leisurely pace.

 

Conclusion

2016 was a great year, especially welcoming our son Samuel into our family.  I’m thankful for the extra time off work to spend with my family and looking forward to 2017.  As always if you have any recommendations on books, podcasts, tech, or other interesting things feel free to share.

 

-Frog Out

Slides and Demo Scripts from SharePoint Saturday Twin Cities 2016

A big thank you out to the organizers, attendees, and sponsors from SharePoint Saturday Twin Cities.  It has been a number of years since I was last at this event and great to see it alive and well.  I had a great time getting to talk with attendees and fellow speakers as well as presenting two sessions.  Slides, code, and scripts from my sessions are below.  If you have any feedback or follow up questions please leave a comment below.

 

On a personal note I was completely flattered when Jeff Teper (“father of SharePoint” himself) and James Phillips (CVP for Business Apps at Microsoft) tweeted back to me during my prep time this morning.  Great to see the Corporate Vice President of SharePoint and OneDrive engaged and supporting community events.  A sign of good things to come from the SharePoint and PowerApps leadership.

 

image

 

PowerApps and Microsoft Flow Intro for Developers

GitHub link to demo project files

https://github.com/BrianTJackett/BTJ.PowerApps.AzureDBSample

Slides

 

Running Your Azure Dev / Test VMs for Cheap

PowerShell script for creating new Azure RM VM

Slides

 

-Frog Out

Slides and Demo Scripts from Dog Food Conference 2016

   A big thank you out to the organizers, attendees, and sponsors from Dog Food Conference.  I had a great time getting to talk with attendees and fellow speakers as well as presenting two sessions.  Slides, code, and scripts from my sessions are below.  If you have any feedback or follow up questions please leave a comment below.

 

PowerApps and Microsoft Flow Intro for Developers

GitHub link to demo project files

https://github.com/BrianTJackett/BTJ.PowerApps.AzureDBSample

Slides

 

Running Your Azure Dev / Test VMs for Cheap

PowerShell script for creating new Azure RM VM

Slides

 

      -Frog Out

Slides and Demos from SharePoint Saturday Detroit 2016

   After a few years away I was able to present at SharePoint Saturday Detroit 2016.  A big thank you to all of the attendees, sponsors, organizers, and other speakers.  Below are my slides and a link to a GitHub repo with the demo projects that I used for the final demo.  This includes the Azure SQL DB project, API app, and Web App.  If you have any feedback or questions feel free to leave a comment below.

GitHub link to demo project files

https://github.com/BrianTJackett/BTJ.PowerApps.AzureDBSample

 

      -Frog Out

Upcoming Speaking Events September 2016

   After many months away from blogging and speaking (baby #2 was born recently) I’m speaking at a few conferences in the coming months.  If you are attending any of these conferences please stop by and say hi.  Looking forward to returning to each of these events.  Here are the sessions and abstracts.

 

Sessions

SharePoint Saturday Detroit (Sept 17th, 2016)

Title: PowerApps and Microsoft Flow for the SharePoint Developer

Abstract: PowerApps and Microsoft Flow are enterprise services (currently in preview) that allow power users and developers to build scalable applications that connect with numerous consumer and enterprise sources using PowerPoint and Excel-like tools. In this session we will overview the integration points for PowerApps and Microsoft Flow with various sources such as OneDrive, Twitter, Azure, and more. We will also talk about the developer story for integrating with custom APIs and on-prem data sources such as SQL Server and SharePoint. Lastly we will demo a number of scenarios to show how easily you can create and consume apps across Windows, iOS, Android, and web.

 

DogFood Con (Oct 5-6, 2016)

Title: PowerApps and Microsoft Flow Intro for Developers

Abstract: PowerApps and Microsoft Flow are enterprise services (currently in preview) that allow power users and developers to build scalable applications that connect with numerous consumer and enterprise sources using PowerPoint and Excel-like tools.  In this session we will overview the integration points for PowerApps and Microsoft Flow with various sources such as OneDrive, Twitter, Azure, and more.  We will also talk about the developer story for integrating with custom APIs and on-prem data sources such as SQL Server and SharePoint.  Lastly we will demo a number of scenarios to show how easily you can create and consume apps across Windows, iOS, Android, and web.

 

Title: Running Your Dev / Test VMs in Azure for Cheap

Abstract: With an MSDN subscription you can run your dev / test environment in Azure IaaS for less than the cost of a cup of coffee each day. In this session we will overview the basics of Azure IaaS (Infrastructure as a Service), the pieces you will use to be successful deploying SharePoint in Azure (including the new Azure Resource Manager templates), and how to use resources as efficiently as possible to reduce your costs and boost your farm performance. This session is targeted to SharePoint developers and administrators. Prior knowledge of Azure is helpful but not a requirement.

 

SharePoint Saturday Twin Cities (Oct 29, 2016)

Title: PowerApps and Microsoft Flow for Developers

Abstract: PowerApps and Microsoft Flow are enterprise services (currently in preview) that allow power users and developers to build scalable applications that connect with numerous consumer and enterprise sources using PowerPoint and Excel-like tools.  In this session we will overview the integration points for PowerApps and Microsoft Flow with various sources such as OneDrive, Twitter, Azure, and more.  We will also talk about the developer story for integrating with custom APIs and on-prem data sources such as SQL Server and SharePoint.  Lastly we will demo a number of scenarios to show how easily you can create and consume apps across Windows, iOS, Android, and web.

 

      -Frog Out