<Update 2017-02-28> Updated PBIX file with “Top N site collections by risk page” based on feedback from Microsoft peers. Also minor updates to documentation for report creation. </Update>
In this set of posts I will walk through a process for consuming SMAT output files into a Power BI report. This post will give a high level overview of the report pages to create and Power BI visuals used. In order to limit the length of this post (I was at 60+ screenshots before even finishing) I am providing a separate more in-depth Word document with the steps to create the report pages contained.
- Part 1 – importing the data and then querying / modeling the data
- Part 2 – designing the Power BI Report
In a previous post I talked about the SharePoint Migration Assessment Tool for SharePoint 2013 being released to web. Please read that post for context on working with SMAT in general.
Disclaimer: I am by no means a Power BI expert. In fact this report is the first time I’ve spent any significant time developing a report. Most of the technical credit goes to my coworkers Ken Kilty and Gene Livshin.
Note: Power BI Desktop is being updated regularly. For the purposes of this article I am using the January 2017 build (version 2.42.4611.701 64-bit).
Sample report and data files (updated 2017-02-28)
In-depth Power BI report creation documentation (document in progress, not finalized).
For the PBIX file be sure to update the source of data file via the Edit Query screen on the SiteAssessmentReport query as follows:
The SMAT tool outputs a summary CSV file along with 1 detailed CSV per risk being analyzed. Reading and interpreting hundred or thousands or tens of thousands of rows in a CSV file is not optimal. Instead it will be helpful to model and report on that data in an easier to consume fashion. At a high level our process will involve the following:
- Import Data [part 1] – Import SMAT output files into Power BI
- Query Data [part 1] – Create multiple queries on SMAT output files to filter, aggregate, and enhance into different perspectives
- Design Reports [part 2] – Design dashboard pages to consume the query outputs
Continuing from the queries that were modeled in part 1 now it is time to create reports to visualize the SMAT data to tell a story. As mentioned in the previous post I had a number of questions I wanted to answer.
- Content database statistics
- Number of sites per database?
- Size (storage) of databases?
- Web application statistics
- Number of site collections and subsites?
- Size (storage) of site collections?
- Web application risks
- Breakdown view of risks per web application?
- Slicer by web application and risk type
- Site collection risks
- Number of risks per site collection?
- Slicer by risk type
These questions can be answered across a number of different pages that focus on each. Start by creating 5 total pages with the names such as the following:
- Content DB overview
- Site collection overview
- Risk overview
- Risks by web application
- Risks by site collection
**Mea Culpa – PLEASE READ**
Before moving into the Power BI report I have to issue a mea culpa. As I reviewed the report pages I found that a few of my queries required additional columns or small modifications in order to improve usability or readability of the reports. These changes involved new calculated columns (convert MB to GB) and a conditional column to group sites by the number of risks relative to each other (small, medium, large, etc.). Please reference the sample files to see these changes.
Content DB overview page
The content DB overview page will give an overview size of the content database (storage used) and can be filtered by individual server if multiple database servers are used. Notice that databases have been re-ordered by their size rather than default sorting by name.
Site collection overview page
The site collection overview page will give an overview of the number site collections in a given web application compared to the size (storage) of site collections / web applications. Notice that a pattern is immediately visible when viewing this page. The majority of site collections are contained within sub0.domain.com while sub1.domain.com contains the bulk of remainder. Inversely sub1.domain.com is where the bulk of the storage is used while sub0.domain.com only uses a fraction of the storage.
Risk overview page
The Risk overview page will give a high level view of risks per site collection and by risk type. The histogram visual used (more on this below) filters the groups of site collections by ranges of how many risks were identified.
Add custom histogram visualization
The Risk overview page makes use of a custom visual called Histogram from the Power BI gallery. Again thanks to Ken Kilty for this advice. Knowing the number of risks found per site could be useful, but more likely it would be better to see trends and distributions of risks per grouping of sites and look for anomalies. How does one group sites together? Put them into “buckets” based on how many risks the sites have. Ex. Column one is count of sites with 0-100 risks, column two is 101-200 risks, and so on. In the reporting world a histogram visual can provide this for us.
Navigate to “https://app.powerbi.com/visuals/” and search for “Histogram”. There is a custom visual provided by Microsoft (v0.3.1 as of the time of this writing). Download that visual file. Back in Power BI on the visuals gallery click the ellipses (…) in bottom right and then select “Import a custom visual”. Navigate to the location where you downloaded the histogram custom visual and select it.
Risks by web application page
The Risks by web application report page will show a breakdown of risks per web application and then within the web application and also a breakdown of the risk types. Filter by individual web application or risk name. Note that there is a report level filter for the supporting query so that any risk name with no occurrences is hidden.
Risks by site collection page
The Risks by site collection page provides an overview of risk types, number of risks identified for selected risk type, and number of site collections that have the selected risks. Filter by risk name to find the number of identified risks and sites where that risk was identified. Note that there is a report level filter for the supporting query so that any risk name with no occurrences is hidden.
Top N site collections by risk page
The Top N site collections by risk page provides an overview of the top N (in my example N = 25, more on this below) site collections with the highest number of risks. Filter by risk name to find identified risks across these top 25 sites or filter by site collection to find risks on selected site collection. Note that there is a report level filter for the supporting query so that any risk name with no occurrences is hidden.
Add query parameter for number of site collections
One of the features of the Top N site collections by risk page is to sort site collections by total number of risks and only keep the top N records. As such it is preferable to use a query parameter within Power BI desktop to allow for a variable number of site collections to be shown. Power BI Desktop introduced query parameters with the April 2016 update. Query parameters allow a report creator to define what is essentially a variable to then be used in data source dialogs (primarily filtering operations). See below for an example of defining query parameter and usage in filter for top N rows.
I will be one of the first ones to admit that at the start of this project I knew next to nothing about proper dashboard creation and ways to model data in different ways. Speaking as a general developer and not as a Microsoft employee, I was very impressed with the capabilities Power BI has to offer. Given a sufficiently large and diverse enough set of data it was a relatively fast process to transform the SMAT output files into a usable (and now re-usable) format which can then show insights into the data. I know for my customer and many others this means they can make business critical decisions about what, when, and where to migrate their SharePoint data to SharePoint Online. If you have any feedback on this series of blog posts or suggestions for additional report pages please let me know in the comments.