Social network analysis is quickly becoming an important tool to serve a variety of professional needs. It can inform corporate goals such as targeted marketing and identify security or reputational risks. Social network analysis can also help businesses meet internal goals: It provides insight into employee behaviors and the relationships among different parts of a company.
Organizations can employ a number of software solutions for social network analysis; each has its pros and cons, and is suited for different purposes. This article focuses on Microsoft’s Power BI, one of the most commonly used data visualization tools today. While Power BI offers many social network add-ons, we’ll explore custom visuals in R to create more compelling and flexible results.
This tutorial assumes an understanding of basic graph theory, particularly directed graphs. Also, later steps are best suited for Power BI Desktop, which is only available on Windows. Readers on macOS or Linux may use Power BI in the browser, but the browser version does not support certain features, such as importing an Excel workbook.
Structuring Data for Visualization
Creating social networks starts with the collection of connections (edge) data. Connections data contains two primary fields: the source node and the target node—the nodes at either end of the edge. Beyond these nodes, we can collect data to produce more comprehensive visual insights, typically represented as node or edge properties:
1) Node properties
- Shape or color: Indicates the type of user, e.g., the user’s location/country
- Size: Indicates the importance in the network, e.g., the user’s number of followers
- Image: Operates as an individual identifier, e.g., a user’s avatar
2) Edge properties
- Color, stroke, or arrowhead connection: Indicates type of connection, e.g., the sentiment of the post or tweet connecting the two users
- Width: Indicates strength of connection, e.g., how many mentions or retweets are observed between two users in a given period
Let’s inspect an example social network visual to see how these properties function:
We can also use hover text to supplement or replace the above parameters, as it can support other information that cannot be easily expressed through node or edge properties.
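As a rough illustration of the properties listed above, the sketch below lays out a tiny network as two base R data frames, one for nodes and one for edges. Every column name and value here is invented for illustration and is not part of the tutorial’s dataset.

```r
# Hypothetical three-user network; all values below are made up.
nodes <- data.frame(
  id        = c("alice", "bob", "carol"),
  color     = c("#42f548", "#4287f5", "#4287f5"),  # type of user
  followers = c(120, 15, 3),                       # drives node size
  avatar    = c("https://example.com/a.png",
                "https://example.com/b.png",
                "https://example.com/c.png"),      # node image
  stringsAsFactors = FALSE
)

edges <- data.frame(
  from     = c("alice", "alice", "bob"),
  to       = c("bob", "carol", "carol"),
  color    = c("green", "red", "grey"),  # e.g., sentiment of the post
  mentions = c(7, 2, 1),                 # drives edge width
  stringsAsFactors = FALSE
)

# Every edge endpoint should refer to a known node.
all_known <- all(c(edges$from, edges$to) %in% nodes$id)
print(all_known)
```

Keeping node attributes and edge attributes in separate tables mirrors how most network libraries expect their input.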
Comparing Power BI’s Social Network Extensions
Having defined the different data features of a social network, let’s examine the pros and cons of four popular tools used to visualize networks in Power BI.
Extension | Social Network Graph by Arthur Graus | Network Navigator | Advanced Networks by ZoomCharts (Light Edition) | Custom Visualizations Using R |
---|---|---|---|---|
Dynamic node size | Yes | Yes | Yes | Yes |
Dynamic edge size | No | Yes | No | Yes |
Node color customization | Yes | Yes | No | Yes |
Complex social network processing | No | Yes | Yes | Yes |
Profile images for nodes | Yes | No | No | Yes |
Adjustable zoom | No | Yes | Yes | Yes |
Top N connections filtering | No | No | No | Yes |
Custom information on hover | No | No | No | Yes |
Edge color customization | No | No | No | Yes |
Other advanced features | No | No | No | Yes |
Social Network Graph by Arthur Graus, Network Navigator, and Advanced Networks by ZoomCharts (Light Edition) are all suitable extensions to develop simple social networks and get started with your first social network analysis.
However, if you need features the other extensions lack, such as edge color customization, custom hover information, or top N connections filtering, or if your social network is particularly complex, I recommend developing your custom visuals in R.
This custom visualization is the final result of our tutorial’s social network extension in R and demonstrates the large variety of features and node/edge properties offered by R.
Building a Social Network Extension for Power BI Using R
Creating an extension to visualize social networks in Power BI using R comprises five distinct steps. But before we can build our social network extension, we must load our data into Power BI.
Prerequisite: Collect and Prepare Data for Power BI
You can follow this tutorial with a test dataset based on Twitter and Facebook data or proceed with your own social network. Our data has been randomized; you may download real Twitter data if desired. After you collect the required data, add it into Power BI (for example, by importing an Excel workbook or adding data manually). Your result should look similar to the following table:
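The R script later in this tutorial reads the dataset by column position, so column order matters. As a rough guide, the sketch below builds a two-row stand-in with nine columns in that order. The column names follow the later script, while every value is a placeholder rather than real or test data.

```r
# Placeholder rows only; names mirror the columns the tutorial's script reads by position.
connections <- data.frame(
  from          = c("u1", "u2"),
  to            = c("u2", "u3"),
  value         = c(5, 2),                   # number of connections
  col_sentiment = c("#00aa00", "#aa0000"),   # edge color
  col_type      = c("#4287f5", "#4287f5"),   # node color for the target user
  from_name     = c("User One", "User Two"),
  to_name       = c("User Two", "User Three"),
  from_avatar   = c("https://example.com/u1.png", "https://example.com/u2.png"),
  to_avatar     = c("https://example.com/u2.png", "https://example.com/u3.png"),
  stringsAsFactors = FALSE
)
str(connections)
```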
Once you have your data set up, you are ready to create a custom visualization.
Step 1: Set Up the Visualization Template
Developing a Power BI visualization is not simple; even basic visuals require thousands of files. Fortunately, Microsoft offers a library called pbiviz, which provides the required infrastructure-supporting files with only a few lines of code. The pbiviz library will also repackage all of our final files into a .pbiviz file that we can load directly into Power BI as a visualization.
The simplest way to install pbiviz is with Node.js: the tool ships in the powerbi-visuals-tools npm package. Once pbiviz is installed, we need to initialize our custom R visual via our machine’s command-line interface:
pbiviz new toptalSocialNetworkByBharatGarg -t rhtml
cd toptalSocialNetworkByBharatGarg
npm install
pbiviz package
Don’t forget to replace toptalSocialNetworkByBharatGarg with the desired name for your visualization. The -t rhtml option tells pbiviz to create a template for R-based HTML visualizations. You will see errors because we have not yet specified fields such as the author’s name and email in our package, but we will resolve these later in the tutorial. If the pbiviz script won’t run at all in PowerShell, you may first need to allow scripts with Set-ExecutionPolicy RemoteSigned.
On successful execution of the code, you will see a folder with the following structure:
Once we have the folder structure ready, we can write the R code for our custom visualization.
Step 2: Code the Visualization in R
The directory created in the first step contains a file named script.r, which consists of default code. (The default code creates a simple Power BI extension, which uses the iris sample dataset available in R to plot a histogram of Petal.Length by Species.) We will update the code but retain its default structure, including its commented sections.
Our project uses three R libraries: DiagrammeR, visNetwork, and data.table.
Let’s replace the code in the Library Declarations section of script.r to reflect our library usage:
libraryRequireInstall("DiagrammeR")
libraryRequireInstall("visNetwork")
libraryRequireInstall("data.table")
Next, we will replace the code in the Actual code section with our R code. Before creating our visualization, we must first read and process our data. We will take two inputs from Power BI:
- num_records: The numeric input N, such that we will select only the top N connections from our network (to limit the number of connections displayed)
- dataset: Our social network nodes and edges
To calculate the N connections that we will plot, we need to aggregate the num_records value because Power BI will provide a vector by default instead of a single numeric value. An aggregation function like max achieves this goal:
limit_connection <- max(num_records)
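To see why the aggregation is needed, the sketch below mimics the input with a one-column data frame in which the same value repeats on every row. The name num_records_input and the value 25 are stand-ins, and the exact shape Power BI passes to the script is an assumption here.

```r
# Stand-in for what the visual receives: the chosen N repeated once per row (assumed shape).
num_records_input <- data.frame(num_records = rep(25, 4))

# max() collapses the repeated values into a single number.
limit_connection <- max(num_records_input)
print(limit_connection)
```

Any aggregation that returns one value from identical inputs (min, for example) would serve the same purpose.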
We will now read dataset as a data.table object with custom columns. We sort the dataset by value in decreasing order to place the most frequent connections at the top of the table. This ensures that we choose the most important records to plot when we limit our connections with num_records:
dataset <- data.table(from = dataset[[1]],
                      to = dataset[[2]],
                      value = dataset[[3]],
                      col_sentiment = dataset[[4]],
                      col_type = dataset[[5]],
                      from_name = dataset[[6]],
                      to_name = dataset[[7]],
                      from_avatar = dataset[[8]],
                      to_avatar = dataset[[9]])[
  order(-value)][
  seq(1, min(nrow(dataset), limit_connection))]
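The same sort-then-truncate idea can be sketched in base R on a toy edge list. The data and the limit of 2 are invented for illustration; this is a simplified stand-in, not the tutorial’s data.table code.

```r
# Toy edge list; values are invented.
edges <- data.frame(
  from  = c("a", "b", "c", "d"),
  to    = c("b", "c", "d", "a"),
  value = c(3, 10, 1, 7),
  stringsAsFactors = FALSE
)
limit_connection <- 2

# Sort by value, largest first, then keep at most limit_connection rows.
sorted_edges <- edges[order(-edges$value), ]
top_edges <- sorted_edges[seq_len(min(nrow(sorted_edges), limit_connection)), ]
print(top_edges$value)
```

This sketch uses seq_len() so that an empty table yields zero rows; seq(1, 0) would instead count down and return the indices 1 and 0.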
Next, we must prepare our user information by creating and allocating unique user IDs (uid) to each user, storing these in a new table. We also calculate the total number of users and store that information in a separate variable called num_nodes:
user_ids <- data.table(id = unique(c(dataset$from,
                                     dataset$to)))[, uid := 1:.N]
num_nodes <- nrow(user_ids)
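For readers less familiar with data.table syntax, here is a minimal base R sketch of the same ID assignment. The from and to vectors are invented for illustration.

```r
# Invented endpoints for three connections.
from <- c("x", "y", "x")
to   <- c("y", "z", "z")

# unique() keeps first-seen order, so IDs are stable for a given input order.
user_ids <- data.frame(id = unique(c(from, to)), stringsAsFactors = FALSE)
user_ids$uid <- seq_len(nrow(user_ids))
num_nodes <- nrow(user_ids)
print(user_ids)
```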
Let’s update our user information with additional properties, including:
- The number of followers (size of node).
- The number of records.
- The type of user (color codes).
- Avatar links.
We will use R’s merge function to update the table:
user_ids <- merge(user_ids, dataset[, .(num_follower = uniqueN(to)), from], by.x = 'id', by.y = 'from', all.x = T)[is.na(num_follower), num_follower := 0][, size := num_follower][num_follower > 0, size := size + 50][, size := size + 10]
user_ids <- merge(user_ids, dataset[, .(sum_val = sum(value)), .(to, col_type)][order(-sum_val)][, id := 1:.N, to][id == 1, .(to, col_type)], by.x = 'id', by.y = 'to', all.x = T)
user_ids[id %in% dataset$from, col_type := '#42f548']
user_ids <- merge(user_ids,
                  unique(rbind(dataset[, .('id' = from, 'Name' = from_name, 'avatar' = from_avatar)],
                               dataset[, .('id' = to, 'Name' = to_name, 'avatar' = to_avatar)])),
                  by = 'id')
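The first merge above is dense, so the sketch below walks through the same idea in base R: count distinct targets per source user, left-join the counts, fill gaps with zero, and derive a node size. The toy data is invented, and the intermediate names (counts, users) are hypothetical.

```r
# Invented connections and users.
edges <- data.frame(
  from = c("a", "a", "a", "b"),
  to   = c("b", "c", "b", "c"),
  stringsAsFactors = FALSE
)
users <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)

# Distinct targets per source user.
counts <- aggregate(to ~ from, data = edges, FUN = function(x) length(unique(x)))
names(counts) <- c("id", "num_follower")

# Left join: users who never appear as a source get NA, replaced with 0.
users <- merge(users, counts, by = "id", all.x = TRUE)
users$num_follower[is.na(users$num_follower)] <- 0

# Same sizing rule as the tutorial: +50 for users with a count, then +10 for everyone.
users$size <- users$num_follower + ifelse(users$num_follower > 0, 50, 0) + 10
print(users)
```

The constant offsets keep users with no counted connections visible while still separating them clearly from active users.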
We also add our created uid to the original dataset so that we can retrieve the from and to user IDs later in the code:
dataset <- merge(dataset, user_ids[, .(id, uid)],
                 by.x = "from", by.y = "id")
dataset <- merge(dataset, user_ids[, .(id, uid_retweet = uid)],
                 by.x = "to", by.y = "id")
user_ids <- user_ids[order(uid)]
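The double merge can be easier to follow on a toy example. The base R sketch below attaches a numeric ID for each end of an edge; the data is invented, and targets is a hypothetical helper table.

```r
# Invented edges and an ID lookup table.
edges <- data.frame(from = c("x", "y"), to = c("y", "z"), stringsAsFactors = FALSE)
user_ids <- data.frame(id = c("x", "y", "z"), uid = 1:3, stringsAsFactors = FALSE)

# First merge attaches the source user's uid.
edges <- merge(edges, user_ids, by.x = "from", by.y = "id")

# Second merge attaches the target user's uid under a different column name.
targets <- data.frame(id = user_ids$id, uid_retweet = user_ids$uid,
                      stringsAsFactors = FALSE)
edges <- merge(edges, targets, by.x = "to", by.y = "id")
print(edges)
```

Renaming the second ID column avoids a name clash and makes it clear which end of the edge each ID belongs to.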
Next, we create node and edge data frames for the visualization. We choose the style and shape of our nodes (filled circles), and select the correct columns of our user_ids table to populate our nodes’ color, data, value, and image attributes:
nodes <- create_node_df(n = num_nodes,
                        type = "lower",
                        style = "filled",
                        color = user_ids$col_type,
                        shape = "circularImage",
                        data = user_ids$uid,
                        value = user_ids$size,
                        image = user_ids$avatar,
                        title = paste0("<p>Name: <b>", user_ids$Name, "</b><br>",
                                       "Super UID <b>", user_ids$id, "</b><br>",
                                       "# followers <b>", user_ids$num_follower, "</b><br>",
                                       "</p>")
)
Similarly, we pick the dataset table columns that correspond to our edges’ from, to, and color attributes:
edges <- create_edge_df(from = dataset$uid,
                        to = dataset$uid_retweet,
                        arrows = "to",
                        color = dataset$col_sentiment)
Finally, with the node and edge data frames ready, let’s create our visualization using the visNetwork library and store it in a variable the default code will use later, called p:
p <- visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)) %>%
  visPhysics(stabilization = list(enabled = FALSE, iterations = 10),
             adaptiveTimestep = TRUE,
             barnesHut = list(avoidOverlap = 0.2, damping = 0.15, gravitationalConstant = -5000))
Here, we customize a few network visualization configurations in visOptions and visPhysics. Feel free to look through the documentation pages and update these options as desired. Our Actual code section is now complete, and we should update the Create and save widget section by removing the line p = ggplotly(g); since we coded our own visualization variable, p.
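As a side note, visNetwork also accepts plain data frames, which can be handy for trying out options outside Power BI. The sketch below is a toy under stated assumptions, not the tutorial’s code: the three nodes and their values are invented, and the rendering step is skipped when visNetwork is not installed.

```r
# Invented nodes; visNetwork expects an 'id' column and reads 'value' as size, 'title' as hover text.
nodes <- data.frame(
  id    = 1:3,
  label = c("alice", "bob", "carol"),
  value = c(62, 61, 10),
  title = c("alice", "bob", "carol"),
  stringsAsFactors = FALSE
)

# Invented edges; 'from' and 'to' must match node ids.
edges <- data.frame(
  from   = c(1, 1, 2),
  to     = c(2, 3, 3),
  arrows = "to",
  color  = c("green", "red", "grey"),
  stringsAsFactors = FALSE
)

if (requireNamespace("visNetwork", quietly = TRUE)) {
  p <- visNetwork::visNetwork(nodes, edges)
  p <- visNetwork::visOptions(
    p, highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)
  )
} else {
  message("visNetwork is not installed; skipping the rendering step.")
}
```

In an interactive R session, printing p opens the network in the viewer, which makes it quicker to tune physics and highlight settings before repackaging the visual.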
Step 3: Prepare the Visualization for Power BI
Now that we have finished coding in R, we must make certain changes in our supporting JSON files to prepare the visualization for use in Power BI.
Let’s start with the capabilities.json file. It includes most of the information you see in the Visualizations tab for a visual, such as our extension’s data sources and other settings. First, we need to update dataRoles and replace the existing value with new data roles for our dataset and num_records inputs:
# ...
"dataRoles": [
  {
    "displayName": "dataset",
    "description": "Connection Details - From, To, # of Connections, Sentiment Color, To Node Type Color",
    "kind": "GroupingOrMeasure",
    "name": "dataset"
  },
  {
    "displayName": "num_records",
    "description": "number of records to keep",
    "kind": "Measure",
    "name": "num_records"
  }
],
# ...
In our capabilities.json file, let’s also update the dataViewMappings section. We’ll add conditions that our inputs must adhere to, as well as update the scriptResult to match our new data roles and their conditions. See the conditions section, along with the select section under scriptResult, for changes:
# ...
"dataViewMappings": [
  {
    "conditions": [
      {
        "dataset": {
          "max": 20
        },
        "num_records": {
          "max": 1
        }
      }
    ],
    "scriptResult": {
      "dataInput": {
        "table": {
          "rows": {
            "select": [
              {
                "for": {
                  "in": "dataset"
                }
              },
              {
                "for": {
                  "in": "num_records"
                }
              }
            ],
            "dataReductionAlgorithm": {
              "top": {}
            }
          }
        }
      },
# ...
Let’s move on to our dependencies.json file. Here, we will add three additional packages under cranPackages so that Power BI can identify and install the required libraries:
{
  "name": "data.table",
  "displayName": "data.table",
  "url": "https://cran.r-project.org/web/packages/data.table/index.html"
},
{
  "name": "DiagrammeR",
  "displayName": "DiagrammeR",
  "url": "https://cran.r-project.org/web/packages/DiagrammeR/index.html"
},
{
  "name": "visNetwork",
  "displayName": "visNetwork",
  "url": "https://cran.r-project.org/web/packages/visNetwork/index.html"
},
Note: Power BI should automatically install these libraries, but if you encounter library errors, try running the following command:
install.packages(c("DiagrammeR", "htmlwidgets", "visNetwork", "data.table", "xml2"))
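Before reinstalling everything, it may help to check which packages are actually missing. The base R sketch below does that for the package names from the command above; the variable names are hypothetical.

```r
# Packages named in this tutorial's install command.
required <- c("DiagrammeR", "htmlwidgets", "visNetwork", "data.table", "xml2")

# requireNamespace() returns FALSE instead of raising an error when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

If anything is reported missing, passing missing_pkgs to install.packages() installs only those packages.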
Lastly, let’s add relevant information for our visual to the pbiviz.json file. I’d recommend updating the following fields:
- The visual’s description field
- The visual’s support URL
- The visual’s GitHub URL
- The author’s name
- The author’s email
Now, our files have been updated, and we must repackage the visualization from the command line:
pbiviz package
On successful execution of the code, a .pbiviz file should be created in the dist directory. The entire code covered in this tutorial can be viewed on GitHub.
Step 4: Import the Visualization Into Power BI
To import your new visualization in Power BI, open your Power BI report (either one for existing data or one created during our Prerequisite step with test data) and navigate to the Visualizations tab. Click the … [more options] button and select Import a visual from a file. Note: You may need to first select Edit in a browser in order for the Visualizations tab to be visible.
Navigate to the dist directory of your visualization folder and select the .pbiviz file to load your visual into Power BI.
Step 5: Create the Visualization in Power BI
The visualization that you imported is now available in the visualizations pane. Click on the visualization icon to add it to your report, and then add relevant columns to the dataset and num_records inputs:
You can add additional text, filters, and features to your visualization depending on your project requirements. I also recommend that you go through the detailed documentation for the three R libraries we used to further enhance your visualizations, since our example project cannot cover all use cases of the available functions.
Upgrading Your Next Social Network Analysis
Our final result is a testament to the power and efficiency of R when it comes to creating custom Power BI visualizations. Try out social network analysis using custom visuals in R on your next dataset, and make smarter decisions with comprehensive data insights.
The Toptal Engineering Blog extends its gratitude to Leandro Roser for reviewing the code samples presented in this article.