Power BI and R: Custom Visuals for Social Network Analysis


Social network analysis is quickly becoming an important tool to serve a variety of professional needs. It can inform corporate goals such as targeted marketing and identify security or reputational risks. Social network analysis can also help businesses meet internal goals: It provides insight into employee behaviors and the relationships among different parts of a company.

Organizations can employ a number of software solutions for social network analysis; each has its pros and cons, and is suited for different purposes. This article focuses on Microsoft’s Power BI, one of the most commonly used data visualization tools today. While Power BI offers many social network add-ons, we’ll explore custom visuals in R to create more compelling and flexible results.

This tutorial assumes an understanding of basic graph theory, particularly directed graphs. The later steps are best suited to Power BI Desktop, which is available only on Windows. Readers on macOS or Linux can use Power BI in a browser, but the browser version does not support certain features, such as importing an Excel workbook.

Structuring Data for Visualization

Creating social networks starts with the collection of connections (edge) data. Connections data contains two primary fields: the source node and the target node—the nodes at either end of the edge. Beyond these nodes, we can collect data to produce more comprehensive visual insights, typically represented as node or edge properties:

1) Node properties

  • Shape or color: Indicates the type of user, e.g., the user’s location/country
  • Size: Indicates the importance in the network, e.g., the user’s number of followers
  • Image: Operates as an individual identifier, e.g., a user’s avatar

2) Edge properties

  • Color, stroke, or arrowhead connection: Indicates type of connection, e.g., the sentiment of the post or tweet connecting the two users
  • Width: Indicates strength of connection, e.g., how many mentions or retweets are observed between two users in a given period
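
To make these fields concrete, here is a minimal sketch in R of what connections data with a few node and edge properties might look like. The column names and values are hypothetical and chosen only for illustration; they are not the tutorial's dataset.

```r
# Hypothetical edge list: one row per connection (source node -> target node)
connections <- data.frame(
  from          = c("user_a", "user_a", "user_b"),  # source node
  to            = c("user_b", "user_c", "user_c"),  # target node
  value         = c(5, 2, 1),                       # edge width: number of mentions
  col_sentiment = c("green", "gray", "red"),        # edge color: sentiment of the post
  stringsAsFactors = FALSE
)

# Node list derived from the edges: every user that appears at either end
nodes <- data.frame(id = unique(c(connections$from, connections$to)),
                    stringsAsFactors = FALSE)

# Node size: here, the number of distinct users each node points to
nodes$size <- vapply(
  nodes$id,
  function(u) length(unique(connections$to[connections$from == u])),
  integer(1),
  USE.NAMES = FALSE
)

nodes
```

Keeping edge properties on the edge table and node properties on a separate node table mirrors how the visualization code later in this tutorial consumes them.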

Let’s inspect an example social network visual to see how these properties function:

A graph of circles connected by lines of varying widths appears with three distinct sections. The left of the graph has six green shapes of various sizes labeled 1, 2, 3, 4, 5, and 6 in a hexagon. Numbers 1-5 are circles, while 6 is a diamond. They are interconnected by green arrows of varying widths and directions, and some arrowheads are filled green while others are not filled. To the right of the green shapes is the next section: three dark blue shapes arranged in a triangle that are labeled 7, 8, and 9, and are interconnected by blue arrows of varying widths and directions (with some arrowheads filled blue). Nodes 7 and 9 are connected to nodes 3 and 4 with gray arrows of varying widths and directions (with some arrowheads filled gray). In the middle of the graph, below the first two shape groups, is a single light blue diamond labeled 10. It is connected to nodes 5, 4, and 9 by dotted gray arrows of varying widths and directions (with some arrowheads filled gray).
Green, light blue, and dark blue nodes and varying circle or diamond shapes demonstrate different node types. Numbers with transparent backgrounds act as the node image identifiers, and larger nodes (such as Node 4) are more important in the network. Different edge types are indicated by color (green, blue, or gray), stroke (solid or dotted), and arrowheads (empty or filled); edge width shows strength (for example, the connection from Node 8 to Node 9 is strong).

We can also use hover text to supplement or replace the above parameters, as it can support other information that cannot be easily expressed through node or edge properties.

Comparing Power BI’s Social Network Extensions

Having defined the different data features of a social network, let’s examine the pros and cons of four popular tools used to visualize networks in Power BI.

| Extension | Social Network Graph by Arthur Graus | Network Navigator | Advanced Networks by ZoomCharts (Light Edition) | Custom Visualizations Using R |
|---|---|---|---|---|
| Dynamic node size | Yes | Yes | Yes | Yes |
| Dynamic edge size | No | Yes | No | Yes |
| Node color customization | Yes | Yes | No | Yes |
| Complex social network processing | No | Yes | Yes | Yes |
| Profile images for nodes | Yes | No | No | Yes |
| Adjustable zoom | No | Yes | Yes | Yes |
| Top N connections filtering | No | No | No | Yes |
| Custom information on hover | No | No | No | Yes |
| Edge color customization | No | No | No | Yes |
| Other advanced features | No | No | No | Yes |

Social Network Graph by Arthur Graus, Network Navigator, and Advanced Networks by ZoomCharts (Light Edition) are all suitable extensions to develop simple social networks and get started with your first social network analysis.

Many dark blue, light blue, and orange circles (50+ circles) are connected by thin gray lines on a white background. The circles have a solid color border and are filled with small images of various Pokémon that have a white background, and the circles block the view of most of the gray lines. They form a circular shape overall.
An example visualization made using the Social Network Graph by Arthur Graus extension.

Many blue, purple, and gray circles (50+ circles) are connected by thin gray lines on a white background. The circles are solid and filled, and block the view of some of the gray lines. They form a circular arrangement overall.
An example visualization made using the Network Navigator extension.

Many large teal and small orange circles (50+ circles) are connected by thin gray lines on a white background. The circles are solid and filled, and most of the gray lines are visible. They form a horizontal wedge shape overall, with more densely populated circles appearing on the right side. On the bottom left of the chart, there are a few widget icons and two labeled circles.
An example visualization made using the Advanced Networks by ZoomCharts (Light Edition) extension.

However, if you want to make your data come alive and uncover groundbreaking insights with attention-grabbing visuals, or if your social network is particularly complex, I recommend developing your custom visuals in R.

Many green, blue, and purple circles (50+ circles) are connected by thin lines of varying colors (green, gray, and red) on a white background. The circles are solid and filled with a Pokémon image at their center, and most of the thin lines are visible. They form a spread-out circular shape overall, with the green circles frequently branching out toward smaller blue or purple circles.
An example visualization made using custom visuals in R.

This custom visualization is the final result of our tutorial’s social network extension in R and demonstrates the large variety of features and node/edge properties offered by R.

Building a Social Network Extension for Power BI Using R

Creating an extension to visualize social networks in Power BI using R comprises five distinct steps. But before we can build our social network extension, we must load our data into Power BI.

Prerequisite: Collect and Prepare Data for Power BI

You can follow this tutorial with a test dataset based on Twitter and Facebook data or proceed with your own social network. Our data has been randomized; you may download real Twitter data if desired. After you collect the required data, add it into Power BI (for example, by importing an Excel workbook or adding data manually). Your result should look similar to the following table:

A table with thirteen alternating gray and white rows appears.

Once you have your data set up, you are ready to create a custom visualization.

Step 1: Set Up the Visualization Template

Developing a Power BI visualization is not simple: even basic visuals require thousands of files. Fortunately, Microsoft offers a command-line tool called pbiviz (distributed in the powerbi-visuals-tools npm package), which generates the required supporting files with only a few commands. The pbiviz tool will also package all of our final files into a .pbiviz file that we can load directly into Power BI as a visualization.

The simplest way to install pbiviz is with Node.js, by running npm install -g powerbi-visuals-tools. Once pbiviz is installed, we need to initialize our custom R visual via our machine’s command-line interface:

pbiviz new toptalSocialNetworkByBharatGarg -t rhtml
cd toptalSocialNetworkByBharatGarg
npm install 
pbiviz package

Don’t forget to replace toptalSocialNetworkByBharatGarg with the desired name for your visualization. The -t rhtml option tells pbiviz to create a template for R-based HTML visualizations. You will see errors because we have not yet specified fields such as the author’s name and email in our package, but we will resolve these later in the tutorial. If the pbiviz script won’t run at all in PowerShell, you may first need to allow scripts with Set-ExecutionPolicy RemoteSigned.

On successful execution of the code, you will see a folder with the following structure:

A File Explorer listing containing eight subfolders (.tmp, .vscode, assets, dist, node_modules, r_files, src, and style) and eight files (capabilities.json, dependencies.json, package.json, package-lock.json, pbiviz.json, script.r, tsconfig.json, and tslint.json). All of the files are 1 KB, except for capabilities.json (2 KB) and package-lock.json (23 KB).

Once we have the folder structure ready, we can write the R code for our custom visualization.

Step 2: Code the Visualization in R

The directory created in the first step contains a file named script.r, which consists of default code. (The default code creates a simple Power BI extension, which uses the iris sample dataset available in R to plot a histogram of Petal.Length by Species.) We will update the code but retain its default structure, including its commented sections.

Our project uses three R libraries:

  • DiagrammeR: Builds the node and edge data frames (create_node_df and create_edge_df)
  • visNetwork: Renders the interactive network visualization
  • data.table: Reads and reshapes the input data

Let’s replace the code in the Library Declarations section of script.r to reflect our library usage:

libraryRequireInstall("DiagrammeR")
libraryRequireInstall("visNetwork")
libraryRequireInstall("data.table")

Next, we will replace the code in the Actual code section with our R code. Before creating our visualization, we must first read and process our data. We will take two inputs from Power BI:

  • num_records: The numeric input N, such that we will select only the top N connections from our network (to limit the number of connections displayed)
  • dataset: Our social network nodes and edges

To calculate the N connections that we will plot, we need to aggregate the num_records value because Power BI will provide a vector by default instead of a single numeric value. An aggregation function like max achieves this goal:

limit_connection <- max(num_records)
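
As a small sketch of why the aggregation is needed, assume Power BI hands num_records to the script as a one-column data frame with the same value repeated on every row. That shape and the sample values below are assumptions made for illustration.

```r
# Hypothetical input: the chosen value is repeated once per dataset row
num_records <- data.frame(num_records = c(25, 25, 25))

# max() collapses the repeated values into a single number
limit_connection <- max(num_records)
limit_connection
```

Any aggregation that returns one value would work here; max is simply a convenient choice when all rows carry the same number.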

We will now read dataset as a data.table object with custom columns. We sort the dataset by value in decreasing order to place the most frequent connections at the top of the table. This ensures that we choose the most important records to plot when we limit our connections with num_records:

dataset <- data.table(from = dataset[[1]]
                      ,to = dataset[[2]]
                      ,value = dataset[[3]]
                      ,col_sentiment = dataset[[4]]
                      ,col_type = dataset[[5]]
                      ,from_name = dataset[[6]]
                      ,to_name = dataset[[7]]
                      ,from_avatar = dataset[[8]]
                      ,to_avatar = dataset[[9]])[
order(-value)][
seq(1, min(nrow(dataset), limit_connection))]
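
The sort-then-truncate pattern can be tried on its own with a toy table. This is a sketch with made-up values rather than the tutorial's dataset, and it assumes the data.table package is installed.

```r
library(data.table)

# Toy connections table (hypothetical values)
toy <- data.table(from  = c("a", "b", "c", "d"),
                  to    = c("b", "c", "d", "a"),
                  value = c(3, 10, 1, 7))
limit_connection <- 2

# Sort by value (descending), then keep at most limit_connection rows.
# seq_len() returns an empty index for an empty table,
# whereas seq(1, 0) would count down to c(1, 0).
top_n <- toy[order(-value)][seq_len(min(.N, limit_connection))]
top_n
```

Here the two strongest connections (values 10 and 7) survive, which is the behavior the num_records input is meant to control.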

Next, we must prepare our user information by creating and allocating unique user IDs (uid) to each user, storing these in a new table. We also calculate the total number of users and store that information in a separate variable called num_nodes:

user_ids <- data.table(id = unique(c(dataset$from, 
                                     dataset$to)))[, uid := 1:.N]

num_nodes <- nrow(user_ids) 

Let’s update our user information with additional properties, including:

  • The number of followers (size of node).
  • The number of records.
  • The type of user (color codes).
  • Avatar links.

We will use R’s merge function to update the table:

# Node size: distinct targets per source user, +50 for users with any, +10 base
user_ids <- merge(user_ids,
                  dataset[, .(num_follower = uniqueN(to)), from],
                  by.x = 'id', by.y = 'from', all.x = TRUE)[
  is.na(num_follower), num_follower := 0][
  , size := num_follower][
  num_follower > 0, size := size + 50][
  , size := size + 10]

# Node color: for each target user, the col_type with the largest total value
user_ids <- merge(user_ids,
                  dataset[, .(sum_val = sum(value)), .(to, col_type)][
                    order(-sum_val)][
                    , id := 1:.N, to][
                    id == 1, .(to, col_type)],
                  by.x = 'id', by.y = 'to', all.x = TRUE)

# Source users always get the same green
user_ids[id %in% dataset$from, col_type := '#42f548']

user_ids <- merge(user_ids, unique(rbind(dataset[, .('id' = from, 'Name' = from_name, 'avatar' = from_avatar)],
      dataset[, .('id' = to, 'Name' = to_name, 'avatar' = to_avatar)])),
      by = 'id')
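
The sizing rule applied to user_ids above can be read as a small function. The sketch below restates it in base R for illustration only; node_size is a hypothetical helper name and is not part of the tutorial code.

```r
# Hypothetical helper restating the sizing rule:
# every node gets a base size of 10, and nodes with at least one
# outgoing connection get a further boost of 50 on top of their count
node_size <- function(num_follower) {
  size <- num_follower
  size <- ifelse(num_follower > 0, size + 50, size)
  size + 10
}

node_size(c(0, 1, 12))
```

The boost keeps users with outgoing connections visibly larger than users who only receive them, even when the counts themselves are small.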

We also add our created uid to the original dataset so that we can retrieve the from and to user IDs later in the code:

dataset <- merge(dataset, user_ids[, .(id, uid)],
                                by.x = "from", by.y = "id")

dataset <- merge(dataset, user_ids[, .(id, uid_retweet = uid)],
                                by.x = "to", by.y = "id")

user_ids <- user_ids[order(uid)]

Next, we create node and edge data frames for the visualization. We choose the style and shape of our nodes (filled circles), and select the correct columns of our user_ids table to populate our nodes’ color, data, value, and image attributes:

nodes <- create_node_df(n = num_nodes, 
                        type = "lower",
                        style = "filled",
                        color = user_ids$col_type, 
                        shape="circularImage",
                        data = user_ids$uid,
                        value = user_ids$size,
                        image = user_ids$avatar,
                        title = paste0("<p>Name: <b>", user_ids$Name,"</b><br>",
                                       "Super UID <b>", user_ids$id, "</b><br>",
                                       "# followers <b>", user_ids$num_follower, "</b><br>",
                                       "</p>")
                        )

Similarly, we pick the dataset table columns that correspond to our edges’ from, to, and color attributes:

edges <- create_edge_df(from = dataset$uid,
                        to = dataset$uid_retweet,
                        arrows = "to",
                        color = dataset$col_sentiment)

Finally, with the node and edge data frames ready, let’s create our visualization using the visNetwork library and store it in a variable the default code will use later, called p:

p <- visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)) %>%
  visPhysics(stabilization = list(enabled = FALSE, iterations = 10),
             adaptiveTimestep = TRUE,
             barnesHut = list(avoidOverlap = 0.2, damping = 0.15,
                              gravitationalConstant = -5000))

Here, we customize a few network visualization configurations in visOptions and visPhysics. Feel free to look through the documentation pages and update these options as desired. Our Actual code section is now complete. In the Create and save widget section, remove the line p = ggplotly(g); it is no longer needed because our code already defines the visualization variable p.
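
Before packaging, it can help to confirm that visNetwork renders on your machine outside Power BI. The sketch below builds a tiny network from plain data frames with made-up nodes and edges; it assumes the visNetwork package is installed and is independent of the tutorial's dataset.

```r
library(visNetwork)

# Made-up nodes: visNetwork expects an id column; label, value, and title
# drive the displayed text, the relative size, and the hover text
toy_nodes <- data.frame(id    = 1:3,
                        label = c("A", "B", "C"),
                        value = c(10, 60, 10),
                        title = c("<b>A</b>", "<b>B</b>", "<b>C</b>"))

# Made-up edges: from and to refer to node ids
toy_edges <- data.frame(from   = c(2, 2),
                        to     = c(1, 3),
                        arrows = "to",
                        color  = c("green", "red"))

preview <- visNetwork(toy_nodes, toy_edges) %>%
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE))

# In an interactive session (e.g., RStudio), printing the widget opens it in the viewer
# print(preview)
```

If the toy network displays as expected, any remaining problems are more likely to come from the data roles or the packaging step than from the R code itself.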

Step 3: Prepare the Visualization for Power BI

Now that we have finished coding in R, we must make certain changes in our supporting JSON files to prepare the visualization for use in Power BI.

Let’s start with the capabilities.json file. It includes most of the information you see in the Visualizations tab for a visual, such as our extension’s data sources and other settings. First, we need to update dataRoles and replace the existing value with new data roles for our dataset and num_records inputs:

# ...
  "dataRoles": [
    {
      "displayName": "dataset",
      "description": "Connection Details - From, To, # of Connections, Sentiment Color, To Node Type Color",
      "kind": "GroupingOrMeasure",
      "name": "dataset"
    },
    {
      "displayName": "num_records",
      "description": "number of records to keep",
      "kind": "Measure",
      "name": "num_records"
    }
  ],
# ...

In our capabilities.json file, let’s also update the dataViewMappings section. We’ll add conditions that our inputs must adhere to, as well as update the scriptResult to match our new data roles and their conditions. See the conditions section, along with the select section under scriptResult, for changes:

# ...
 "dataViewMappings": [
    {
       "conditions": [
        {
          "dataset": {
            "max": 20
          },
          "num_records": {
            "max": 1
          }
        }
      ],
      "scriptResult": {
        "dataInput": {
          "table": {
            "rows": {
              "select": [
                {
                  "for": {
                    "in": "dataset"
                  }
                },
                {
                  "for": {
                    "in": "num_records"
                  }
                }
              ],
              "dataReductionAlgorithm": {
                "top": {}
              }
            }
          }
        },
# ...

Let’s move on to our dependencies.json file. Here, we will add three additional packages under cranPackages so that Power BI can identify and install the required libraries:

{
  "name": "data.table",
  "displayName": "data.table",
  "url": "https://cran.r-project.org/web/packages/data.table/index.html"
},
{
  "name": "DiagrammeR",
  "displayName": "DiagrammeR",
  "url": "https://cran.r-project.org/web/packages/DiagrammeR/index.html"
},
{
  "name": "visNetwork",
  "displayName": "visNetwork",
  "url": "https://cran.r-project.org/web/packages/visNetwork/index.html"
},

Note: Power BI should automatically install these libraries, but if you encounter library errors, try running the following command in an R console:

install.packages(c("DiagrammeR", "htmlwidgets", "visNetwork", "data.table", "xml2"))
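
If you prefer to install only what is missing, one possible sketch is to compare the required package names against the installed ones first. The required vector simply repeats the package list from the command above; the install call is left commented out so the check itself can be run safely.

```r
# Packages named in the command above
required <- c("DiagrammeR", "htmlwidgets", "visNetwork", "data.table", "xml2")

# Names of required packages that are not yet installed in this R library
missing_pkgs <- setdiff(required, rownames(installed.packages()))

# Report what is missing; uncomment the install line to fetch it from CRAN
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```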

Lastly, let’s add relevant information for our visual to the pbiviz.json file. I’d recommend updating the following fields:

  • The visual’s description field
  • The visual’s support URL
  • The visual’s GitHub URL
  • The author’s name
  • The author’s email

Now, our files have been updated, and we must repackage the visualization from the command line:

pbiviz package

On successful execution of the code, a .pbiviz file should be created in the dist directory. The entire code covered in this tutorial can be viewed on GitHub.

Step 4: Import the Visualization Into Power BI

To import your new visualization in Power BI, open your Power BI report (either one for existing data or one created during our Prerequisite step with test data) and navigate to the Visualizations tab. Click the [more options] button and select Import a visual from a file. Note: You may need to first select Edit in a browser in order for the Visualizations tab to be visible.


Navigate to the dist directory of your visualization folder and select the .pbiviz file to seamlessly load your visual into Power BI.

Step 5: Create the Visualization in Power BI

The visualization that you imported is now available in the visualizations pane. Click on the visualization icon to add it to your report, and then add relevant columns to the dataset and num_records inputs:

A pane appears with a selected tools icon.

You can add additional text, filters, and features to your visualization depending on your project requirements. I also recommend that you go through the detailed documentation for the three R libraries we used to further enhance your visualizations, since our example project cannot cover all use cases of the available functions.

Upgrading Your Next Social Network Analysis

Our final result is a testament to the power and efficiency of R when it comes to creating custom Power BI visualizations. Try out social network analysis using custom visuals in R on your next dataset, and make smarter decisions with comprehensive data insights.

The Toptal Engineering Blog extends its gratitude to Leandro Roser for reviewing the code samples presented in this article.


Matthew Newman

Matthew has over 15 years of experience in database management and software development, with a strong focus on full-stack web applications. He specializes in Django and Vue.js with expertise deploying to both server and serverless environments on AWS. He also works with relational databases and large datasets