In this section we applied different algorithms for community detection. By partitioning our network into communities we want to uncover hidden structure in the network and discover close relationships between our characters.
Number of communities found is 9 and the corresponding modularity score is 0.36. In the context of this measure a partition is good when when there are many edges within communities and only a few between them [1] (nodes are densely connected within a community and sparsely connected with the rest of the graph [2]). According to [1], modularity above about 0.3 is a good indicator of significant community structure in a network in practice (network corresponds to a statistically surprising arrangement of edges).
One of natural questions that may come to one’s mind - can this partition be based on the branch that a character worked in? By investigating the above plot we can certainly say this is not the case, otherwise most of main characters should be assigned to one community representing Scranton branch. Let us then take a closer look at the most connected characters in each community.
Below presented are top 7 most connected characters (in terms of a total node degree in a full network) in each community, represented as a 2-column block (Name and Degree), which is repeated for every detected community. Direction of reading of the below table is therefore from left to right.
The aim is to see what are the most prominent members in each community, without distracting the reader with side- or guest characters that one might not recognize.
| Community 0 (n = 66) |
Community 1 (n = 32) |
Community 2 (n = 10) |
Community 3 (n = 25) |
Community 4 (n = 52) |
Community 5 (n = 39) |
Community 6 (n = 25) |
Community 7 (n = 6) |
Community 8 (n = 29) |
|||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Name | Degree | Name | Degree | Name | Degree | Name | Degree | Name | Degree | Name | Degree | Name | Degree | Name | Degree | Name | Degree |
| Michael Scott | 139 | Jim Halpert | 104 | Kevin Malone | 43 | Ryan Howard | 42 | Andy Bernard | 90 | Dwight Schrute | 118 | Jan Levinson | 52 | Dawn Tinsley | 11.0 | Phyllis Vance | 55 |
| Toby Flenderson | 51 | Pam Beesly | 101 | Stacy | 8 | David Wallace | 40 | Erin Hannon | 42 | Angela Martin | 52 | Michael Scarn | 12 | Tim Canterbury | 9.0 | Stanley Hudson | 27 |
| Meredith Palmer | 35 | Cecelia Halpert | 24 | Brenda Matlowe | 6 | Karen Filippelli | 27 | Kelly Kapoor | 40 | Oscar Martinez | 49 | Astrid Levinson | 9 | Gareth Keenan | 8.0 | Clark Green | 21 |
| Holly Flax | 34 | Helene Beesly | 16 | Pizza Delivery Kid | 6 | Nellie Bertram | 17 | Darryl Philbin | 38 | Pete Miller | 24 | Gil | 8 | Lee | 6.0 | Todd Packer | 19 |
| Creed Bratton | 27 | Roy Anderson | 16 | Abby | 5 | Charles Miner | 16 | Robert California | 33 | Mose Schrute | 11 | Goldenface | 7 | Keith Bishop | 4.0 | Bob Vance | 18 |
| Hannah Smoterich-Barr | 11 | Phillip Halpert | 15 | Lynn | 2 | Josh Porter | 12 | Gabe Lewis | 31 | Robert Lipton | 11 | Hunter Raymond | 7 | Donna (UK) | 3.0 | Cynthia | 9 |
| Louanne Kelley | 11 | Betsy Halpert | 13 | Melissa Riley | 2 | Alan Brand | 6 | Deangelo Vickers | 20 | Cameraman | 9 | Carol Stills | 6 | Teri Hudson | 9 | ||
Some of the key insights derived from the above table and visualisation:
Remaining communities include the rest of main characters in different proportions which is also dependent on how many different threads along the show they share. For example, in community 4 we have Dwight and Angela who had an affair at the beginning of the show. We also have Oscar who is an accountant just like Angela but he also had an affair with Angela’s husband - Robert Lipton!
However, the modularity score is not as high as presented in [3] or [4], which suggests that the communities are not very well defined. Therefore, we tried other algorithms to see if they are more suitable for this type of network.
Two other algorithms that we tested:
greedy_modularity_communities - This function implements greedy modularity maximization algorithm described in [4] by Aaron Clauset, M. E. J. Newman and Cristopher Moore. It begins with each node in its own community and joins the pair of communities that most increases modularity until no such pair exists.async_fluidc - this function implements FLuid Communities algorithm desribed in [5] by Ferran Pares et al. It mimics the behaviour of several fluids (i.e., communities) expanding and pushing one another in a shared, closed and non-homogeneous environment (i.e., a graph), until equilibrium is found.Below presented is a summary of modularities found by the three algorithms.

As can be seen from the summary above, we did not get significantly better results. We have a slightly higher modularity for a partition returned by a greedy algorithm. However, the difference is at the level of 0.003 which we might by eliminated by rerunning Louvain algorithm as it is a randomized method.
One reason behind not so high modularity value might be the bias in the network originated from wiki pages. We need to keep in mind that there is a tendency of mentioning only the most prominent characters in the description of every other character. It is understandable that the texts do not include every single encounter or interaction as this is not the purpuse of this website. But we must be aware this fact “skews” the network significantly towards main roles in our tv series.
But is that all we can do to understand the structure of communities in The Office? Of course not :D In the following section we’ll make use of our dialogues dataset to see if there’s something more to be discovered.
Our dialogues dataset contains every line ever said in The Office. We also know who said it, and in what season, episode and scene. We used this information to create a weighted network. A link between two characters is placed if they shared at least one scene and weight is the number of scenes shared.
The resulting network is presented below:
After running the three algorithms on a weighted network, we obtained the following results:

Unfortunately, the results are even worse than for the basic network, created from Wiki pages.
In the end, we would like to mention that we also extracted backbone of this network, as described in [5]. The aim was to create a reduced but hopefully more meaningful version of our initial weighted network and focus our attention only on the most significant connections to see if the community structure in such a representation will be better defined. The whole process is explained in the explainer notebook.
However, this procedure did not improve the modularity score. We can therefore conclude that most of the plot of The Office indeed focuses on the main characters and the interactions between themselves or the side characters, but we do not observe any relations between the side or supporting characters. This impacts the modularity score, because we have a lot of “satellite” nodes - characters that played in single scenes and whose links lead only to hubs or other side characters who by chance played in the same scene (it can be well observed in the visualisation of the full weighted network above).
[1] A. Clauset, M. E. J. Newman, C. Moore, (2004), “Finding community structure in very large networks”, DOI: 10.1103/PhysRevE.70.066111, https://arxiv.org/abs/cond-mat/0408187.
[2] F. Pares, D. Garcia-Gasulla, A. Vilalta et al. (2017), “Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm”, https://arxiv.org/abs/1703.09307v3.
[3] U. Brandes, D. Delling, M. Gaertler et al. (February 2008). “On Modularity Clustering”. IEEE Transactions on Knowledge and Data Engineering. 20 (2): 172–188. https://dx.doi.org/10.1109/TKDE.2007.190689.
[4] Newman, M. E. J. (2006). “Modularity and community structure in networks”. Proceedings of the National Academy of Sciences of the United States of America. 103 (23): 8577–8696. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1482622/.
[5] M. A. Serranoa, M. Boguñáb, A. Vespignani (2009), “Extracting the multiscale backbone of complex weighted networks”, https://www.pnas.org/content/pnas/106/16/6483.full.pdf.