QUESTION: You conducted this research with the data integration division of Sansan Inc., a Japanese software company. How did the collaboration come about, and why did you decide to take part in it?
ANGELO MELE: The collaboration started from a casual email exchange between me and two data scientists at Sansan, Juan Nelson Martínez Dahbura and Takanori Nishida. They emailed me to say that they had read some of my work and wanted to explore a collaboration through their company's research platform, Sansan Data Discovery, using data collected from their business card management app, Eight. Later on, Shota Komatsu from Sansan also joined the research team.
In this project we use anonymized data on the users of Eight. These data record the users' activity on the platform, which is designed to facilitate the formation of business relationships. For a network researcher, network data like these are a great resource: they can be used to explore models, test estimation methods, and develop computational innovations.
Furthermore, the data have important applications to labor markets, since many work-related relationships have recently shifted to online platforms. Because the data are sensitive, the researchers at Sansan were responsible for the data analysis; I developed the theoretical model, and together we developed the estimation method and the software library used to obtain the results.
Could you describe how Eight works, and why the data it provided were important to this study?
The exchange of business cards is an important ritual in Japanese culture, and it fosters important new relationships. The platform lets users scan the business cards they receive and also create new relationships directly on the platform. The data collected by the company allow us to follow people and all their professional relationships over time, and we know when each tie was formed. They also let us see when a user changes jobs, receives a promotion, or changes title. We have some demographic characteristics of the users as well.
The data allow us to estimate a model of "networking on the job", that is, of how a variety of factors influence organizational relationships. Our theoretical model postulates that both observable characteristics (such as location) and unobservable characteristics (such as sociability) affect the ability and willingness of users to form professional relationships.
We also assume that the users are strategic about the relationships they invest in, and tend to form relationships that are beneficial to them. Using the data from Eight, we can estimate the parameters of such a model; in particular, we can distinguish how observable and unobservable characteristics affect the propensity of two users to form a new tie.
Does the data from Eight significantly differ from what researchers might be able to mine from better-known networking platforms such as LinkedIn and Facebook?
One of the main features of Eight is that we know the users have met in person, because the vast majority of the data consist of business cards that were exchanged face to face and then scanned. This is much more difficult to establish on LinkedIn or Facebook, where many connected users have never met in person. Eight therefore gives a measure of social interaction that is less contaminated by purely virtual, online interactions.
Can you expand on some of your key findings? First, what does it mean that business networks consist of “several areas of denser connectivity, or communities”?
Many social networks tend to organize in clusters. That means that the shape of the network shows groups of users who have lots of connections among themselves, and fewer connections with other groups of users. We call these groups “communities.” The main empirical question is what determines the size and shape of these communities. Indeed, there are several possible explanations.
On one hand, the organization of networks into communities may simply be due to what network scientists call "homophily", the preference to interact with similar people. In that case, what we observe just reflects the fact that people who are similar in demographics or preferences tend to interact more than people who are different. On the other hand, these communities may be the result of "transitivity": two users become more willing to interact if they have a common contact or know the same set of people. Finally, the homophily and community structure may be driven by characteristics that are unobserved by the researchers.
Our model and empirical analysis are able to distinguish among these three explanations, providing the contribution of each of them to the final shape of the network.
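To make the three explanations concrete, here is a toy sketch, not the model estimated in the study. It assumes a simple logit probability that two users form a tie, with one term for each mechanism: an observable homophily term (same location), a transitivity term (number of common contacts), and a latent-community term (same unobserved type). All function names, field names, and parameter values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class User:
    location: str   # observable characteristic (hypothetical)
    community: int  # latent type: unobserved by the researcher in real data


def index_components(i, j, contacts, users,
                     beta_home=1.0, beta_common=0.5, beta_latent=1.5):
    """Split a toy link-formation index into the three mechanisms.

    The beta_* weights are illustrative assumptions, not estimates.
    """
    homophily = beta_home * float(users[i].location == users[j].location)
    # Transitivity: more shared contacts make a new tie more likely.
    transitivity = beta_common * len(contacts[i] & contacts[j])
    latent = beta_latent * float(users[i].community == users[j].community)
    return {"homophily": homophily,
            "transitivity": transitivity,
            "latent": latent}


def link_probability(i, j, contacts, users, intercept=-3.0):
    """Logit probability that users i and j form a tie (toy model)."""
    index = intercept + sum(index_components(i, j, contacts, users).values())
    return 1.0 / (1.0 + math.exp(-index))


users = {
    "a": User("Tokyo", 0),
    "b": User("Tokyo", 0),
    "c": User("Osaka", 1),
}
contacts = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}

print(index_components("a", "b", contacts, users))
print(link_probability("a", "b", contacts, users))  # index is 0, so 0.5
print(link_probability("a", "c", contacts, users))  # only the intercept remains
```

In real data the latent term cannot be computed directly because the community labels are unknown. A pair such as "a" and "b" looks linked by location and shared contacts, but part of their propensity to connect comes from the hidden type. Separating that hidden contribution from the observable ones is the estimation problem the answer above describes.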