SNA or network as an analytical object

Today, we will look into Social Network Analysis (SNA), or Social Relation Analysis.

SNA is the study of a community through the relationships between its members. It combines statistics, sociology, and social psychology. It is not, however, merely a theoretical concept but a source of knowledge that is gaining popularity in various business and social areas, from exposing criminal and terrorist networks to analysing bank transfers and managing resources in organisations. We won’t be able to cover the full complexity of the topic in one post, but we can introduce this world and help you understand the nature of SNA and where it can be useful.

To enter the world of SNA, you need to change your perspective and look at the data at hand with a fresh mind. As mentioned above, network analysis consists in discovering relationships between units and assessing how these relationships affect the behaviour of the units. This means we focus not on the units themselves but on their interrelations. The approach rests on the assumption that it is a unit’s position in the structure (the network) that determines its behaviour. Note: do the units need to be persons? No. They can be phone numbers or bank accounts; in short, any identifiable object that an analyst may be interested in.

Does this make SNA necessary in all circumstances? No. If we need statistics such as the average number of transactions per month or the mean amount of incoming transactions last week, we don’t need to change perspective. But if we want to know the paths along which money flows, which account has the largest transfers, and in which direction the transfers go, then definitely yes.

The foundations are the key

SNA walks firmly on the path of data analysis because it stands on two legs: mathematics and sociology, or, to be precise, graph theory and sociometry. At first sight, these two areas seem to belong to completely different realms, but it turns out they complement each other perfectly: one provides numerical confirmation, and the other gives the numbers meaning by interpreting them.

How did it all start? It’s time for the first leg: graph theory. Leonhard Euler is regarded as the father of graph theory. This 18th-century mathematician tackled a problem known as the Seven Bridges of Königsberg, illustrated in the figure below.

Source: Seven Bridges of Königsberg, Wikipedia

Königsberg was built on both banks of the Pregel River and on two islands in the river. Seven bridges connected these four land masses, as shown in the picture. This gave rise to a riddle: is it possible to walk through the city crossing each bridge exactly once? The problem can be represented as a graph in which the vertices are the river banks and islands and the edges are the bridges.

Source: https://www.oer.uj.edu.pl/mod/book/view.php?id=22&chapterid=133

What, then, is a graph? A graph is a set of vertices and a set of edges that connect pairs of vertices in various configurations. Some basic rules of graph building:

A vertex does not have to be connected by an edge to any other vertex;
A vertex may have one or more edges attached to it (their number is the degree of the vertex);
Two vertices may be connected by more than one edge;
An edge may be a loop that begins and ends at the same vertex.
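The rules above can be tried out in a few lines of Python. This is a minimal sketch, assuming a graph stored as a plain list of vertex pairs (a hypothetical representation chosen for illustration, not the format of any particular library). It follows the usual convention that a loop adds 2 to the degree of its vertex.

```python
def degrees(vertices, edges):
    """Return the degree of every vertex of an undirected multigraph.

    `edges` is a list of (u, v) pairs: a repeated pair is a multiple edge,
    and a pair (v, v) is a loop, which adds 2 to the degree of v.
    """
    deg = {v: 0 for v in vertices}  # isolated vertices keep degree 0
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a loop, u == v, so the same vertex gets +2
    return deg


# Hypothetical example covering the four rules
vertices = ["a", "b", "c", "d"]
edges = [
    ("a", "b"), ("a", "b"),  # two edges between the same pair of vertices
    ("b", "c"),
    ("c", "c"),              # a loop at c
]                            # d has no edges at all
print(degrees(vertices, edges))  # → {'a': 2, 'b': 3, 'c': 3, 'd': 0}
```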

Back to the Seven Bridges of Königsberg. In graph terms, the riddle asks for an Eulerian path: a walk that uses every edge of the graph exactly once. If the walk must also end at the vertex where it started, it is called an Eulerian cycle (a closed walk). Leonhard Euler proved that no such walk exists in this case. A connected graph has an Eulerian cycle if, and only if, every vertex has an even degree, and an Eulerian path if, and only if, exactly zero or two vertices have an odd degree; in the Königsberg graph, all four vertices have an odd degree (5, 3, 3 and 3). Interestingly, such a walk is possible today, because only five bridges remain at the original sites (two of which survive from the 18th century). The walk would not be very practical, though: it could not end where it started.
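The degree-parity condition for Eulerian walks can be checked mechanically. The Python sketch below labels the four land masses A (the central island), B (the north bank), C (the south bank) and D (the eastern island); these labels are our own. It assumes the graph is connected, which the condition requires and which holds for Königsberg. The second graph is a hypothetical variant with two bridges removed, not the present-day layout.

```python
def odd_degree_vertices(edges):
    """Vertices with an odd number of edge ends, in sorted order."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(v for v, d in deg.items() if d % 2 == 1)


def euler_walk(edges):
    """Classify a *connected* multigraph by the degree-parity condition:
    no odd vertices -> Eulerian cycle; exactly two -> Eulerian path
    between them; otherwise neither exists."""
    odd = odd_degree_vertices(edges)
    if not odd:
        return "Eulerian cycle"
    if len(odd) == 2:
        return f"Eulerian path between {odd[0]} and {odd[1]}"
    return "no Eulerian path"


# The seven bridges: A-B twice, A-C twice, A-D, B-D, C-D
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_walk(konigsberg))  # → no Eulerian path (degrees 5, 3, 3, 3)

# Hypothetical variant: drop one A-B and one A-C bridge
variant = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_walk(variant))     # → Eulerian path between A and D
```

In the variant, B and C end up with an even degree, leaving exactly two odd vertices, so a walk exists but must start at one of them and end at the other.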

It is quite a strain to stand on one leg, so let’s leave mathematics and move on to sociology, more specifically to the notions of structure and sociometry mentioned at the beginning of the article. Sociometry is believed to originate in the work of Jacob Moreno in the 20th century. Moreno argued that, since the structures in which an individual functions (family, school class, work environment) can be assumed to have set rules that define the roles and behaviour of individuals, it is the structure that needs to be investigated.

Sociometry is a method for studying the structure of power and communication between individuals. In its original and probably most popular form, it involves asking the members of a group questions such as “Who do you think would choose you?” or asking them to name the person who should perform a specific function. For this reason, sociometry is used mainly for smaller groups, such as school classes. The analysis covers, for example, group cohesion (the extent of mutual relationships), which helps determine how well integrated the community is.

Sociometry uses terms you may come across every day in their popular meaning. This research method, however, gives them specific definitions:

‘sociometric star’ – the person most often chosen by other members of the group in a study;
‘power (behind the throne)’ – the person who is chosen by the sociometric star and who also chooses the star;
‘clique’ – a subgroup in the community; every member of the clique has relationships with its other members.
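The three definitions translate directly into code. The Python sketch below assumes the answers to a sociometric question are stored as a mapping from each respondent to the set of people they chose; the names and choices are invented for illustration. Because sociometric choices are directed, it also assumes that a clique requires every pair of members to choose each other.

```python
# Hypothetical answers: each respondent -> the people they chose
choices = {
    "Ann": {"Bob", "Cid"},
    "Bob": {"Ann", "Cid"},
    "Cid": {"Ann"},
    "Dan": {"Cid"},
    "Eve": {"Cid", "Dan"},
}


def times_chosen(choices):
    """How many times each respondent was chosen by the others."""
    counts = {person: 0 for person in choices}
    for chosen in choices.values():
        for person in chosen:
            counts[person] += 1
    return counts


def sociometric_star(choices):
    """The person chosen most often (on a tie, max() returns the first found)."""
    counts = times_chosen(choices)
    return max(counts, key=counts.get)


def power_behind_throne(choices, star):
    """People chosen by the star who also chose the star."""
    return sorted(p for p in choices[star] if star in choices[p])


def is_clique(choices, group):
    """True if every member of `group` chose every other member."""
    return all(b in choices[a] for a in group for b in group if a != b)


star = sociometric_star(choices)
print(star)                                       # → Cid
print(power_behind_throne(choices, star))         # → ['Ann']
print(is_clique(choices, ["Ann", "Bob"]))         # → True
print(is_clique(choices, ["Ann", "Bob", "Cid"]))  # → False
```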

Let's focus on relationships. What emerges from the foundations?

Now that we have the foundations, let’s move on to SNA proper. The network investigated by SNA is a graph interpreted as follows:

vertices (nodes, points) represent social actors (objects);
edges (lines) represent relationships between the objects.

However, a network is more than a graph, because we use it to analyse and interpret the vertices and edges and to find meaning in them. Have a look at the simple network below.

Basic concepts you’ll come across in network analysis are:

Entity – a vertex representing an object (noun);
Link – an edge representing an action (verb);
Properties – attributes of an entity or a link.
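One way to hold entities, links and their properties in code is a small property-graph structure. The Python sketch below is a hypothetical model built on dataclasses, not the data format of any particular analysis tool; the entity kinds, actions, keys and property values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    key: str                  # unique identifier of the entity
    kind: str                 # type of object, e.g. "person" or "phone number"
    properties: dict = field(default_factory=dict)


@dataclass
class Link:
    source: str               # key of the entity the action starts from
    target: str               # key of the entity the action is directed at
    action: str               # the "verb" of the link, e.g. "calls"
    directed: bool = True
    properties: dict = field(default_factory=dict)


def links_of(key, links):
    """All links in which the given entity takes part, in either role."""
    return [link for link in links if key in (link.source, link.target)]


# Hypothetical data
entities = {
    "p1": Entity("p1", "person", {"city": "New York"}),
    "p2": Entity("p2", "person"),
    "t1": Entity("t1", "phone number"),
}
links = [
    Link("p1", "t1", "uses"),
    Link("p2", "t1", "calls", properties={"date": "2020-01-01"}),
    Link("p1", "p2", "meets", directed=False),
]

print([link.action for link in links_of("t1", links)])  # → ['uses', 'calls']
```

Keeping properties in a free-form dictionary lets entities of different kinds (persons, phone numbers, false identities) sit in one network without a fixed schema.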

The network shown above consists of 3 entities and 5 links (including multiple links between the same two entities). Entities and links differ in nature, which we know from their properties. Note that entities of different types, such as persons and false identities, can be linked in one network. The network tells us that An Smith, who lives in New York, met her father in Washington, DC. The man probably contacted her by phone (as indicated by the dotted line and the direction of the arrow). We also know that Kate Kowalsky, who is involved in a theft, passes herself off as An Smith. Additionally, a look at the dates raises the question of whether the man could be involved in the theft. Is the family relationship of any relevance?

The example above illustrates the egocentric approach to network analysis, in which we follow an individual and examine their network and relationships. This local perspective starts the analysis from a particular entity and is often used, for example, in criminal analysis.
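Extracting an ego network amounts to a breadth-first search that stops a fixed number of links away from the starting entity. A minimal Python sketch, assuming an undirected network stored as an adjacency dictionary; the `ego_network` function, its `radius` parameter and the sample data are hypothetical.

```python
from collections import deque


def ego_network(adj, ego, radius=1):
    """Entities within `radius` links of `ego`, and the links among them."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the chosen radius
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    members = set(dist)
    # keep every link whose both ends are inside the ego network
    links = {tuple(sorted((u, v))) for u in members
             for v in adj.get(u, ()) if v in members}
    return members, links


# Hypothetical undirected network
pairs = [("ego", "x"), ("ego", "y"), ("x", "y"), ("y", "z"), ("z", "w")]
adj = {}
for u, v in pairs:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

members, links = ego_network(adj, "ego", radius=1)
print(sorted(members))  # → ['ego', 'x', 'y']
print(sorted(links))    # → [('ego', 'x'), ('ego', 'y'), ('x', 'y')]
```

Increasing `radius` widens the local perspective step by step; with `radius=2`, entity z joins the ego network.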

The other, global approach to SNA is sociocentric analysis, in which we focus on the structure as a whole: its cohesion, its subgroups, etc.[1] A network’s structure:

Affects how the network functions and whether it can achieve its key goals;
Affects how the network can be described, which may not be obvious at first glance; smaller subnetworks may emerge, for example;
Affects the relationships between the influential actors who dominate the whole structure;
Affects how fast and directly information flows between entities and different parts of the network.[2]
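Two simple sociocentric descriptions are the subgroups a network falls into (its connected components) and its density, the share of possible links that actually exist, which is one common way of expressing cohesion. A Python sketch, assuming an undirected network without loops or multiple links, stored as an adjacency dictionary; the data are invented.

```python
def components(adj):
    """Split the network into connected subgroups."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # depth-first search from `start`
            u = stack.pop()
            group.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(group)
    return groups


def density(adj):
    """Existing links divided by the n * (n - 1) / 2 possible ones."""
    n = len(adj)
    if n < 2:
        return 0.0
    links = sum(len(nbrs) for nbrs in adj.values()) / 2  # each link counted twice
    return links / (n * (n - 1) / 2)


# Hypothetical network: a triangle, a pair, and one isolated entity
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
       "d": {"e"}, "e": {"d"}, "f": set()}
print([sorted(g) for g in components(adj)])  # → [['a', 'b', 'c'], ['d', 'e'], ['f']]
print(round(density(adj), 3))                # → 0.267
```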

Can relationships be measured?

According to SNA principles, relationships not only can be measured but should be measured. The question is what to measure and how. Measures in network analysis are a complex topic that deserves a separate article, so only the basics are presented here.

Measuring centrality is an important component of both the local and the global approach. It helps answer the question of who is the most important person in the network (the one in power), which bears some similarity to sociometric research. The centrality of an entity can be understood in three ways:

Activity in the network: the number of links, which facilitates the quick interception and dissemination of information;
Efficiency: short distances to other entities, or direct relationships, which make it possible to reach every part of the network;
Control: acting as a gatekeeper for the flow of information between various parts of the network.[3]

Depending on how you understand centrality, different entities or parts of the network may come out on top. Which measure should you choose? The best approach is to test several measures and describe the network extensively in various dimensions.

The measure corresponding to the first understanding of centrality is degree: the number of direct links an entity has. In a directed network, incoming links (in-degree) and outgoing links (out-degree) can also be counted separately. The more links, the greater the activity; put simply, the more contacts an entity has, the easier it is for it to succeed. Degree is the measure typical of the local approach to analysis, as it depends only on an entity’s direct links.
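Counting degree, in-degree and out-degree needs nothing more than a tally of link ends. A Python sketch, assuming directed links stored as hypothetical (caller, callee) pairs of phone contacts:

```python
from collections import Counter

# Hypothetical directed links: (who called, who was called)
calls = [("ann", "bob"), ("ann", "cid"), ("bob", "cid"),
         ("dan", "cid"), ("cid", "ann")]

out_degree = Counter(src for src, _ in calls)  # links leaving each entity
in_degree = Counter(dst for _, dst in calls)   # links arriving at each entity
entities = sorted(set(out_degree) | set(in_degree))
# Counter returns 0 for a missing key, so entities with no in- or out-links work
degree = {e: in_degree[e] + out_degree[e] for e in entities}

print(degree)                       # → {'ann': 3, 'bob': 2, 'cid': 4, 'dan': 1}
print(max(degree, key=degree.get))  # → cid, the most active entity
```

Here cid is the most active entity mainly because of incoming calls (in-degree 3), while ann is the most active caller (out-degree 2), which shows why it can be worth looking at the direction separately.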

The second way of understanding centrality is measured with closeness and eigenvector centrality. Closeness is based on the distances (in links) between the entity of interest and all other entities: the entity positioned closest to all the others wins. Eigenvector centrality not only counts an entity’s links but also weights them by the importance of the entities they lead to, so what matters is not only distance but also whether the entity is linked to important parts of the network. Both measures can also be computed for directed networks.
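Both measures can be sketched in plain Python. The block below assumes an undirected, connected network stored as an adjacency dictionary (the sample data are hypothetical). Closeness is computed here in one common normalised form, (n − 1) divided by the sum of distances. Eigenvector centrality is approximated by power iteration on A + I (A being the adjacency matrix), a shift that keeps the iteration from oscillating on some graphs without changing the eigenvectors.

```python
import math
from collections import deque


def build_adjacency(pairs):
    """Undirected adjacency dictionary from a list of (u, v) pairs."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Shortest distances (in links) from `source`, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """(n - 1) / sum of distances to all others; assumes a connected network."""
    n = len(adj)
    scores = {}
    for v in adj:
        total = sum(distances_from(adj, v).values())
        scores[v] = (n - 1) / total if total else 0.0
    return scores


def eigenvector(adj, max_iter=1000, tol=1e-10):
    """Power iteration with A + I; scores are scaled to unit Euclidean length."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # each entity's new score: its own score plus its neighbours' scores
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical network: a is linked to b, c and d; d is also linked to e
adj = build_adjacency([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])
print({v: round(s, 3) for v, s in closeness(adj).items()})
# → {'a': 0.8, 'b': 0.5, 'c': 0.5, 'd': 0.667, 'e': 0.444}
eig = eigenvector(adj)
print(max(eig, key=eig.get))  # → a
```

Note that d scores higher than b and c on both measures even though all three are direct neighbours of a: d is also one link away from e, and it is linked to the important entity a.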

The gatekeeper is identified by means of betweenness, which can also take the direction of links into account. This measure counts how many of the shortest paths between pairs of other entities pass through the entity of interest. This means that the flow of information need not be controlled by the most active entity, or by the one with the most efficient relationships with other entities in the network, but by the entity that lies on the greatest number of paths in the network.
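Betweenness can be computed by counting shortest paths. The Python sketch below is a deliberately simple version for an undirected network (not the faster Brandes algorithm): for every pair of other entities, it adds the share of their shortest paths that pass through the entity. The sample network is hypothetical: two tightly knit groups of four, joined only through g.

```python
from collections import deque
from itertools import combinations


def path_counts(adj, source):
    """Distances from `source` and the number of shortest paths to each entity."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum over pairs (s, t) of the share of shortest s-t paths through v."""
    counts = {v: path_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = counts[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = counts[v]
            if dist_s[v] + dist_v[t] == dist_s[t]:  # v is on a shortest s-t path
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical network: two fully linked groups joined via a1 - g - b1
pairs = list(combinations(["a1", "a2", "a3", "a4"], 2))
pairs += list(combinations(["b1", "b2", "b3", "b4"], 2))
pairs += [("a1", "g"), ("g", "b1")]
adj = {}
for u, v in pairs:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = betweenness(adj)
print(scores["g"], scores["a1"], scores["a2"])  # → 16.0 15.0 0.0
```

g has only two links, fewer than any other entity, yet it has the highest betweenness, because every path between the two groups runs through it.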

This text has outlined how SNA works and where it can be applied. You can see not only its possibilities but also its sound analytical roots. Those of you who got entangled in the net of SNA can expect more from us on this topic.

-----

[1] Approaches to SNA and their consequences are discussed in depth in the publications of Dr W. (Wouter) de Nooy: http://www.uva.nl/profiel/n/o/w.denooy/w.denooy.html

[2] If you are interested in SNA, have a look at Padgett, J. F., and C. K. Ansell (1993). Robust action and the rise of the Medici, 1400–1434. The American Journal of Sociology. The authors apply SNA to historical data on family and business links between Florentine families in the 15th century.

[3] Freeman, L. C. (2004). The Development of Social Network Analysis.
