[Chapter1]

1. Analyzing the Wikipedia voters network [27 points]

Download the Wikipedia voting network wiki-Vote.txt.gz: http://snap.stanford.edu/data/wiki-Vote.html.

Using one of the network analysis tools above, load the Wikipedia voting network. Note that Wikipedia is a directed network. Formally, we consider the Wikipedia network as a directed graph G=(V,E)G = (V, E) , with node set V and edge set EV×VE ⊂ V × V where (edges are ordered pairs of nodes). An edge (a, b) ∈ E means that user a voted on user b.

To make our questions clearer, we will use the following small graph as a running example: Gsmall = (Vsmall, Esmall), where Vsmall = {1, 2, 3} and Esmall = {(1, 2), (2, 1), (1, 3), (1, 1)}.

"Compute and print out the following statistics for the wiki-Vote network"

[1] The number of nodes in the network. (Gsmall has 3 nodes.)

[2] The number of nodes with a self-edge (self-loop), i.e., the number of nodes a ∈ V where (a, a) ∈ E. (Gsmall has 1 self-edge.)

[3] The number of directed edges in the network, i.e., the number of ordered pairs (a, b) ∈ E for which a ̸= b. (Gsmall has 3 directed edges.)

[4] The number of undirected edges in the network, i.e., the number of unique unordered pairs (a,b), a != b, for which (a,b)or(b,a)E(orboth)(a,b) ∈ or (b,a) ∈ E (or both) . If both (a,b) and (b,a) are edges, this counts a single undirected edge. (Gsmall has 2 undirected edges.)

[5] The number of reciprocated edges in the network, i.e., the number of unique unordered pairs of nodes (a, b), a ̸= b, for which (a, b) ∈ E and (b, a) ∈ E. (Gsmall has 1 reciprocated edge.)

[6] The number of nodes of zero out-degree. (Gsmall has 1 node with zero out-degree.) SOURCE

[7] The number of nodes of zero in-degree. (Gsmall has 0 nodes with zero in-degree.) SINK

[8] The number of nodes with more than 10 outgoing edges (out-degree > 10) 1612

[9] The number of nodes with fewer than 10 incoming edges (in-degree < 10)

Further Analyzing the Wikipedia voters network

[1] Plot the distribution of out-degrees of nodes in the network on a log-log scale. Each data point is a pair (x, y) where x is a positive integer and y is the number of nodes in the network with out-degree equal to x. Restrict the range of x between the minimum and maximum out-degrees. You may filter out data points with a 0 entry. For the log-log scale, use base 10 for both x and y axes.

X 를 한 노드가 가지는 out-degree 라고 두었을때, out-degree가 최소일때는 0 이고, 최대일때는 893임을 알수 있었다.

[2] Compute and plot the least-square regression line for the out-degree distribution in the log-log scale plot. Note we want to find coefficients a and b such that the function log10 y = a · log10 x + b, equivalently, y = 10b · xa, best fits the out-degree distribution. What are the coefficients a and b? For this part, you might want to use the method called polyfit in NumPy with deg parameter equal to 1.

Finding Experts on the Java Programming Language on StackOverflow

Download the StackOverflow network stackoverflow-Java.txt.gz: http://snap.stanford. edu/class/cs224w-data/hw0/stackoverflow-Java.txt.gz. An edge (a, b) in the network means that person a endorsed an answer from person b on a Java-related question.

Using one of the network analysis tools above, load the StackOverflow network. Note that StackOverflow is a directed network.

Compute and print out the following statistics for the stackoverflow-Java network:

[1] The number of weakly connected components(wcc) in the network. This value can be calculated in Snap.py via function GetWccs.

약한연결은 자기 자신으로 못돌아오는 경우를 말한다 101042

[2] The number of edges and the number of nodes in the largest weakly connected component. The largest weakly connected component is calculated in Snap.py with function GetMxWcc.

약한 연결 중 가장 큰 컴포넌트의 간선의 수: 322,485

약한 연결 중 가장 큰 컴포넌트의 노드의 수: 13,188

[3] IDs of the top 3 most central nodes in the network by PagePank scores. PageRank scores are calculated in Snap.py with function GetPageRank. 78,86,58

Last updated