• Betweenness: a node is important if it lies on many shortest paths between other nodes
c
i
=
X
j=i=k
#(shortest paths between j and k that contain i)
#(shortest paths between j and k)
• Closeness: a node is important if it has shortest path lengths to others ()
c
i
=
1
P
i=j
min D(i, j)
•
†
Clustering coefficient: how a node’s neighboring nodes are connected with each other clustering
coefficient counts the #(triangles) in the ego-network
e
i
=
#(edges among neighboring nodes)
#(all possible edges among neighboring nodes)
∈ [0, 1]
•
†
Graphlet Degree Vector (GDV) counts #(graphlets/pre-specified subgraphs) that a node touches.
It provides a measure of a nodes’ local network topology
Link-level features
Link prediction as a task: (1) links missing at random; (2) links over time
Methodology: compute score c(i, j) for each pair of nodes and predict top n pairs as new links
• Distance-based feature (e.g., shortest-path distance)
• Local neighborhood overlap
• Common neighbors: |N (i) ∩ N(j)|
• Jaccard’s coefficient: |N (i) ∩ N(j)|/|N(i) ∪ N(j)|
• Adamic-Adar index (high if the common neighbor has small degree, useful in social net-
work):
P
u∈N(i)∩N(j)
1/ log(k
u
)
• Global neighborhood overlap: Let P
(K)
ij
denotes #paths of length K between i and j. We have
P
(K)
= A
K
, where A is the adjacency matrix. Then we can compute the Kats index
S
ij
=
∞
X
l=1
β
l
P
(l)
ij
=
∞
X
l=1
β
l
A
l
ij
where β ∈ (0, 1) is the discount factor. The closed-form of Katz index matrix is given by
S =
∞
X
l=1
β
l
A
l
= (I − βA)
−1
− I
Graph-level features
• Graphlet Kernel: given graph G and a graphlet list G
k
= {g
1
, g
2
, . . . , g
n
k
}, define the graphlet
count vector f
G
∈ R
n
k
as (f
G
)
i
= #(g
i
⊂ G) for i = 1, 2, . . . , n
k
. Define the graphlet kernel as
K(G, G
′
) = f
′
G
f
G
′
or K(G, G
′
) =
f
G
sum(f
G
)
′
f
G
′
sum(f
G
′
)
The second one helps deal with the situation in which G and G
′
have different sizes. But counting
graphlets is still very expensive.
• Weisfeiler-Lehman Kernel: color refinement algorithm. The intuition is to use neighborhood
structure to design a more efficient graph feature descriptor.
• Other kernels (e.g., random-work kernel, shortest-path graph kernel, etc.)
2