igraph——图挖掘助力社会网络分析,图数据库 graph

摘要：

从网站中挖掘有用的信息可以帮助一些群体提高竞争力。最近，我意外发现了一个名为“graph”的工具，它提供了一些非常有效的挖掘功能。˃library˃#Createadirectedgraph˃gg顶点：4边：4定向：TRUEEdges:[0]0-˃1[1]0-˃2[2]1-˃3[3]0-˃3˃#Createadadirectedgraphusingadjacencymatrix˃mm[，1][，2][，3][，4][1，]0.40863890.21609240.15579890.2896239[2，]0.46694560.107101.2906730.3715809[3，]0.20316780.3910.59062730.7411 7764[4，]0.88081190.76874930.97343230.4487252˃ggVertices:4Edges:5Directed:TRUEEdges:[0]2-˃2[1]2-˃3[2]3-˃0[3]3-˃1[4]3-˃2˃plot˃iGraph还提供了各种简单的方法来创建各种图形的图表˃#Createafullgraph˃g1g1Vertices:4Edges:6Directed:FALSEEdges:[0]0--1[1]0--2[2]0--3[3]1-[2]1-3[5]2-3˃#Createaringgraph˃g 2g2顶点：3边：3定向：FALSEEdges:[0]0--1[1]1--2[2]0--2˃#Combine2graphs˃gg顶点：7边：9定向：FALSEEdges:[0]0--1[1]0-[2]0--3[3]1-2-3[4]2-3[6]4-5[7]5-6[8]4-6˃图。差分顶点：7边：7定向：FALSEEdges:[0]0-3[1]1-3[2]1-2[3]2-3[4]4-6[5]4-5[6]5-6˃#Createattice˃g1=graph.lattice˃#Createatree˃g2=graph。Tree˃plot˃plotiGraph还提供了另外两种图形生成机制#Generaterandomgraph，fixedprobability˃gplot#Generaterandomgraph、fixednumberofarcs˃gg˂-barabasi。游戏简单图形算法本节介绍如何使用iGraph实现一些简单图形算法。最小生成树算法可以连接图中的所有节点，并最小化所有连接的权重。

http://www.ituring.com.cn/article/1762

社交网络（如Facebook，Twitter）可以完整地表现人们的生活。人们用不同的方式与他人互动，并且这些信息都可以在社交网络中抓取到。挖掘某个站点的有用信息可以帮助一些团体增加竞争力。

我最近无意中发现一款叫做“igraph”的工具，它提供了一些非常有效的挖掘功能。以下列举几条我觉得有意思的：

创建图表

图表由节点和连线组成，两者都可以附上一系列属性值（键/值对）。此外，连线可以是有向的也可以是无向的，还可以给它加上权重。

>library(igraph)># Create a directed graph>g <-graph(c(0,1,0,2,1,3,0,3),directed=T)>g
Vertices:4Edges:4Directed:TRUE
Edges:
[0]0->1[1]0->2[2]1->3[3]0->3># Create a directed graph using adjacency matrix>m <-matrix(runif(4*4),nrow=4)>m
[,1][,2][,3][,4][1,]0.40863890.21609240.15579890.2896239[2,]0.46694560.10710710.12906730.3715809[3,]0.20316780.39116910.59062730.7417764[4,]0.88081190.76874930.97343230.4487252>g <-graph.adjacency(m >0.5)>g
Vertices:4Edges:5Directed:TRUE
Edges:
[0]2->2[1]2->3[2]3->0[3]3->1[4]3->2>plot(g,layout=layout.fruchterman.reingold)>

enter image description here

iGraph也提供了多种创建各种图形的图表的简单方法

>#Create a full graph>g1 <-graph.full(4)>g1
Vertices:4Edges:6Directed:FALSE
Edges:
[0]0--1[1]0--2[2]0--3[3]1--2[4]1--3[5]2--3>#Create a ring graph>g2 <-graph.ring(3)>g2
Vertices:3Edges:3Directed:FALSE
Edges:
[0]0--1[1]1--2[2]0--2>#Combine 2 graphs>g <-g1 %du%g2
>g
Vertices:7Edges:9Directed:FALSE
Edges:
[0]0--1[1]0--2[2]0--3[3]1--2[4]1--3[5]2--3[6]4--5[7]5--6[8]4--6>graph.difference(g,graph(c(0,1,0,2),directed=F))Vertices:7Edges:7Directed:FALSE
Edges:
[0]0--3[1]1--3[2]1--2[3]2--3[4]4--6[5]4--5[6]5--6># Create a lattice>g1 =graph.lattice(c(3,4,2))># Create a tree>g2 =graph.tree(12,children=2)>plot(g1,layout=layout.fruchterman.reingold)>plot(g2,layout=layout.reingold.tilford)

enter image description here

iGraph还提供了另外两种图表生成的机制。“随机图表”可以在任意两个节点之间进行连线。而“优先连接”会给已经拥有较大度数的节点再增加连线（也就是多者更多）。

# Generate random graph, fixed probability>g <-erdos.renyi.game(20,0.3)>plot(g,layout=layout.fruchterman.reingold,vertex.label=NA,vertex.size=5)
# Generate random graph, fixed number of arcs>g <-erdos.renyi.game(20,15,type='gnm')
# Generate preferential attachment graph>g <-barabasi.game(60,power=1,zero.appeal=1.3)

enter image description here

简单图表算法

这一节会介绍如何使用iGraph来实现一些简单的图表算法

最小生成树算法可以在图表里连接所有的节点，并使所有的连线权重最小。

# Create the graph and assign random edge weights>g <-erdos.renyi.game(12,0.35)>E(g)$weight <-round(runif(length(E(g))),2)*50>plot(g,layout=layout.fruchterman.reingold,edge.label=E(g)$weight)# Compute the minimum spanning tree>mst <-minimum.spanning.tree(g)>plot(mst,layout=layout.reingold.tilford,edge.label=E(mst)$weight)

enter image description here

连通分支算法可以找到会连通其他节点的连接，也就是说，两个节点之间的路径会穿过其他节点。需要注意的是，在无向图里连通是要对称的，在有向图（节点A指向节点B，但节点B不指向节点A的图表）里不是必须的。因此在有向图中存在一种连接的概念叫做“强”，也就是只有两个节点都分别指向对方才意味着它们是连通的。“弱”的连接意味着它们不是连通的。

>g <-graph(c(0,1,1,2,2,0,1,3,3,4,4,5,5,3,4,6,6,7,7,8,8,6,9,10,10,11,11,9))# Nodes reachable from node4>subcomponent(g,4,mode="out")[1]456378# Nodes who can reach node4>subcomponent(g,4,mode="in")[1]431502
>clusters(g,mode="weak")$membership
 [1]000000000111$csize
[1]93$no
[1]2
>myc <-clusters(g,mode="strong")>myc
$membership
 [1]111222333000$csize
[1]3333$no
[1]4
>mycolor <-c('green','yellow','red','skyblue')>V(g)$color <-mycolor[myc$membership +1]>plot(g,layout=layout.fruchterman.reingold)

最短路径算法是最普遍的算法，它能找到节点A和节点B之间最短的路径。在iGraph里，如果图表是未加权的（也就是权重为1的）而且在权重为正时使用了迪杰斯特拉算法，会使用“breath-first search”算法。要是连线的权重是负数，则会使用Bellman-ford算法。

>g <-erdos.renyi.game(12,0.25)>plot(g,layout=layout.fruchterman.reingold)>pa <-get.shortest.paths(g,5,9)[[1]]>pa
[1]5049>V(g)[pa]$color <-'green'>E(g)$color <-'grey'>E(g,path=pa)$color <-'red'>E(g,path=pa)$width <-3>plot(g,layout=layout.fruchterman.reingold)

enter image description here

图表统计

通过大量统计信息我们可以大致看到图表的形状。在最高权限下，我们可以看到图表的各类信息，它包括:
- 图表的大小（节点和连线的数量）
- 图表的密度是紧密的（|E|与|V|的平方成正比）还是稀疏的（|E|与|V|成正比）?
- 图表是连通的（大部分节点是互通的）还是非连通的（节点是孤立的）？
- 图表中最长的两点之间距离
- 有向图的对称性
- 出/入“度”的分布

># Create a random graph >g <-erdos.renyi.game(200,0.01)>plot(g,layout=layout.fruchterman.reingold,vertex.label=NA,vertex.size=3)># No of nodes>length(V(g))[1]200># No of edges>length(E(g))[1]197># Density (No of edges / possible edges)>graph.density(g)[1]0.009899497># Number of islands>clusters(g)$no
[1]34># Global cluster coefficient:>#(close triplets/all triplets)>transitivity(g,type="global")[1]0.015># Edge connectivity, 0 since graph is disconnected>edge.connectivity(g)[1]0># Same as graph adhesion>graph.adhesion(g)[1]0># Diameter of the graph>diameter(g)[1]18># Reciprocity of the graph>reciprocity(g)[1]1># Diameter of the graph>diameter(g)[1]18># Reciprocity of the graph>reciprocity(g)[1]1>degree.distribution(g)[1]0.1350.2800.3150.1100.0950.0500.0050.010>plot(degree.distribution(g),xlab="node degree")>lines(degree.distribution(g))

enter image description here

往下一点，我们也可以看到每对节点的统计信息，比如:
- 计算两点之间没有公用连线的路径（也就是需要移除多少条连线可以使两节点不连通）
- 计算两点之间的最短路径
- 计算两点之间路径的数量和长度

># Create a random graph>g <-erdos.renyi.game(9,0.5)>plot(g,layout=layout.fruchterman.reingold)># Compute the shortest path matrix>shortest.paths(g)[,1][,2][,3][,4][,5][,6][,7][,8][,9][1,]013122132[2,]102232221[3,]320212221[4,]122031221[5,]231303132[6,]222130211[7,]122212021[8,]322231201[9,]211121110># Compute the connectivity matrix>M <-matrix(rep(0,81),nrow=9)>M
      [,1][,2][,3][,4][,5][,6][,7][,8][,9][1,]000000000[2,]000000000[3,]000000000[4,]000000000[5,]000000000[6,]000000000[7,]000000000[8,]000000000[9,]000000000>for(i in0:8){+for(j in0:8){+if(i ==j){+M[i+1,j+1]<--1+}else{+M[i+1,j+1]<-edge.connectivity(g,i,j)+}+}+}>M
      [,1][,2][,3][,4][,5][,6][,7][,8][,9][1,]-122323323[2,]2-12222222[3,]22-1222222[4,]322-123323[5,]2222-12222[6,]32232-1323[7,]322323-123[8,]2222222-12[9,]32232332-1>

enter image description here

中心性计算

在细节方面，我们可以看到各个节点的统计信息。根据这些数字可以测出节点的“中心性”
- 拥有较高出/入度数的节点也拥有较高的“度中心性”
- 与其他节点之间有短路径的节点拥有较高的“密集中心性”
- 与其他节点对之间有最短路径的节点拥有较高的“中间性”
- 连接了许多中心性较高节点的节点拥有较高的“特征向量中心性”
- 本地簇系数意味着相邻节点的互联性

># Degree>degree(g)[1]222223326># Closeness (inverse of average dist)>closeness(g)[1]0.44444440.53333330.53333330.5000000[5]0.44444440.53333330.61538460.5000000[9]0.8000000># Betweenness>betweenness(g)[1]0.83333332.33333332.3333333[4]0.00000000.83333330.5000000[7]6.33333330.000000018.8333333># Local cluster coefficient>transitivity(g,type="local")[1]0.00000000.00000000.00000001.0000000[5]0.00000000.66666670.00000001.0000000[9]0.1333333># Eigenvector centrality>evcent(g)$vector
[1]0.30198570.41971530.41971530.5381294[5]0.30198570.66931420.51706510.5381294[9]1.0000000># Now rank them>order(degree(g))[1]123458679>order(closeness(g))[1]154823679>order(betweenness(g))[1]486152379>order(evcent(g)$vector)[1]152374869

从中Drew Conway发现拥有低“特征向量中心性”和高“中间性”的人是很重要联系人，而拥有高“特征向量中心性”和低“中间性”的人与重要的人有关联。现在我们来绘制“特征向量中心性”和“中间性”的图表。

># Create a graph>g1 <-barabasi.game(100,directed=F)>g2 <-barabasi.game(100,directed=F)>g <-g1 %u%g2
>lay <-layout.fruchterman.reingold(g)># Plot the eigevector and betweenness centrality>plot(evcent(g)$vector,betweenness(g))>text(evcent(g)$vector,betweenness(g),0:100,cex=0.6,pos=4)>V(g)[12]$color <-'red'>V(g)[8]$color <-'green'>plot(g,layout=lay,vertex.size=8,vertex.label.cex=0.6)

enter image description here

在之后的帖子里我还会介绍一些特殊的社交网络分析的例子。

原文链接：Basic graph analytics using igraph（需要FQ）

igraph——图挖掘助力社会网络分析

相关文章

云计算与边缘计算协同九大应用场景

移动边缘计算中的安全问题现状

3.kubernetes的CNI网络插件-Flannel

(转)利用openfiler实现iSCSI

无线Mesh网络技术基础与应用

Hyperledger Fabric1.4的多机部署

最新文章

随机推荐

思享工具箱导航

JSON工具

格式化转换

加解密编码

文本数字

网络

站长

计算

其他

对照列表

igraph——图挖掘助力社会网络分析

相关文章

云计算与边缘计算协同 九大应用场景

移动边缘计算中的安全问题现状

3.kubernetes的CNI网络插件-Flannel

(转)利用openfiler实现iSCSI

无线Mesh网络技术基础与应用

Hyperledger Fabric1.4的多机部署

最新文章

随机推荐

思享工具箱导航

JSON工具

格式化转换

加解密编码

文本数字

网络

站长

计算

其他

对照列表

云计算与边缘计算协同九大应用场景