Knowledge Discovery from Online Social Networks

Funding source

NSERC Discovery Grant (CRSNG, Subvention à la découverte)

Professors involved

  • Mohammad Bouguessa (LATECE)

Summary

The growing volume of communication between individuals in electronic form (e.g. email, instant messaging, blogs) has motivated computational research in social network analysis. Social network analysis techniques aim to identify communities of shared interest and the leaders within those communities. Social networks are often represented as graphs, where nodes represent individuals and edges represent the relationships between them. Such graphs are massive, and each node may carry a large amount of text data.
Many existing social network analysis techniques focus either on the network topology, as measured by communication frequencies, or on the content generated by the users. However, neither source of information alone is sufficient for accurately finding communities of shared interest and their leaders. The text and the linkage structure reinforce each other, and combining them leads to higher-quality results. In addition, existing techniques are only effective on graphs that come from a single source and are relatively complete, and most of them assume that the structure of the network is static. Online social networks, however, change continually, and their links come from different online sources: consider, for example, links from Usenet to the blogosphere, or links between tweets and news articles. There are also applications in which the whole network is never available at one time but arrives as a continuous stream of edges. Such applications create unique challenges, because the entire graph cannot be held in main memory.
Making social network analysis more effective therefore requires techniques that take into account textual content, uncertainty, incompleteness and heterogeneity of data sources, as well as specialized algorithms for Web applications that involve continuous streams of edges. Our goal is to address these issues by developing appropriate models and algorithms for effectively mining online social networks.
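
To make the streaming constraint described above concrete, here is a minimal Python sketch of a single pass over an edge stream that never stores the full graph. It is an illustration only, not the project's method: the function name `summarize_edge_stream` and its parameters are hypothetical, and reservoir sampling is used here simply as a well-known way to keep a bounded, uniform sample of edges. The degree table grows with the number of distinct nodes seen, not with the number of edges.

```python
import random
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def summarize_edge_stream(
    edges: Iterable[Edge], sample_size: int, seed: int = 0
) -> Tuple[Counter, List[Edge]]:
    """Hypothetical helper: one pass over an edge stream.

    Returns per-node degree counts and a uniform reservoir sample
    of at most `sample_size` edges.
    """
    rng = random.Random(seed)
    degree: Counter = Counter()
    sample: List[Edge] = []
    for i, (u, v) in enumerate(edges):
        # Update running statistics; the edge itself is not retained
        # unless it lands in the sample.
        degree[u] += 1
        degree[v] += 1
        if len(sample) < sample_size:
            sample.append((u, v))
        else:
            # Classic reservoir sampling: the i-th edge (0-indexed)
            # replaces a sampled edge with probability sample_size / (i + 1).
            j = rng.randint(0, i)
            if j < sample_size:
                sample[j] = (u, v)
    return degree, sample
```

A generator that reads edges from a log or a message queue can be passed as `edges`, so that memory use stays bounded by the sample size plus the node table regardless of how long the stream runs.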