Network analysis can be used to generate insight for any relational data. In this tutorial we will demonstrate how insight can be produced for a certain domain of knowledge using the InfraNodus text network visualization tool.
While the exact definition of “insight” may vary depending on the context, it is generally associated with “aha” moments, problem-solving, new discoveries, and the ability to get an overview of a situation.
In this particular example we will show how you can use InfraNodus to get insight about a particular topic of interest: “text network visualization”.
Click the top left menu button, then choose the + add new graph menu option, name your graph, and click save.
You will be forwarded to the new empty graph window.
In order to generate insight you need to start with something. You can:
Write a short abstract (300-500 words) describing your situation or problem
Copy and paste your existing notes on a certain subject
Copy and paste an existing article or a Wikipedia post on the subject
Import Google search results on the subject
For the example below we will use InfraNodus to import the top 40 Google search results for the query “text network analysis” to gain some new insights on this topic.
Add the name of the graph and then click “Import”…
Your graph will be visualized as a network and you should open the Essence tab (top menu) to see the overview of the main results.
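To make the idea of a text network more concrete, here is a minimal Python sketch of how a text can be turned into a word co-occurrence graph. It is an illustration only and not the InfraNodus implementation: the stopword list, the window size of 2 (each word is linked to the next one), and the two sample statements are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def cooccurrence_edges(statements, window=2):
    """Count word pairs that appear next to each other.

    With window=2 every word is linked only to the word that follows it;
    a larger window also links words that are further apart.
    """
    edges = Counter()
    for statement in statements:
        tokens = tokenize(statement)
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    # Sort the pair so that (a, b) and (b, a) are the same edge.
                    edges[tuple(sorted((word, other)))] += 1
    return edges

# Made-up sample statements, not the imported search results.
statements = [
    "Text network analysis is a method for the study of text.",
    "A network graph is a tool to identify the topic of a text.",
]

edges = cooccurrence_edges(statements)
for (a, b), weight in sorted(edges.items()):
    print(a, "-", b, "weight:", weight)
```

Each printed pair is an edge of the graph and the weight is the number of times the two words appeared next to each other.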
Here are the main features you should know about. You can learn more in our tutorial on How to interpret network graphs.
The bigger nodes have a higher betweenness centrality. These are the words with the highest influence in the narrative: they connect the different topics present within the text and usually also occur frequently.
In our example: text, network, analysis, method
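If you are curious how betweenness centrality can be computed, the sketch below uses Brandes' algorithm on a small unweighted graph. It is a plain Python illustration under stated assumptions: the edge list is invented for this example (it is not the graph from the search results), and whether InfraNodus computes the measure in exactly this way is not covered in this tutorial.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps every node to the set of its neighbors.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        stack = []                                   # nodes in order of discovery
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}          # number of shortest paths from source
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:                                 # breadth-first search
            node = queue.popleft()
            stack.append(node)
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    paths[neighbor] += paths[node]
                    predecessors[neighbor].append(node)
        dependency = {node: 0.0 for node in graph}
        while stack:                                 # walk back from the farthest nodes
            node = stack.pop()
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # In an undirected graph every pair of nodes was counted twice.
    return {node: value / 2 for node, value in centrality.items()}

# Invented toy edges: two groups of words joined by one link.
edges = [
    ("text", "network"), ("network", "analysis"), ("text", "analysis"),
    ("analysis", "method"),
    ("method", "study"), ("method", "paper"), ("study", "paper"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

scores = betweenness_centrality(graph)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(word, score)
```

In this toy graph the words that sit on the link between the two groups get the highest scores, which is why such nodes would be drawn bigger.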
The nodes that are closer to each other on the graph and have the same color comprise topical clusters. These are the words that tend to co-occur more often in the same context in your data.
In our example:
1. network, analysis, text
2. analyze, study, paper
3. identify, graph, topic
4. tool, word, post
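One common way to check whether a grouping of words really forms clusters is modularity, a score that compares the number of links inside each group with what you would expect by chance. The sketch below computes it in plain Python. Treat it as an assumption-laden illustration: the tutorial does not state which measure InfraNodus uses, and the edge list is invented (it only borrows words from topics 1 and 2 above).

```python
def modularity(edges, communities):
    """Modularity of a partition of an unweighted, undirected graph.

    `edges` is a list of (a, b) pairs, `communities` is a list of sets of nodes.
    """
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)        # edges with both ends in the same group
    total_degree = [0] * len(communities)  # sum of node degrees per group
    for a, b in edges:
        total_degree[label[a]] += 1
        total_degree[label[b]] += 1
        if label[a] == label[b]:
            inside[label[a]] += 1
    return sum(
        inside[i] / m - (total_degree[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )

# Invented toy edges: two tightly knit groups joined by a single link.
edges = [
    ("text", "network"), ("network", "analysis"), ("text", "analysis"),
    ("analyze", "study"), ("study", "paper"), ("analyze", "paper"),
    ("analysis", "study"),
]

two_clusters = [{"network", "analysis", "text"}, {"analyze", "study", "paper"}]
one_cluster = [{"network", "analysis", "text", "analyze", "study", "paper"}]

print("two clusters:", round(modularity(edges, two_clusters), 3))
print("one cluster:", round(modularity(edges, one_cluster), 3))
```

The split into two groups scores higher than putting all the words into one group, which matches the intuition that words sharing a color on the graph co-occur mostly with each other.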
This visualization and topic modeling feature of InfraNodus already gives us a good idea of the kind of discourse that happens online about text network analysis.
We can see that people talk about methods and that there are studies and papers on the subject (so we can assume that this discourse occurs in a scientific context).
We can also see that most of it is happening around the subjects of identifying topic graphs and also the tools used for text network analysis (TNA).
So the current discourse around TNA focuses on the methods, studies, and tools for topic modeling and most of it is happening in the scientific realm.
If you look at the list of topics identified by InfraNodus, you can see topic #4: tool, word, post.
We don’t really know what it means, but using InfraNodus we can get exactly to the part of the data that mentions all of these terms to see what context they appear in.
In order to do that:
You will then see the filtered version of the graph
Click the Data tab in the top menu and you will see exactly the part of the text that contains all or most of the terms you selected:
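The filtering idea behind this step can be sketched in a few lines of Python: keep only the statements that contain all or most of the selected words. This is a rough illustration, not the way InfraNodus does it. The `min_share` threshold (our reading of "all or most") and the sample statements are assumptions made for this example.

```python
import re

def matching_statements(statements, terms, min_share=0.6):
    """Return (share, statement) pairs, best matches first.

    `share` is the fraction of the selected terms found in the statement;
    statements below `min_share` are left out.
    """
    wanted = {term.lower() for term in terms}
    if not wanted:
        return []
    results = []
    for statement in statements:
        words = set(re.findall(r"[a-z]+", statement.lower()))
        share = len(wanted & words) / len(wanted)
        if share >= min_share:
            results.append((share, statement))
    return sorted(results, key=lambda pair: pair[0], reverse=True)

# Made-up statements, not the actual search results.
statements = [
    "This tool builds a word cloud from every post.",
    "A word cloud tool is easy to use.",
    "The paper describes a new method.",
]

hits = matching_statements(statements, ["tool", "word", "post"])
for share, statement in hits:
    print(round(share, 2), statement)
```

Lowering `min_share` shows more loosely related statements, while setting it to 1.0 keeps only those that contain every selected word.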
From the results above you can see that these clusters refer to the tools that allow one to build word clouds from user posts — a practical application of text network analysis.
Using the graph we can discover the gaps in the existing discourse, which will lead us to new ideas.
In order to do that, we click the Insight tab at the top menu. The InfraNodus algorithm analyzes the structure of the graph and detects:
It then proposes to bridge the structural gap between them by asking a question that would link these two clusters together.
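A simple way to think about a structural gap, assuming it means two clusters with few or no direct links between them, is to count the links between every pair of clusters and pick the weakest pair. The Python sketch below does exactly that. The cluster words are taken from the topic list earlier in this tutorial, but the edge list is invented for illustration, so the result is not the output of InfraNodus.

```python
from itertools import combinations

def count_links(edges, cluster_a, cluster_b):
    """Number of edges that connect a word in one cluster to a word in the other."""
    return sum(
        1
        for a, b in edges
        if (a in cluster_a and b in cluster_b) or (a in cluster_b and b in cluster_a)
    )

def weakest_pair(edges, clusters):
    """Return the ids of the two clusters with the fewest links between them."""
    pairs = combinations(sorted(clusters), 2)
    return min(pairs, key=lambda pair: count_links(edges, clusters[pair[0]], clusters[pair[1]]))

clusters = {
    1: {"network", "analysis", "text"},
    2: {"analyze", "study", "paper"},
    3: {"identify", "graph", "topic"},
    4: {"tool", "word", "post"},
}

# Invented links between the clusters (an assumption for this sketch).
edges = [
    ("text", "study"), ("analysis", "paper"),
    ("network", "graph"),
    ("text", "word"), ("analysis", "tool"),
    ("study", "topic"),
    ("paper", "post"),
]

gap = weakest_pair(edges, clusters)
print("least connected clusters:", gap)
```

The pair returned here is the one you would try to connect with a question or a new idea.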
You can try this on your data. To give an example, in our case:
We can see that there may be an interesting relation between
For instance, we could ask: “Is there any way to integrate topic modeling into word cloud tools?” This may lead us to a new idea or a new research direction.
Another powerful feature of InfraNodus is that it allows you to remove the nodes from the graph in order to see what’s hiding behind them. The algorithm will automatically recalculate the topical clusters and the most influential words, showing you the new results and highlighting the new structural gaps.
For example, you can click on the most influential (and also very obvious) words, such as text, network, and analysis, to remove them:
The new topical clusters are automatically calculated:
Now we can discover the more specific parts of the discourse, which relate to discourse analysis and visual methods for data science.
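The node-removal step described above can be sketched in plain Python. The example removes the hub words from a small graph and then looks at which groups of words remain connected. Note the assumptions: connected components are used here as a simple stand-in for the cluster recalculation (the actual InfraNodus algorithm is not described in this tutorial), and the edge list is invented for illustration.

```python
def remove_nodes(graph, removed):
    """Return a copy of the graph without the given nodes and their edges."""
    removed = set(removed)
    return {
        node: neighbors - removed
        for node, neighbors in graph.items()
        if node not in removed
    }

def connected_components(graph):
    """Group the nodes that can still reach each other (depth-first search)."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

# Invented toy edges: three hub words hold two smaller groups together.
edges = [
    ("text", "network"), ("network", "analysis"), ("text", "analysis"),
    ("text", "discourse"), ("network", "structure"), ("analysis", "bias"),
    ("text", "visual"), ("network", "method"), ("analysis", "science"),
    ("discourse", "structure"), ("structure", "bias"),
    ("visual", "method"), ("method", "science"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

before = connected_components(graph)
after = connected_components(remove_nodes(graph, ["text", "network", "analysis"]))
print("groups before removal:", len(before))
print("groups after removal:", after)
```

With the hub words in place everything looks like one big group. Once they are removed, the smaller groups that were hidden behind them become visible.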
If you click on the Insight tab in the top menu, you will see the new topical clusters and the proposed relation between them:
In our example, it’s a link between a scientific paper on the visual method for text network analysis that we wrote (it appeared in the search results) and a post on using discourse structure analysis to estimate bias that we published in Towards Data Science. This suggests that it could be interesting to link the two approaches or, as they are already linked (both were written by the same person), to publish a scientific paper on the discourse structure analysis method as an update to the earlier approach.
I encourage you to try the methodology above on your own data, and please send me your feedback or questions. I would be curious to know how it works for you!
Hope you enjoy this tutorial!
Dmitry, Nodus Labs