Neo4j Experiments
Ever heard of Neo4j? Yeah, so had I. Ever used Neo4j, or seen it used somewhere? No, neither had I. That is why I decided to spend a few hours researching Neo4j, and I must say, it is incredibly powerful if you have the right use case.
What is Neo4j?
Neo4j is a so-called ‘graph database’, which means that it stores nodes and the edges connecting them. Nodes can store attributes, and edges are directional. This makes Neo4j useful for storing and querying large graphs. All queries are written in a language called Cypher. One of the Neo4j examples is a movie database: a graph consisting of movies and the people who acted in, directed or produced them. Creating the database looks as follows:
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix)
In the lines above, the first line creates a movie. Notice that the Movie type (a label, in Neo4j terms) was never declared prior to this. Labels are created on the fly the first time they are used, and the attributes (in this case title, released and tagline) are simply set on the node without any schema. The same is true for the Person type. These names will later be used to query the data again, so they are important!
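To make the data model a bit more tangible, here is a minimal in-memory sketch in TypeScript of what "nodes with labels and attributes, connected by directed edges" amounts to. The names PropertyGraph, createNode, relate and match are made up for this illustration; this is not how Neo4j stores data internally, only a toy model of the same idea.

```typescript
// A node carries zero or more labels plus free-form properties.
interface GraphNode {
  id: number;
  labels: string[];
  properties: Record<string, unknown>;
}

// A relationship is directed (from -> to), has one type, and may carry properties.
interface Relationship {
  from: number;
  to: number;
  type: string;
  properties: Record<string, unknown>;
}

// Hypothetical toy store, only meant to mirror the CREATE / MATCH examples.
class PropertyGraph {
  readonly nodes: GraphNode[] = [];
  readonly relationships: Relationship[] = [];

  // No schema check: any label and any property keys are accepted on first use.
  createNode(labels: string[], properties: Record<string, unknown>): GraphNode {
    const node: GraphNode = { id: this.nodes.length, labels, properties };
    this.nodes.push(node);
    return node;
  }

  relate(
    from: GraphNode,
    type: string,
    to: GraphNode,
    properties: Record<string, unknown> = {},
  ): void {
    this.relationships.push({ from: from.id, to: to.id, type, properties });
  }

  // Rough equivalent of: MATCH (n:Label {key: value}) RETURN n
  match(label: string, where: Record<string, unknown> = {}): GraphNode[] {
    return this.nodes.filter(
      (n) =>
        n.labels.includes(label) &&
        Object.entries(where).every(([key, value]) => n.properties[key] === value),
    );
  }
}

const graph = new PropertyGraph();
const theMatrix = graph.createNode(["Movie"], { title: "The Matrix", released: 1999 });
const keanu = graph.createNode(["Person"], { name: "Keanu Reeves", born: 1964 });
graph.relate(keanu, "ACTED_IN", theMatrix, { roles: ["Neo"] });

// Finds the single Movie node whose title is "The Matrix".
const movies = graph.match("Movie", { title: "The Matrix" });
```

The point of the sketch is that nothing had to be declared up front: the Movie and Person labels exist simply because a node was created with them.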
Data can be queried using the MATCH clause:
MATCH (matrix:Movie {title: "The Matrix"}) RETURN matrix
╒════════════════════════════════════════════════════════════════╕
│matrix │
╞════════════════════════════════════════════════════════════════╡
│(:Movie {tagline: "Welcome to the Real World",title: "The Matrix│
│",released: 1999}) │
└────────────────────────────────────────────────────────────────┘
Or:
MATCH (people:Person) RETURN people.name LIMIT 10
╒════════════════════╕
│people.name │
╞════════════════════╡
│"Keanu Reeves" │
├────────────────────┤
│"Carrie-Anne Moss" │
├────────────────────┤
│"Laurence Fishburne"│
├────────────────────┤
│"Hugo Weaving" │
├────────────────────┤
│"Lilly Wachowski" │
├────────────────────┤
│"Lana Wachowski" │
├────────────────────┤
│"Joel Silver" │
├────────────────────┤
│"Emil Eifrem" │
├────────────────────┤
│"Charlize Theron" │
├────────────────────┤
│"Al Pacino" │
└────────────────────┘
But so far, we’ve done nothing that would be hard in other databases such as MariaDB, PostgreSQL or MongoDB. The real power shows when you want to query data that links to itself multiple times. In SQL, joining a table onto itself an unknown number of times requires queries such as recursive CTEs, which are not only hard to write but often perform poorly as well. In Neo4j this is as simple as:
MATCH p=shortestPath(
(bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"})
)
RETURN p
The actual movie tutorial goes into more depth, but as you can see, Neo4j is made for storing large graphs (=networks), and allows the consumer to query parts of that graph.
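For a feeling of what the shortestPath query above asks the database to do, here is a small breadth-first search in TypeScript. It is a conceptual sketch only: I am not claiming this is how Neo4j implements shortestPath, and the edge list is a tiny hand-picked slice rather than the real movie dataset. The function names buildAdjacency and shortestPath are my own.

```typescript
type Adjacency = Map<string, string[]>;

// Build an undirected adjacency list, since the pattern -[*]- ignores direction.
function buildAdjacency(edges: Array<[string, string]>): Adjacency {
  const adjacency: Adjacency = new Map();
  const link = (a: string, b: string): void => {
    const neighbours = adjacency.get(a) ?? [];
    neighbours.push(b);
    adjacency.set(a, neighbours);
  };
  for (const [a, b] of edges) {
    link(a, b);
    link(b, a);
  }
  return adjacency;
}

// Breadth-first search: returns one shortest path from start to goal, or null.
function shortestPath(adjacency: Adjacency, start: string, goal: string): string[] | null {
  // previous[x] is the node from which x was first reached (null for the start).
  const previous = new Map<string, string | null>([[start, null]]);
  const queue: string[] = [start];
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head]!;
    if (current === goal) {
      // Walk the "previous" links back to the start to rebuild the path.
      const path: string[] = [];
      let step: string | null = current;
      while (step !== null) {
        path.unshift(step);
        step = previous.get(step) ?? null;
      }
      return path;
    }
    for (const next of adjacency.get(current) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

// Illustrative slice of a movie graph: people and movies are both plain nodes.
const edges: Array<[string, string]> = [
  ["Kevin Bacon", "A Few Good Men"],
  ["Tom Cruise", "A Few Good Men"],
  ["Tom Cruise", "Top Gun"],
  ["Meg Ryan", "Top Gun"],
  ["Kevin Bacon", "Apollo 13"],
  ["Tom Hanks", "Apollo 13"],
];

// Kevin Bacon -> A Few Good Men -> Tom Cruise -> Top Gun -> Meg Ryan
const baconPath = shortestPath(buildAdjacency(edges), "Kevin Bacon", "Meg Ryan");
```

Writing this by hand for every query gets old quickly, which is exactly why having it as a one-liner in Cypher is so pleasant.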
Steam-Map
Okay, a tutorial is cool and all, but the proof of the pudding is in the eating. There is one thing that I’ve been wanting to make for a really long time but never got around to: a visualisation of Steam friends lists. Social media friends lists, like Steam’s, form a giant network of interconnected users. Doesn’t that sound like the perfect use case for Neo4j? I thought so too. That is why I wanted to try to use Neo4j to visualise Steam friends lists.
Gathering data
Before anything can be done in Neo4j, some data needs to be obtained from Steam. After browsing through the Steam Web API documentation I decided that it did not fit my use case: I would need an API key, and fetching friends lists seemed difficult. That is why I turned to scraping the website, which is extremely easy as every friend entry carries a convenient .friend_block_v2 CSS class. Scraping a friends list in C# using AngleSharp takes only a few lines of code:
// Assumes `context` is an AngleSharp IBrowsingContext field, e.g. created with
// BrowsingContext.New(Configuration.Default.WithDefaultLoader()).
public async Task<List<User>> GetUserFriends(string steamID)
{
    IDocument document = await context.OpenAsync($"https://steamcommunity.com/profiles/{steamID}/friends");
    var output = new List<User>();
    // Every friend on the page is rendered as an element with this class.
    var friends = document.QuerySelectorAll(".friend_block_v2");
    foreach (var friend in friends)
    {
        // Fall back to placeholder values when the markup is not what we expect.
        var friendID = friend.Attributes["data-steamid"]?.TextContent ?? "0";
        var name = friend.QuerySelector(".friend_block_content")?.TextContent.Split('\n').FirstOrDefault() ?? "unknown";
        var avatar = friend.QuerySelector(".player_avatar > img")?.Attributes["src"]?.TextContent ?? "Unknown";
        output.Add(new User(friendID, name, avatar));
    }
    return output;
}
That is all the code required to scrape a friends list from the Steam web page. Let’s hope that they don’t change their CSS classes. ;‘)
Interfacing with Neo4j
There may be all kinds of fancy libraries to interface with Neo4j from C#, but I wanted to learn the query language, and thus avoided any magic ‘just werks’ libraries. I only used the official Neo4j.Driver package. First, a connection needs to be made:
private readonly IDriver _driver = GraphDatabase.Driver(uri, AuthTokens.Basic(user, password));
and after that, executing Cypher queries is as simple as:
public async Task CreateUser(User user)
{
    await using var session = _driver.AsyncSession();
    await session.ExecuteWriteAsync(
        async tx =>
        {
            // MERGE matches on steamID only, so re-importing a user updates it instead of duplicating it.
            await tx.RunAsync("""
                MERGE (user:User {
                    steamID: $SteamID
                })
                SET user.name = $Username
                SET user.avatar = $Avatar
                """,
                new
                {
                    user.Username,
                    user.SteamID,
                    user.Avatar // assumed property name; the query needs a value for $Avatar
                }
            );
        }
    );
}
All in all, it is relatively straightforward to interface with Neo4j from C#. I also wrote some plumbing code which connects the Steam scraper to the Neo4j interface, and used it to import a ton of users from Steam. A simple query to fetch the top 500 Users looks as follows:
This graph by itself is already highly interesting, as it shows how each user has a bubble of friends around it, which is then connected to other bubbles through common links. It already shows me links between friends that I never associated with each other, as they are in completely different friend groups. In fact this is already, out of the box, as cool as I imagined my Steam-Map to be. But why stop here?
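The plumbing between the scraper and the database is not shown in this post. As a rough idea of what such an import loop could look like, here is a sketch in TypeScript of a layer-by-layer crawl with a depth limit. Everything in it is an assumption for illustration: the crawl function, the FetchFriends callback (standing in for the scraper) and the in-memory Map and Set (standing in for MERGE on users and friendships) are hypothetical names, not the actual C# code.

```typescript
interface User {
  steamID: string;
  name: string;
}

// Hypothetical scraper signature: resolves to the friends of one Steam ID.
type FetchFriends = (steamID: string) => Promise<User[]>;

interface CrawlResult {
  users: Map<string, User>; // keyed by steamID, so a user is stored once (like MERGE)
  friendships: Set<string>; // "idA|idB" with sorted IDs, so each pair is stored once
}

// Visit the network one layer at a time, starting at root, up to maxDepth layers.
async function crawl(root: User, maxDepth: number, fetchFriends: FetchFriends): Promise<CrawlResult> {
  const users = new Map<string, User>([[root.steamID, root]]);
  const friendships = new Set<string>();
  let frontier: User[] = [root];

  for (let depth = 0; depth < maxDepth; depth++) {
    const nextFrontier: User[] = [];
    for (const user of frontier) {
      for (const friend of await fetchFriends(user.steamID)) {
        friendships.add([user.steamID, friend.steamID].sort().join("|"));
        // Only users we have not seen before are expanded in the next layer.
        if (!users.has(friend.steamID)) {
          users.set(friend.steamID, friend);
          nextFrontier.push(friend);
        }
      }
    }
    frontier = nextFrontier;
  }
  return { users, friendships };
}

// Usage with a tiny made-up network instead of the real scraper.
const alice: User = { steamID: "1", name: "alice" };
const bob: User = { steamID: "2", name: "bob" };
const carol: User = { steamID: "3", name: "carol" };
const dave: User = { steamID: "4", name: "dave" };
const erin: User = { steamID: "5", name: "erin" };
const fakeNetwork = new Map<string, User[]>([
  ["1", [bob, carol]],
  ["2", [alice, dave]],
  ["3", [alice]],
  ["4", [bob, erin]],
]);
const fakeFetch: FetchFriends = async (steamID) => fakeNetwork.get(steamID) ?? [];

// Two layers deep: reaches alice, bob, carol and dave, but not erin.
const crawled = crawl(alice, 2, fakeFetch);
```

The depth limit is the important design choice here, since every extra layer multiplies the number of pages that need to be fetched.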
Creating a small React front-end
If I can scrape Steam, store it in Neo4j and query it back easily, making a small front-end which allows the user to navigate it is basically required. So that is exactly what I did. I found a force-graph library which had this insanely cool 3D demo, and I knew that it was exactly what I needed.
<ForceGraph3D
  graphData={dataset as any}
  controlType="orbit"
  width={width}
  height={height}
  nodeThreeObject={(node: {
    id: number;
    text: string;
    avatar: string;
  }) => {
    const group = new Group();
    {
      const texture = new TextureLoader().load(node.avatar);
      texture.minFilter = LinearFilter;
      texture.colorSpace = SRGBColorSpace;
      const material = new SpriteMaterial({
        map: texture,
      });
      const sprite = new Sprite(material);
      sprite.scale.set(10, 10, 1);
      group.add(sprite);
    }
    {
      const sprite = new SpriteText(node.text);
      sprite.translateY(-10);
      sprite.textHeight = 4;
      group.add(sprite);
    }
    return group;
  }}
  onNodeClick={(n) => {
    setSteamID(n.id as any);
    setSteamIDHistory([...steamIDHistory, steamID]);
  }}
/>
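The graphData prop above expects an object with a nodes array and a links array, where each link refers to node IDs through source and target. How the dataset is built is not shown here, so the following is only a sketch of one way to shape query results into that form. The names SteamUser and toGraphData are hypothetical, and I use string IDs on the assumption that 64-bit Steam IDs are too large to be represented exactly as a JavaScript number.

```typescript
interface SteamUser {
  steamID: string;
  name: string;
  avatar: string;
}

// Node and link shapes matching what the component above reads (id, text, avatar).
interface GraphNodeData {
  id: string;
  text: string;
  avatar: string;
}

interface GraphLinkData {
  source: string;
  target: string;
}

interface GraphData {
  nodes: GraphNodeData[];
  links: GraphLinkData[];
}

// Turn a list of users and friendship pairs into the { nodes, links } structure.
function toGraphData(users: SteamUser[], friendships: Array<[string, string]>): GraphData {
  const nodes = users.map((u) => ({ id: u.steamID, text: u.name, avatar: u.avatar }));
  const known = new Set(nodes.map((n) => n.id));
  // Drop links with an endpoint that is not among the nodes, so no link dangles.
  const links = friendships
    .filter(([a, b]) => known.has(a) && known.has(b))
    .map(([source, target]) => ({ source, target }));
  return { nodes, links };
}

// Made-up sample data; the avatar URLs are placeholders.
const sampleData = toGraphData(
  [
    { steamID: "1", name: "alice", avatar: "https://example.com/a.jpg" },
    { steamID: "2", name: "bob", avatar: "https://example.com/b.jpg" },
  ],
  [
    ["1", "2"],
    ["2", "3"], // "3" was not returned as a user, so this link is dropped
  ],
);
```

Filtering out dangling links matters when only part of the graph is fetched, for example when a query is cut off by a LIMIT.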
A few hours of programming in React later, I had a simple front-end which allows me to navigate the Steam network in 3D. (A Steam user with the vanity URL ‘sample’ is used in the following videos.)
The tool is not publicly available online, because these types of networks grow exponentially and the tool is easy to abuse. I am currently storing 234,622 users, and that is only a couple of layers deep. The source code is available here