Comparing nodes
Comparing nodes is a process of checking how equivalent two nodes are to each other. This is used heavily in StaticShape as a way to see whether nodes should be merged or should become another node, like loops and conditionals.
For example, we have two html element nodes that need to be compared:
<h1>Home</h1>
<h1>About</h1>
For a human, it is easy to see that these are very similar. To make this comparison automatic, we need to have a mathematical representation of how “similar” they are. In StaticShape this is called an equivalency score.
#Equivalency scores
An equivalency score is a number between 0 and 1. A good way to interpret the numbers is as a percentage, between 0% match and 100% match. This gives us a way of comparing the nodes and having thresholds for different behaviour like merging and loops.
#Scoring elements equivalency
To compare the elements listed above, the HTML needs to be parsed into an element node:
{
"type": "element",
"name": "h1",
"attrs": {},
"children": [
{
"type": "text",
"value": "Home"
}
]
}
{
"type": "element",
"name": "h1",
"attrs": {},
"children": [
{
"type": "text",
"value": "About"
}
]
}
To compare the two elements we can now compare all of the metadata attached. This is where the code needs to be more specific about comparing the different types of nodes. In the above example, a comparison between two h1 occurs:
- ✅ Both nodes have
type
element
- ✅ Both nodes have
name
h1
- ✅ Both nodes have the same
attrs
- 🤔 Each node has a different list of childen
From this comparison, a score can be attached to each positive match. For properties like attrs
and children
, we can run the scoring recursively to get a score. The max score is also calculated to put the score between 0 and 1.
#Scoring nodes tree equivalency
When a tree needs a score, each element from the first tree is compared to the element in the same position in the second tree. Each score is summed and compared with the number of nodes.
[
{
"type": "text",
"value": "Home"
}
]
[
{
"type": "text",
"value": "About"
}
]
#Scoring text nodes
To finish our example, a method of scoring text nodes is needed.
{
"type": "text",
"value": "Home"
}
{
"type": "text",
"value": "About"
}
In the above example, a comparison between two text nodes occurs:
- ✅ Both nodes have
type
text
- 🤔 Each nodes
value
is not the same
As the value
key in a text node is a string, a method is needed for comparing the distance between two strings. For this StaticShape uses a algorithm called Levenshtein distance or edit distance. This algorithm gives a number of edits that would be required to change one string to another. This distance can be used as the score. The max score is the total number of characters in the longest string. Combining these, we can arrive at a score between 0 and 1.