⛺ Home

Prefix Trees and a simple implementation

Data Structures

The Prefix tree also callled a trie is an often overlooked data structure. In world of the web being predominantly JS based, and languages like Go providing only providing first class data structures such as slices and maps, the mental model we have for most problems are arrays and maps.

Stated differently, we have arrays and associative arrays. We joke that it is turtles all the way down, but 99% (I made this number up) of application level code is more aptly described as arrays all the way down.

I have been working in the industry for ~8 years now, and I struggle to think of instances where I have needed anything more. There may have been a time where I used a linked-list to build an LRU cache however the times I needed anything more are few and far between.

Prefix Trees and HTTP request matching

Recently, however, I have stumbled upon a use case that is better and more elegantly solved by using prefix trees: associating data to strings hierarchichally .

My first use case for a prefix tree (also known as a trie) came when I was implementing muxter my own experimental attempt at a high performance Go router or http ServeMux. My first implementation simply create a map of each registered path segment that would contain contain a map to any next segments and handler associated with the current segment. This router was slow, and the performance was proportional to the number of segments in the of any given request. yikes.

For muxter's second attempt, a prefix tree was perfect as we can simply traverse it using the request's path, and find the handler that was registered at the longest matching path exactly as defined in the spec of net/http's ServeMux . As it turns out, writing a router that can match wildcards, catchall, and regular expressions needs a custom implementation of a prefix tree where the nodes in the tree need to have the type of node encoded, and have specific lookup logic associated with it. Thus muxter's prefix tree implementation is out of scope for this blog post although you are welcome to view it .

Indeed, I may have never though of prefix tree's again if it were not for writing this blog site. Sometimes I suffer from javascript fatigue (but not always) and since everything is static for this blog, I designed the website as a file server. Just HTML and CSS. However I didn't want to rewrite the same shell HTML for every page, and my blog posts have their own boilerplate within the standard HTML layout. A solution must be found.

I decided to write my content in a content directory, run a go program to read in the content and copy it into my server's folder structure where it could be embedded into the final binary. For every level in my content folder, if there exists a _base.html file it gets used as the parent shell and other html files are replaced where an html comment says REPLACE ME .

So for any given file I want to be able to locate the base html that best matches the path to the file.

Solution

If you've made it this far you've been patient, so here's my solution for a prefix tree. Enjoy!


type PrefixTree[T any] struct {
	key      string
	children []PrefixTree[T]
	indices  []byte
	value    *T
}

func (node *PrefixTree[T]) Insert(key string, value T) {
	// By checking if the key is empty, we enable storing values on the empty root of the tree,
	// which is great for creating a default for any submatch, including lookup of empty strings.
	// Also, the key will be empty if we matched a child node exactly, which let's us reset the value
	// of that node.
	if key == "" {
		node.value = &value
		return
	}

	var child *PrefixTree[T]
	for i, char := range node.indices {
		if char == key[0] {
			child = &node.children[i]
			break
		}
	}

	// Since there's no child that matches the key, we can directly create the
	// child on this node using the entire key.
	if child == nil {
		node.append(PrefixTree[T]{key: key, value: &value})
		return
	}

	cp := commonPrefixLength(key, child.key)

	// the key contains the entire child node's key, so we can insert the value onto this child
	if cp == len(child.key) {
		child.Insert(key[cp:], value)
		return
	}

	// We create a node that represents the common prefix betwen the child and key
	commonNode := PrefixTree[T]{key: key[:cp]}

	// If the key is the prefix, then we need to give it a value, else we append the rest of the key, as a child.
	if len(key) == cp {
		commonNode.value = &value
	} else {
		commonNode.append(PrefixTree[T]{key: key[cp:], value: &value})
	}

	childValue := *child
	childValue.key = childValue.key[cp:]

	commonNode.append(childValue)

	*child = commonNode
}

func (node *PrefixTree[T]) append(child PrefixTree[T]) {
	node.children = append(node.children, child)
	node.indices = append(node.indices, child.key[0])
}

func (node PrefixTree[T]) GetLongestSubmatch(key string) (result T) {
Outer:
	for {
		var ok bool

		key, ok = strings.CutPrefix(key, node.key)
		if !ok {
			return
		}

		if node.value != nil {
			result = *node.value
		}

		if key == "" {
			return
		}

		for i, b := range node.indices {
			if b == key[0] {
				node = node.children[i]
				continue Outer
			}
		}

		return
	}
}

func commonPrefixLength(a, b string) (i int) {
	minimum := min(len(a), len(b))
	for i = 0; i < minimum; i++ {
		if a[i] != b[i] {
			break
		}
	}
	return
}