Just building a trie for LZ string searching is pretty easy. Using the linked-list method (which certainly has disadvantages), internal nodes only need a child & sibling pointer, and some bit of data. If you always descend one char at a time that data is just one char. If you want to do "path compression" (multi-char steps in a single link) you need some kind of pointer + length.
(it's actually much easier to write the code with path compression, since when you add a new string you only have to find the deepest match in the tree then add one node; with single char steps you may have to add many nodes).
So for a file of length N, internal nodes are something like 10 bytes, and you need at most N nodes. Leaves can be smaller or even implicit.
With just a normal trie, you have a nice advantage for optimal parsing, which is that when you find the longest match, you also automatically walk past all shorter matches. At each node you could store the most recent position that that substring was seen, so you can find the lowest offset for each length of match for free. (this requires more storage in the nodes plus a lot more memory writes, but I think those memory writes are basically free since they are to nodes in cache anyway).
The Find and Insert operations are nearly identical so they of course should be done together.
A trie could be given a "lazy update". What you do is on Insert you just tack the nodes on somewhere low down in the tree. Then on Find, when you encounter nodes that have not been fully inserted you pick them up an carry them with you as you descend. Whenever you take a path that your baggage can't take, you leave that baggage behind. This could have advantages under certain usage patterns, but I haven't actually tried it.
But it's only when you get the "follow" pointers that a suffix trie really makes a huge difference.
A follow pointer is a pointer in the tree from any node (substring) to the location in the tree of the substring without the first character. That is, if you are at "banana" in the tree, the follow pointer should point at the location of "anana" in the tree.
When you're doing LZ compression and you find a match at pos P of length L, you know that at pos P+1 there must be a match of at least length L-1 , simply by using the same offset and matching one less character. (there could be a longer match, though). So, if you know the suffix node that was used to find the match of length L at pos P, then you can jump in directly to match of length L-1 at the next position.
This is huge. Consider for example the fully degenerate case, a file of length N of all the same character. (yes obviously there are special case solutions to the fully degenerate case, but that doesn't fix the problem, it just makes it more complex to create the problem). A naive string matcher is actually O(N^3) !!
For each position in the file (*N)
Consider all potential matches (*N)
Compare all the characters in that potential match (*N)
A normal trie makes this O(N^2) , because the comparing characters in the string is combined with finding all potential matches, so the tree descent + string compares
combined are just O(N).
But a true suffix trie with follow pointers is only O(N) for the whole parse. Somewhere early on would find a match of length O(N) and then each subsequent one just finds a match of L-1 in O(1) time using the follow pointer. (the O(N) whole parse only works if you are just finding the longest length at each position; if you are doing the optimal parse where you find the lowest offset for each length it's O(N^2))
Unfortunately, it seems that when you introduce the follow pointer this is when the code for the suffix trie gets rather tricky. It goes from 50 lines of code to 500 lines of code, and it's hard to do without introducing parent pointers and lots more tree maintenance. It also makes it way harder to do a sliding window.