I spent the last few weeks working on a “Hunter Killer” tank type for my game. The behavior that I want for this tank type is to be constantly, aggressively tracking the player’s tank wherever it goes. I already have a working AStar pathfinding implementation in GDScript. But even with that in place, I found it surprisingly challenging to implement my desired behavior because of all of the regular and corner cases that can happen in my game. One of these corner cases is this:
The Hunter Killer (HK) tank is the dark grey tank while the blue tank is the player’s tank. There is no path from the HK tank to the player because of the trees and the border of the map. So what should it do? Well, it should take the player’s current tile and find the closest tile that it has a path to. For example:
The function that I wrote to accomplish this uses GDScript’s PoolVector2Array as a Queue. But then I started thinking that Queues are usually written as linked lists. I wondered how much faster a linked list would perform versus GDScript’s built-in PoolVector2Array, or its generic Array for that matter. And that’s when I started this side quest rabbit hole.
What is a Queue?
It might be useful to do a quick recap on what a Queue is and how a linked list Queue works. If you’re already familiar with this and only want the benchmarks, you can skip this and jump to them below.

A Queue is a data structure where the first element that goes in is also the first element that goes out. Think of it like the line at your grocery store. Traditionally, when an element is added to the Queue the operation is called enqueue, and when it is removed it is called deque.
You can use GDScript’s Array to create a queue. You enqueue using Array.append() and deque using Array.pop_front(). As long as you stick to these two methods when accessing your data, then you have a Queue using an Array.
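For instance, a minimal sketch of that idea might look like this (the function name and values are just for illustration):

func array_queue_demo() -> void:
    # A plain Array used as a Queue: append() to enqueue, pop_front() to deque.
    var queue: Array = []
    queue.append(Vector2(1, 1))
    queue.append(Vector2(2, 2))
    var first = queue.pop_front()
    print(first)  # (1, 1) comes out first, just like the grocery line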
What is a linked list?
So what is a linked list, and why bother with it if we can already make a Queue using an Array? A linked list is a data structure where each element is an object with a value property that holds the value you want to store, and a next property that is a pointer to another element. In the Array example, an element in the Array is just the value that you want to store; it is not wrapped in a special object like it is in a linked list. Additionally, a linked list has two pointers: a head pointer that points to the front of the Queue, the “head element”, and a tail pointer that points to the back of the Queue, the “tail element”.
To add to a linked list, first you instance a new element and set its value to the new value. Then, using the tail pointer, you access the current tail element and point its next property to the new element. Finally, you point the tail pointer at the new element. That’s an enqueue operation in a linked list.
To deque, first copy the head pointer to a new variable so you don’t lose the current head element. Then set the head pointer to the current head element’s next property; this makes it point to the next element in the Queue. Finally, return the current head element’s value.
It might seem like a lot more steps to achieve the same thing as a GDScript Array, but that’s only because append() and pop_front() abstract all of those steps away from you. Under the hood, your computer does a bunch of work for you when you use a GDScript Array. This is especially true for the pop_front() method, where the number of steps grows with how many elements you have in your Queue. Even the official GDScript documentation warns:
On large arrays, this method is much slower than pop_back as it will reindex all the array’s elements every time it’s called. The larger the array, the slower pop_front will be.
On the other hand, for linked lists, the steps are just changing pointers, and modern computers can do that very quickly. There is no need to memcpy() anything. It also doesn’t matter how many elements you have in your linked list Queue; you’re still just changing the head and tail pointers, so the number of steps stays the same. This is why, theoretically, it is faster to use a linked list for a Queue than an Array, especially for very large Queues.
Linked List Queue Implementation
With all of that said, here’s my implementation of a linked list Queue in GDScript. I’ll break it down piece by piece below.
extends Reference
class_name LLQ
The first two lines declare that the class inherits from Reference as its base class and register its name as “LLQ”.
class Element:
    extends Reference

    var value
    var next: Reference

    func _init(v) -> void:
        value = v
        next = null
The next lines declare the Element class, whose instances are the objects stored in the Queue. Each element has a value, which can be any type, and a next, which is a pointer to another element. On initialization it sets the passed-in v to value and next to null.
var name: String = "LLQ"
var length: int
var _head: Element
var _tail: Element

func _init() -> void:
    _head = null
    _tail = null
    length = 0
The lines after that cover the initialization of the Queue itself. I gave it a name property to make it easier to identify in benchmarks. But the important properties are length, which keeps track of the length of the Queue, _head, which is a pointer to an Element, and _tail, which is also a pointer to an Element.
func enqueue(v) -> void:
    var elem: Element = Element.new(v)
    length += 1
    # Check if the Queue is empty
    if _tail == null:
        _tail = elem
        _head = elem
        return
    # Set the current tail's next to the new element
    _tail.next = elem
    # Set the tail pointer to the new element
    _tail = elem
Then we get to the enqueue() method. This does the work described above to add to the Queue. It also includes a handful of lines to update the length and handle the situation when the Queue is empty.
func deque():
    # If empty, just return null
    if _head == null:
        return null
    length -= 1
    # Save the current head pointer
    var head: Element = _head
    # Set the head pointer to its next
    _head = _head.next
    # Check if this is the last element in the Queue, and set the tail
    # to null if so.
    if _head == null:
        _tail = null
    return head.value
The deque() method also works as described previously, with a few lines to simply return null if the Queue is empty. It also sets the tail to null when the last remaining element is removed from the Queue.
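Putting the two methods together, here’s a quick usage sketch (the values are just for illustration):

func llq_demo() -> void:
    var q: LLQ = LLQ.new()
    q.enqueue(Vector2(1, 1))
    q.enqueue(Vector2(2, 2))
    print(q.deque())  # (1, 1) -- first in, first out
    print(q.deque())  # (2, 2)
    print(q.deque())  # null, because the Queue is now empty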
func dump() -> void:
    while _head != null:
        _head = _head.next
Finally, I included a dump method to walk the entire linked list and free every element. This is unfortunately necessary with very large linked lists because of a bug in Godot. More on that later.
Benchmarking Methodology
I wrote two kinds of benchmarks to measure this linked list Queue’s performance versus an Array or PoolVector2Array Queue.
Performance On The Margin
The first one below tests the time it takes to add or remove elements on the margin. Specifically, if the Queue already has n elements, how long will it take to enqueue another 1000 or deque 1000?
func enqueue(q, number: int) -> void:
    for n in number:
        q.enqueue(Vector2(n, n))

func deque(q, number: int) -> void:
    for n in number:
        q.deque()

func bench_margin(sizes: Array) -> void:
    print("\nBench Margin")
    print("\n\tEnqueue")
    var fn: FuncRef = funcref(self, "enqueue")
    for s in sizes:
        for q in [LLQ.new(), PVA.new(), Arr.new()]:
            enqueue(q, s)
            assert(q.length == s)
            print("\t%s %s %d" % [q.name, s, time_it(fn, q, 1000)])
            assert(q.length == s + 1000)
            q.dump()

    print("\n\tDeque")
    fn = funcref(self, "deque")
    for s in sizes:
        for q in [LLQ.new(), PVA.new(), Arr.new()]:
            enqueue(q, s)
            assert(q.length == s)
            print("\t%s %s %d" % [q.name, s, time_it(fn, q, 1000)])
            assert(q.length == s - 1000)
            q.dump()
In order to simplify things, I wrote two additional classes that share the same interface as my linked list Queue class: an Arr class for Arrays and a PVA class for PoolVector2Arrays. You can examine their code here and here respectively.
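To give a sense of their shape, here’s a minimal sketch of what a PVA-style wrapper could look like. This is only an illustration of the shared interface, not necessarily the exact code linked above.

extends Reference
class_name PVA

var name: String = "PVA"
var length: int = 0
var _data: PoolVector2Array = PoolVector2Array()

func enqueue(v: Vector2) -> void:
    _data.append(v)
    length += 1

func deque():
    if length == 0:
        return null
    var v: Vector2 = _data[0]
    # remove(0) shifts every remaining element down one slot, which is why
    # dequeuing still gets slower as the array grows.
    _data.remove(0)
    length -= 1
    return v

func dump() -> void:
    _data.resize(0)
    length = 0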
I also wrote a simple time_it function to measure the amount of time it takes to execute a function fn.
func time_it(fn: FuncRef, q, arg: int) -> int:
    var start: int = OS.get_ticks_msec()
    fn.call_func(q, arg)
    return (OS.get_ticks_msec() - start)
Simulated Search
The second benchmark I wrote measures a Queue’s performance with a workload very similar to what I need it to do to solve my problem in the game: finding the closest tile to the player’s tile that has a path.
func simulate_search(q, number: int) -> void:
    q.enqueue(Vector2(0, 0))
    for v in number:
        var _d = q.deque()
        for i in 4:
            q.enqueue(Vector2(v, v))

func bench_search(sizes: Array) -> void:
    print("\nBench Search")
    var fn: FuncRef = funcref(self, "simulate_search")
    for n in sizes:
        for q in [LLQ.new(), PVA.new(), Arr.new()]:
            print("\t%s %s %s" % [q.name, n, time_it(fn, q, n)])
            q.dump()
You can study the benchmarking code in full here.
Results
The tables below were created by this print statement:
print("\t%s %s %s" % [q.name, n, time_it(fn, q, n)])
So the first column is the type of Queue (LLQ = linked list queue, PVA = PoolVector2Array queue, ARR = Array queue), the second column is the size of the n input, and the third column is the time in milliseconds it took to perform the operation in the benchmark.
bench_margin
For enqueueing on the margin, all three data structures performed the operation in constant time.
Bench Margin
Enqueue
LLQ 1000 1
PVA 1000 0
ARR 1000 0
LLQ 10000 2
PVA 10000 0
ARR 10000 1
LLQ 100000 2
PVA 100000 0
ARR 100000 0
LLQ 1000000 3
PVA 1000000 1
ARR 1000000 0
This means that it doesn’t matter how many elements are already in the data structure; it takes the same amount of time to add another thousand. This makes sense for linked lists as discussed previously, but it also makes sense for GDScript Arrays and PoolVector2Arrays, assuming that the implementation underneath them is some kind of dynamic array. For dynamic arrays, as long as the underlying static array has enough capacity, adding another element should take constant time as well. If you’re curious about dynamic arrays and how they work, you can check out the wiki. Another interesting thing to note in the results above is that the linked list Queue actually performed the slowest at every size, if only by a few milliseconds. This is likely because instantiating an element of the LLQ, the Element class, has a cost that the Array and PoolVector2Array don’t have to pay when enqueueing.
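To make the dynamic array intuition a bit more concrete, here’s a conceptual sketch of an append with a doubling growth strategy. This is only an illustration of the general technique, not how Godot’s Array or PoolVector2Array is actually implemented.

class DynArraySketch:
    var _buffer: PoolVector2Array = PoolVector2Array()
    var _capacity: int = 0
    var _size: int = 0

    func append(v: Vector2) -> void:
        if _size == _capacity:
            # Rare case: grow the backing storage; this is where copying happens.
            _capacity = 1 if _capacity == 0 else _capacity * 2
            _buffer.resize(_capacity)
        # Common case: write into an already-allocated slot, which is constant time.
        _buffer[_size] = v
        _size += 1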
However, for dequeing on the margin, my benchmark showed the time growing roughly 10x with every 10x growth in the number of elements already in the Array.
Bench Margin
Deque
ARR 1000 2
ARR 10000 21
ARR 100000 259
ARR 1000000 2424
These results make sense given the aforementioned warning from the official GDScript documentation for Array’s pop_front() method: the larger the array, the slower pop_front() will be.
The result was similarly linear for PoolVector2Arrays, albeit with a smaller slope.
Bench Margin
Deque
PVA 1000 0
PVA 10000 3
PVA 100000 22
PVA 1000000 217
This means that even though PoolVector2Arrays are much more efficient as Queues for Vector2s, they are still affected by the number of elements in them, and will become very slow as that number grows.
For my linked list implementation, however, it didn’t matter how many elements were already in the Queue; the number of milliseconds to deque another thousand didn’t grow in relation to it.
Bench Margin
Deque
LLQ 1000 1
LLQ 10000 1
LLQ 100000 1
LLQ 1000000 2
This is as expected, given that even with 1 million elements already in the Queue, the number of steps required to deque an item remains the same.
Now that we’ve verified that these three perform enqueueing and dequeing as expected, let’s see how they actually perform in my real-world use case as mocked up by my simulate_search function.
bench_search
At 1000 operations, the linked list Queue was actually the slowest of the three: three times slower than just using a PoolVector2Array and a millisecond slower than an Array.
Bench Search
LLQ 1000 6
PVA 1000 2
ARR 1000 5
At 10,000 operations, the linked list is only 36% slower than the PoolVector2Array, while the time it takes the Array to do it begins to jump.
Bench Search
LLQ 10000 64
PVA 10000 47
ARR 10000 404
At 100,000 operations, the PoolVector2Array finally loses to the linked list, taking almost three and a half seconds to complete the task, while the linked list completed it in less than a second, or 77% faster. The Array, on the other (third) hand, took over 35 seconds. This performance would be unacceptable even in a turn-based game.
Bench Search
LLQ 100000 807
PVA 100000 3457
ARR 100000 35760
Finally, at 1 million operations, the PoolVector2Array is roughly 30 times slower than the linked list Queue. The Array took a staggering 63 minutes to complete the task.
Bench Search
LLQ 1000000 11192
PVA 1000000 341300
ARR 1000000 3810035
A Bug 🐛
While benchmarking the linked list Queue, I kept running into an issue where Godot would segfault whenever I ran the benchmark.
It only happened when the number of elements reached a certain size and when the linked list Queue was dereferenced. After some experimentation, I discovered that the bug would stop happening if I walked the entire list and dereferenced every element before dereferencing the linked list Queue itself. This is what the dump function does.
func dump() -> void:
    while _head != null:
        _head = _head.next
I also wrote a function to trigger the bug on demand.
func trigger_bug(n: int) -> void:
    var q: LLQ = LLQ.new()
    for v in n:
        q.enqueue(Vector2(v, v))
    # Uncomment the line below to stop the bug from happening
    #q.dump()
    print("1, 2, 3 bug!")
It basically takes an integer, creates a linked list Queue of that size, and then returns. This causes the Queue to be dereferenced, which triggers the bug. I found that it happens around 27,550 elements on my desktop, but around 35,000 elements on my old 2016 MacBook Pro. It would be very interesting to go deep into the Godot source code someday and study it, but this benchmarking side quest already took too much time away from regular game development. I unfortunately can’t afford to go deeper into the rabbit hole because I need to finish my game.
Conclusion
So what’s my conclusion after going down this rabbit hole? While linked list Queues are cool and always fun to implement in a new language, for my specific use case I’ll just stick to using a PoolVector2Array as a Queue. My game’s map is only 40 tiles across and 22 tiles down, which means that even in the absolute worst case, the maximum number of operations to search for an open tile will be 880. The benchmarks above show that for 1000 operations the best performer is a PoolVector2Array, so that’s what I’ll use.
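For reference, the search itself will look roughly like this. The has_path() and is_inside_map() helpers are placeholders for my actual game code (has_path() would wrap my AStar implementation); they are not part of the benchmarks above.

func find_closest_reachable_tile(hk_tile: Vector2, player_tile: Vector2):
    var q: PVA = PVA.new()
    var visited: Dictionary = {}
    q.enqueue(player_tile)
    visited[player_tile] = true
    while q.length > 0:
        var tile: Vector2 = q.deque()
        # Stop at the first tile the HK tank can actually path to.
        if has_path(hk_tile, tile):
            return tile
        for offset in [Vector2.UP, Vector2.DOWN, Vector2.LEFT, Vector2.RIGHT]:
            var next_tile: Vector2 = tile + offset
            if is_inside_map(next_tile) and not visited.has(next_tile):
                visited[next_tile] = true
                q.enqueue(next_tile)
    # The player is completely unreachable.
    return null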
Perhaps the more meta conclusion is that even if a data structure is theoretically faster, it is still wise to benchmark performance for your specific use case. This is especially true when the size of your data is very small, because a lot of the time a simple array will be just fine.