I spent the last few weeks working on a “Hunter Killer” tank type for my game. The behavior that I want for this tank type is to be constantly, agressively, tracking the player’s tank wherever it goes. I already have a working AStar pathfinding implementation in GDScript. But even with that in place, I found it surprisingly challenging to implement my desired behavior because of all of the regular and corner cases that can happen in my game. One of these corner cases is this:

No Path To Player

The Hunter Killer (HK) tank is the dark grey tank while the blue tank is the player’s tank. There is no path from the HK tank to the player because of the trees and the border of the map. So what should it do? Well, it should take the player’s current tile and find the closest tile that it has a path to. For example:

Path To Closes Tile

The function that I wrote to accomplish this uses GDScript’s PoolVector2Array as a Queue. But then I started thinking that Queues are usually written as linked lists. I wondered how much faster a linked list would perform versus GDScript’s built in PoolVector2Array, or its generic Array for that matter. And that’s when I started this side quest rabbit hole.

Rabbit Hole

What is a Queue?

It might be useful to do a quick recap on what and how linked list Queue’s work. If you’re already familiar with this and only want the benchmarks, you can skip this and jump to them below.

A Queue is a data structure where the first element that goes in is also the first element that goes out. Think of it like the line at your grocery store. Traditionally, when an element is added to the Queue it is called enqueue, and when it is removed it is called deque.

Queue Circles With Enqueue and Deque

You can use GDScript’s Array to create a queue. You enqueue using Array.append() and deque using Array.pop_front(). As long as you stick to these two methods when accessing your data, then you have a Queue using an Array.

Queue Circles With append() and pop_front()

What is a linked list?

So what is a linked list and why bother with it if we can already make a Queue using an Array? A linked list is a data structure where each element is an object with a value property that holds the value you want to store, and a next property that is a pointer to another element. In the Array example, an element in the Array is just the value that you want to store. It is not wrapped in a special object like it is in a linked list. Additionally, a linked list will have two pointers. A head pointer that points to the front of the Queue, the “head element”. And then a tail pointer that points to the back of the Queue, the “tail element”.

Linked List

To add to a linked list, first you have to instance a new element and set its value to the new value. Then using the tail pointer access the current tail element, point its next property to the new element. Then finally, point the tail pointer to the new element. That’s an enqueue operation in a linked list.

Linked List

To deque, first copy the head pointer to a new variable so you don’t lose the current head element. Then set the head pointer to the current head element’s next property. This will make it point to the next element in the Queue. Finally return the current head element’s value.

Linked List

It might seem like it is a lot more steps to achieve the same thing as using a GDScript Array, but that’s only because append() and pop_front() abstract away all the steps from you. Under the hood, your computer will do a bunch of steps for you when you use a GDScript Array. This is especially true for the pop_front() method where the number of steps grows depending on how many elements you have in your Queue. Even the official GDScript documentation warns:

On large arrays, this method is much slower than pop_back as it will reindex all the array’s elements every time it’s called. The larger the array, the slower pop_front will be.

On the other hand, for linked lists, the steps are just changing pointers, modern computers can do this very quickly. There is no need to memcpy() anything. It also doesn’t matter how many elements you have in your linked list Queue. You’re still just changing the head and tail pointers. The number of steps stay the same. This is why theoretically it is faster to use a linked list for a Queue than an Array, especially for very large Queues.

Linked List Queue Implementation

With all of that said, here’s my implementation of a linked list Queue in GDScript. I’ll break it down piece by piece below.

extends Reference
class_name LLQ

The first two lines declare the class name “LLQ” and how it inherits from a Reference as its base class.

class Element:
	extends Reference

	var value
	var next: Reference

	func _init(v) -> void:
		value = v
		next = null

The next lines declare the Element class which are the objects that are in the Queue. Each element has a value which can be any type, and a next which is a pointer to another element. On initialization it sets the passed in v to value and next to null.

var name: String = "LLQ"
var length: int
var _head: Element
var _tail: Element

func _init() -> void:
	_head = null
	_tail = null
	length = 0

The lines after that cover the initialization of the Queue itself. I gave it a name property to make it easier to identify in benchmarks. But the important properties are length to keep track of the length of the Queue, _head which is a pointer to an Element and _tail which is also a pointer to an Element.

func enqueue(v) -> void:
	var elem: Element = Element.new(v)
	length += 1
	# Check if the Queue is empty
	if _tail == null:
		_tail = elem
		_head = elem
		return

	# Set the current tail's next to the new element
	_tail.next = elem
	# Set the tail pointer to the new element
	_tail = elem

Then we get to the enqueue() method. This does the work described above to add to the Queue. It also includes a handful of lines to update the length and handle the situation when the Queue is empty.

func deque():
	# If empty, just return null
	if _head == null:
		return null

	length -= 1
	# Save the current head pointer
	var head: Element = _head
	# Set the head pointer to its next
	_head = _head.next
	# Check if this is the last element in the Queue, and set the tail
	# to null if so.
	if _head == null:
		_tail = null

	return head.value

The deque() method also works as described previously with a few lines to simply return null if the Queue is empty. It also sets the tail to null if there is only one element in the Queue.

func dump() -> void:
	while _head != null:
		_head = _head.next

Finally, I included a dump method to walk the entire linked list and free every element. This is unfortunately necessary with very large linked lists because of a bug in Godot. More on that later.

Benchmarking Methodology

I wrote two kinds of benchmarks to measure this linked list Queue’s performance versus an Array or PoolVector2Array Queue.

Performance On The Margin

The first one below tests the time it takes to add or remove elements on the margin. Specifically, if the Queue already has n elements, how long will it take to enqueue another 1000 or deque 1000?

func enqueue(q, number: int) -> void:
	for n in number:
		q.enqueue(Vector2(n, n))

func deque(q, number: int) -> void:
	for n in number:
		q.deque()

func bench_margin(sizes: Array) -> void:
	print("\nBench Margin")
	print("\n\tEnqueue")
	var fn: FuncRef = funcref(self, "enqueue")
	for s in sizes:
		for q in [LLQ.new(), PVA.new(), Arr.new()]:
			enqueue(q, s)
			assert(q.length == s)
			print("\t%s %s %d" % [q.name, s, time_it(fn, q, 1000)])
			assert(q.length == s + 1000)
			q.dump()

	print("\n\tDeque")
	fn = funcref(self, "deque")
	for s in sizes:
		for q in [LLQ.new(), PVA.new(), Arr.new()]:
			enqueue(q, s)
			assert(q.length == s)
			print("\t%s %s %d" % [q.name, s, time_it(fn, q, 1000)])
			assert(q.length == s - 1000)
			q.dump()

In order to simplify things, I wrote two additional classes that share the same interface as my linked list Queue class. An Arr class for Arrays and a PVA class for PoolVector2Arrays. You can examine their code here and here respectively.

I also wrote a simple time_it function to measure the amount of time it would take to execute a function fn.

func time_it(fn: FuncRef, q, arg: int) -> int:
	var start: int = OS.get_ticks_msec()
	fn.call_func(q, arg)
	return (OS.get_ticks_msec() - start)

The second benchmark I wrote measures a Queue’s perfomance with a workload very similar to what I need it to do to solve my problem in the game. In my game, I need it to help me find the closest tile to the player tile that has a path.

func simulate_search(q, number: int) -> void:
	q.enqueue(Vector2(0, 0))
	for v in number:
		var _d = q.deque()
		for i in 4:
			q.enqueue(Vector2(v,v))

func bench_search(sizes: Array) -> void:
	print("\nBench Search")
	var fn: FuncRef = funcref(self, "simulate_search")
	for n in sizes:
		for q in [LLQ.new(), PVA.new(), Arr.new()]:
			print("\t%s %s %s" % [q.name, n, time_it(fn, q, n)])
			q.dump()

You can study the benchmarking code in full here.

Results

The tables below were created by this print statement:

print("\t%s %s %s" % [q.name, n, time_it(fn, q, n)])

Therefore the first column was the type of Queue, LLQ = linked list queue, PVA = PoolVector2Array queue, and ARR = Array queue. The second column was the size of the n input, while third column was the time in milliseconds it took to perform the operation in the benchmark.

bench_margin

For enqueueing on the margin, all three data structures performed the operation in constant time.

Bench Margin

        Enqueue
        LLQ    1000       1
        PVA    1000       0
        ARR    1000       0
        LLQ   10000       2
        PVA   10000       0
        ARR   10000       1
        LLQ  100000       2
        PVA  100000       0
        ARR  100000       0
        LLQ 1000000       3
        PVA 1000000       1
        ARR 1000000       0

This means that it doesn’t matter how many elements are already in the data structure, it takes the same amount of time to add another thousand. This makes sense for linked lists as discussed previously, but it also makes sense for GDScript Arrays and PoolVector2Arrays assuming that the implementation underneath those is some kind of dynamic array. For dynamic array’s, given that the underlying static array has enough capacity, adding another element to it should take constant time as well. If you’re curious about dynamic arrays and how those work you can check out the wiki. Another interesting thing to note with the results above is that the linked list Queue actually performed the slowest by up to 2 milliseconds at every size. This is likely due to the fact that instantiating an element of the LLQ, the Element class, has cost while the Array and PoolVector2Array don’t have to pay that cost when enqueueing.

However, for dequeing on the margin, my benchmark showed a 10x growth in line with the growth of the number of elements already in the Array.

Bench Margin

        Deque
        ARR    1000       2
        ARR   10000      21
        ARR  100000     259
        ARR 1000000    2424

These results make sense given the aformentioned warning from the official GDScript documentation for Array’s pop_front() method. The larger the array, the slower pop_front() will be.

The result was similarly linear for PoolVector2Arrays albeit with a smaller slope.

Bench Margin

        Deque
        PVA    1000       0
        PVA   10000       3
        PVA  100000      22
        PVA 1000000     217

This means that even though PoolVector2Arrays are much more efficient as Queues for Vector2s, they are still affected by the number of elements in them, and will become very slow as that number grows.

For my linked list implementation however, it didn’t matter how many elements are already in the Queue, the number of milliseconds to add another thousand didn’t grow in relation to it.

Bench Margin

        Deque
        LLQ    1000       1
        LLQ   10000       1
        LLQ  100000       1
        LLQ 1000000       2

This is as expected given that even with 1 million elements already in, the number of steps required to deque an item remains the same.

Now that we’ve verified that these three perform enqueueing and dequeing as we expected. Lets see how they actually perform in my real-world use case as mocked up by my simulated_search function.

At 1000 operations, the linked list Queue was actually the slowest of the three. Three times slower than just using a PoolVector2Array and a millisecond slower than an Array.

Bench Search

        LLQ    1000       6
        PVA    1000       2
        ARR    1000       5

At 10,000 operations, the linked list is only 36% slower than the PoolVector2Array while the time it takes for the Array to do it begins to jump.

Bench Search

        LLQ   10000      64
        PVA   10000      47
        ARR   10000     404

At 100,000 operations, the PoolVector2Array finally loses to the linked list, taking almost three and a half seconds to complete the task, while the linked list completed it in less than a second, or 77% faster. The Array on the other (third) hand, took over 35 seconds. This performance would be unacceptable even in a turn-based game.

Bench Search

        LLQ  100000     807
        PVA  100000    3457
        ARR  100000   35760

Finally at 1 million operations, the PoolVector2Array is almost 100% slower than the linked list Queue. The Array took a staggering 63 minutes to complete the task.

Bench Search

        LLQ 1000000   11192
        PVA 1000000  341300
        ARR 1000000 3810035

A Bug 🐛

While benchmarking the linked list Queue, I kept running into an issue where Godot would segfault whenever I ran the benchmark.

Linked List Queue Segmentation Fault

It only happened when the amount of elements reached a certain size and when the linked list Queue was derefrenced. After some experimentation, I discoverd that the bug would stop happening if I walked the entire list and dereferenced every element before dereferencing the linked list Queue itself. This is what the dump function does.

func dump() -> void:
	while _head != null:
		_head = _head.next

I also wrote a function to trigger the bug on demand.

func trigger_bug(n: int) -> void:
	var q: LLQ = LLQ.new()
	for v in n:
		q.enqueue(Vector2(v, v))
	# Uncomment line below to stop the bug from happening
	#q.dump()
	print("1, 2, 3 bug!")

It basically takes an integer and creates a linked list Queue of that size, and then returns the function. This causes the Queue to be dereferenced which causes the bug. I found that it happens around 27,550 elements on my desktop, but around 35,000 elements on my old 2016 Macbook Pro. It would be very interesting to go deep into the Godot source code someday and study it. But this benchmarking side quest already took too much time away from regular game development. I unfortunately can’t afford to go deeper into the rabbit hole because I need to finish my game.

Conclusion

So what’s my conclusion after going down this rabbit hole? While linked list Queues are cool and always fun to implement in a new language, for my specific use case, I’ll just stick to using a PoolVector2Array as a Queue. The map size of my game only has 40 tiles across and 22 tiles down. This means that even in the absolute worst case scenario, the maximum number of operations to search for an open tile will be 880. The benchmarks above show that for 1000 operations the best performer is a PoolVector2Array so that’s what I’ll use.

Perhaps the more meta conclusion is that even if a data structure is theoretically faster, it is still wise to benchmark performance for your specific use case. This is especially true when the size of your data is very small because a lot of times a simple array for that will be fine.