The code (which is a few percent faster than quicksort) is on the last page:
procedure QUICKSORTAGENT(input state)
2: Let i = 1, j = 2, l = 3, h = 4
3: if FunctionID = None then
4: return vh ← Function1(vl ← vl, vh ← vh)
5: else if FunctionID = 1 then . QuickSort
6: if vl < vh then
7: if prev = None then
8: return vi ← Function2(vl ← vl, vh ← vh)
9: else if prev = (vi ← Function2(vl ← vl, vh ← vh)) then
10: return AssignVar(j, i)
11: else if prev = AssignVar(j, i) then
12: return MoveVar(i, -1)
13: else if prev = MoveVar(i, -1) then
14: if vi > vl then
15: return vi ← Function1(vl ← vl, vh ← vi)
16: else
17: return MoveVar(j, +1)
18: end if
19: else if prev = (vi ← Function1(vl ← vl, vh ← vi)) t hen
20: return MoveVar(j, +1)
21: else if prev = MoveVar(j, +1) and vj < vh then
22: return vh ← Function1(vl ← vj , vh ← vh)
23: else
24: return Return(h)
25: end if
26: else
27: return Return(h)
28: end if
29: else . Function ID = 2, Partition
30: if prev = None then
31: return AssignVar(i, l)
32: else if prev = AssignVar(i, l) then
33: return AssignVar(j, l)
34: else if vj < vh then
35: if prev = Swap(i, j) then
36: return MoveVar(i, +1)
37: else if (prev = AssignVar(j, l) or prev = MoveVar(j,
+1)) and A[vj ] < A[vh] then
38: if vi 6= vj then
39: return Swap(i, j)
40: else
41: return MoveVar(i, +1)
42: end if
43: else
44: return MoveVar(j, +1)
45: end if
46: else if prev = MoveVar(j, +1) then
47: return Swap(i, h)
48: else
49: return Return(i)
50: end if
51: end if
52: end procedure
The Electron had a new enough version of BBC Basic that "proper" procedures and UDFs were supported (in a limited fashion, but well enough considering the constraints of the machine) so you could pretty much ignore line numbers, in fact by using a text editor and TYPEing the result into BASIC you could write code without them (the interpreter needed them, the TYPE trick made it put them in for itself).
But yes, this was uncommon so line numbers were a major thing.
And indentation was optional, you could add extra spaces to the start of lines and they would be kept and would have no effect on execution, but you would be wasting precious bytes of RAM and if using one of the heavier display modes (0, 1 or 2) you would only have 8.5Kbyte to play with for all the code and run-time storage (your variables, BASIC's call stack).
No, that is the training agent implemented as an instance of quicksort. I don't know why the OP reports that it's "a few percent faster than quicksort" because I can't find where that is claimed in the paper.
In fact, as far as I understand it the paper claims that the learned model (i.e. the student agent) has learned to reproduce the behaviour of this teacher agent after training for a smaller number of steps on average than what the teacher needs to sort a list. That is what the entire claim about superior efficiency rests on (there is no example of a learned model and no asymptotic analysis of the program such a model represents).
As I understand it they designed quicksort like that to be able to train with it. It is quite clear from the video where it is called "quick sort agent" compared to the model one and function1 and function2 is in the stack trace.
"We found adding the previously executed action to the input state st is sufficient to handle dis-ambiguation for this quick sort implementation. Alg. 8 shows the converted quick sort scripted agent"
Right! I was hoping to see the body of one of their system induced algorithms. They don't seem to have included any - or maybe I am mistaken and their system is opaque and does not allow a generated algo. to be read out...