Very simply, what is tail-call optimization?
More specifically, what are some small code snippets where it could be applied, and where not, with an explanation of why?
Tail-call optimization is where you are able to avoid allocating a new stack frame for a function because the calling function will simply return the value that it gets from the called function. The most common use is tail-recursion, where a recursive function written to take advantage of tail-call optimization can use constant stack space.
Scheme is one of the few programming languages that guarantee in the spec that any implementation must provide this optimization, so here are two examples of the factorial function in Scheme:
(define (fact x)
(if (= x 0) 1
(* x (fact (- x 1)))))
(define (fact x)
(define (fact-tail x accum)
(if (= x 0) accum
(fact-tail (- x 1) (* x accum))))
(fact-tail x 1))
The first function is not tail recursive because when the recursive call is made, the function needs to keep track of the multiplication it needs to do with the result after the call returns. As such, the stack looks as follows:
(fact 3)
(* 3 (fact 2))
(* 3 (* 2 (fact 1)))
(* 3 (* 2 (* 1 (fact 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6
In contrast, the stack trace for the tail recursive factorial looks as follows:
(fact 3)
(fact-tail 3 1)
(fact-tail 2 3)
(fact-tail 1 6)
(fact-tail 0 6)
6
As you can see, we only need to keep track of the same amount of data for every call to fact-tail because we are simply returning the value we get right through to the top. This means that even if I were to call (fact 1000000), I need only the same amount of space as (fact 3). This is not the case with the non-tail-recursive fact, and as such large values may cause a stack overflow.
Let's walk through a simple example: the factorial function implemented in C.
We start with the obvious recursive definition
unsigned fac(unsigned n)
{
if (n < 2) return 1;
return n * fac(n - 1);
}
A function ends with a tail call if the last operation before the function returns is another function call. If this call invokes the same function, it is tail-recursive.
Even though fac()
looks tail-recursive at first glance, it is not as what actually happens is
unsigned fac(unsigned n)
{
if (n < 2) return 1;
unsigned acc = fac(n - 1);
return n * acc;
}
ie the last operation is the multiplication and not the function call.
However, it's possible to rewrite fac()
to be tail-recursive by passing the accumulated value down the call chain as an additional argument and passing only the final result up again as the return value:
unsigned fac(unsigned n)
{
return fac_tailrec(1, n);
}
unsigned fac_tailrec(unsigned acc, unsigned n)
{
if (n < 2) return acc;
return fac_tailrec(n * acc, n - 1);
}
Now, why is this useful? Because we immediately return after the tail call, we can discard the previous stackframe before invoking the function in tail position, or, in case of recursive functions, reuse the stackframe as-is.
The tail-call optimization transforms our recursive code into
unsigned fac_tailrec(unsigned acc, unsigned n)
{
TOP:
if (n < 2) return acc;
acc = n * acc;
n = n - 1;
goto TOP;
}
This can be inlined into fac()
and we arrive at
unsigned fac(unsigned n)
{
unsigned acc = 1;
TOP:
if (n < 2) return acc;
acc = n * acc;
n = n - 1;
goto TOP;
}
which is equivalent to
unsigned fac(unsigned n)
{
unsigned acc = 1;
for (; n > 1; --n)
acc *= n;
return acc;
}
As we can see here, a sufficiently advanced optimizer can replace tail-recursion with iteration, which is far more efficient as you avoid function call overhead and only use a constant amount of stack space.
TCO (Tail Call Optimization) is the process by which a smart compiler can make a call to a function and take no additional stack space. The only situation in which this happens is if the last instruction executed in a function f is a call to a function g (Note: g can be f). The key here is that f no longer needs stack space - it simply calls g and then returns whatever g would return. In this case the optimization can be made that g just runs and returns whatever value it would have to the thing that called f.
This optimization can make recursive calls take constant stack space, rather than explode.
Example: this factorial function is not TCOptimizable:
from dis import dis
def fact(n):
if n == 0:
return 1
return n * fact(n-1)
dis(fact)
2 0 LOAD_FAST 0 (n)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 2 (==)
6 POP_JUMP_IF_FALSE 12
3 8 LOAD_CONST 2 (1)
10 RETURN_VALUE
4 >> 12 LOAD_FAST 0 (n)
14 LOAD_GLOBAL 0 (fact)
16 LOAD_FAST 0 (n)
18 LOAD_CONST 2 (1)
20 BINARY_SUBTRACT
22 CALL_FUNCTION 1
24 BINARY_MULTIPLY
26 RETURN_VALUE
This function does things besides call another function in its return statement.
This below function is TCOptimizable:
def fact_h(n, acc):
if n == 0:
return acc
return fact_h(n-1, acc*n)
def fact(n):
return fact_h(n, 1)
dis(fact)
2 0 LOAD_GLOBAL 0 (fact_h)
2 LOAD_FAST 0 (n)
4 LOAD_CONST 1 (1)
6 CALL_FUNCTION 2
8 RETURN_VALUE
This is because the last thing to happen in any of these functions is to call another function.
Probably the best high level description I have found for tail calls, recursive tail calls and tail call optimization is the blog post
"What the heck is: A tail call"
by Dan Sugalski. On tail call optimization he writes:
Consider, for a moment, this simple function:
sub foo (int a) { a += 15; return bar(a); }
So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form
return somefunc();
into the low-level sequencepop stack frame; goto somefunc();
. In our example, that means before we callbar
,foo
cleans itself up and then, rather than callingbar
as a subroutine, we do a low-levelgoto
operation to the start ofbar
.Foo
's already cleaned itself out of the stack, so whenbar
starts it looks like whoever calledfoo
has really calledbar
, and whenbar
returns its value, it returns it directly to whoever calledfoo
, rather than returning it tofoo
which would then return it to its caller.
And on tail recursion:
Tail recursion happens if a function, as its last operation, returns the result of calling itself. Tail recursion is easier to deal with because rather than having to jump to the beginning of some random function somewhere, you just do a goto back to the beginning of yourself, which is a darned simple thing to do.
So that this:
sub foo (int a, int b) { if (b == 1) { return a; } else { return foo(a*a + a, b - 1); }
gets quietly turned into:
sub foo (int a, int b) { label: if (b == 1) { return a; } else { a = a*a + a; b = b - 1; goto label; }
What I like about this description is how succinct and easy it is to grasp for those coming from an imperative language background (C, C++, Java)
foo
function tail call optimized? It is only calling a function as its last step, and it is simply returning that value, right?
Oct 8, 2014 at 21:51
GCC C minimal runnable example with x86 disassembly analysis
Let's see how GCC can automatically do tail call optimizations for us by looking at the generated assembly.
This will serve as an extremely concrete example of what was mentioned in other answers such as https://stackoverflow.com/a/9814654/895245 that the optimization can convert recursive function calls to a loop.
This in turn saves memory and improves performance, since memory accesses are often the main thing that makes programs slow nowadays.
As an input, we give GCC a non-optimized naive stack based factorial:
tail_call.c
#include <stdio.h>
#include <stdlib.h>
unsigned factorial(unsigned n) {
if (n == 1) {
return 1;
}
return n * factorial(n - 1);
}
int main(int argc, char **argv) {
int input;
if (argc > 1) {
input = strtoul(argv[1], NULL, 0);
} else {
input = 5;
}
printf("%u\n", factorial(input));
return EXIT_SUCCESS;
}
Compile and disassemble:
gcc -O1 -foptimize-sibling-calls -ggdb3 -std=c99 -Wall -Wextra -Wpedantic \
-o tail_call.out tail_call.c
objdump -d tail_call.out
where -foptimize-sibling-calls
is the name of generalization of tail calls according to man gcc
:
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
as mentioned at: How do I check if gcc is performing tail-recursion optimization?
I choose -O1
because:
-O0
. I suspect that this is because there are required intermediate transformations missing.-O3
produces ungodly efficient code that would not be very educative, although it is also tail call optimized.Disassembly with -fno-optimize-sibling-calls
:
0000000000001145 <factorial>:
1145: 89 f8 mov %edi,%eax
1147: 83 ff 01 cmp $0x1,%edi
114a: 74 10 je 115c <factorial+0x17>
114c: 53 push %rbx
114d: 89 fb mov %edi,%ebx
114f: 8d 7f ff lea -0x1(%rdi),%edi
1152: e8 ee ff ff ff callq 1145 <factorial>
1157: 0f af c3 imul %ebx,%eax
115a: 5b pop %rbx
115b: c3 retq
115c: c3 retq
With -foptimize-sibling-calls
:
0000000000001145 <factorial>:
1145: b8 01 00 00 00 mov $0x1,%eax
114a: 83 ff 01 cmp $0x1,%edi
114d: 74 0e je 115d <factorial+0x18>
114f: 8d 57 ff lea -0x1(%rdi),%edx
1152: 0f af c7 imul %edi,%eax
1155: 89 d7 mov %edx,%edi
1157: 83 fa 01 cmp $0x1,%edx
115a: 75 f3 jne 114f <factorial+0xa>
115c: c3 retq
115d: 89 f8 mov %edi,%eax
115f: c3 retq
The key difference between the two is that:
the -fno-optimize-sibling-calls
uses callq
, which is the typical non-optimized function call.
This instruction pushes the return address to the stack, therefore increasing it.
Furthermore, this version also does push %rbx
, which pushes %rbx
to the stack.
GCC does this because it stores edi
, which is the first function argument (n
) into ebx
, then calls factorial
.
GCC needs to do this because it is preparing for another call to factorial
, which will use the new edi == n-1
.
It chooses ebx
because this register is callee-saved: What registers are preserved through a linux x86-64 function call so the subcall to factorial
won't change it and lose n
.
the -foptimize-sibling-calls
does not use any instructions that push to the stack: it only does goto
jumps within factorial
with the instructions je
and jne
.
Therefore, this version is equivalent to a while loop, without any function calls. Stack usage is constant.
Tested in Ubuntu 18.10, GCC 8.2.
Note first of all that not all languages support it.
TCO applys to a special case of recursion. The gist of it is, if the last thing you do in a function is call itself (e.g. it is calling itself from the "tail" position), this can be optimized by the compiler to act like iteration instead of standard recursion.
You see, normally during recursion, the runtime needs to keep track of all the recursive calls, so that when one returns it can resume at the previous call and so on. (Try manually writing out the result of a recursive call to get a visual idea of how this works.) Keeping track of all the calls takes up space, which gets significant when the function calls itself a lot. But with TCO, it can just say "go back to the beginning, only this time change the parameter values to these new ones." It can do that because nothing after the recursive call refers to those values.
foo
method tail call optimized?
Oct 8, 2014 at 21:52
Look here:
http://tratt.net/laurie/tech_articles/articles/tail_call_optimization
As you probably know, recursive function calls can wreak havoc on a stack; it is easy to quickly run out of stack space. Tail call optimization is way by which you can create a recursive style algorithm that uses constant stack space, therefore it does not grow and grow and you get stack errors.
The recursive function approach has a problem. It builds up a call stack of size O(n), which makes our total memory cost O(n). This makes it vulnerable to a stack overflow error, where the call stack gets too big and runs out of space.
Tail call optimization (TCO) scheme. Where it can optimize recursive functions to avoid building up a tall call stack and hence saves the memory cost.
There are many languages who are doing TCO like (JavaScript, Ruby and few C) whereas Python and Java do not do TCO.
JavaScript language has confirmed using :) http://2ality.com/2015/06/tail-call-optimization.html
We should ensure that there are no goto statements in the function itself .. taken care by function call being the last thing in the callee function.
Large scale recursions can use this for optimizations, but in small scale, the instruction overhead for making the function call a tail call reduces the actual purpose.
TCO might cause a forever running function:
void eternity()
{
eternity();
}
In a functional language, tail call optimization is as if a function call could return a partially evaluated expression as the result, which would then be evaluated by the caller.
f x = g x
f 6 reduces to g 6. So if the implementation could return g 6 as the result, and then call that expression it would save a stack frame.
Also
f x = if c x then g x else h x.
Reduces to f 6 to either g 6 or h 6. So if the implementation evaluates c 6 and finds it is true then it can reduce,
if true then g x else h x ---> g x
f x ---> h x
A simple non tail call optimization interpreter might look like this,
class simple_expresion
{
...
public:
virtual ximple_value *DoEvaluate() const = 0;
};
class simple_value
{
...
};
class simple_function : public simple_expresion
{
...
private:
simple_expresion *m_Function;
simple_expresion *m_Parameter;
public:
virtual simple_value *DoEvaluate() const
{
vector<simple_expresion *> parameterList;
parameterList->push_back(m_Parameter);
return m_Function->Call(parameterList);
}
};
class simple_if : public simple_function
{
private:
simple_expresion *m_Condition;
simple_expresion *m_Positive;
simple_expresion *m_Negative;
public:
simple_value *DoEvaluate() const
{
if (m_Condition.DoEvaluate()->IsTrue())
{
return m_Positive.DoEvaluate();
}
else
{
return m_Negative.DoEvaluate();
}
}
}
A tail call optimization interpreter might look like this,
class tco_expresion
{
...
public:
virtual tco_expresion *DoEvaluate() const = 0;
virtual bool IsValue()
{
return false;
}
};
class tco_value
{
...
public:
virtual bool IsValue()
{
return true;
}
};
class tco_function : public tco_expresion
{
...
private:
tco_expresion *m_Function;
tco_expresion *m_Parameter;
public:
virtual tco_expression *DoEvaluate() const
{
vector< tco_expression *> parameterList;
tco_expression *function = const_cast<SNI_Function *>(this);
while (!function->IsValue())
{
function = function->DoCall(parameterList);
}
return function;
}
tco_expresion *DoCall(vector<tco_expresion *> &p_ParameterList)
{
p_ParameterList.push_back(m_Parameter);
return m_Function;
}
};
class tco_if : public tco_function
{
private:
tco_expresion *m_Condition;
tco_expresion *m_Positive;
tco_expresion *m_Negative;
tco_expresion *DoEvaluate() const
{
if (m_Condition.DoEvaluate()->IsTrue())
{
return m_Positive;
}
else
{
return m_Negative;
}
}
}