This has always confused me. It seems like this would be nicer:
["Hello", "world"].join("-")
Than this:
"-".join(["Hello", "world"])
Is there a specific reason it is like this?
It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings.
For example:
'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'
Using something other than strings will raise the following error:
TypeError: sequence item 0: expected str instance, int found
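A minimal sketch of how that error arises and one common way around it. The `items` list is made-up sample data, and converting with `map(str, ...)` is only one option:

```python
items = ["page", 3, "of", 10]  # hypothetical mixed-type data

error = None
try:
    "_".join(items)            # the int items are not accepted by str.join
except TypeError as exc:
    error = exc

# Convert every item to str explicitly before joining.
joined = "_".join(map(str, items))
print(joined)
```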
list.join(string) appears more an object-oriented approach whereas string.join(list) sounds much more procedural to me.
Jan 14, 2018 at 15:35
print(str.join('-', my_list)) and it works, feels better.
__iter__ method. Requiring all iterables to also implement join would complicate a general interface (which also covers iterables over non-strings) for a very particular use case. Defining join on strings side-steps this problem at the cost of the "unintuitive" order. A better choice might have been to keep it a function with the first argument being the iterable and the second (optional one) being the joiner string - but that ship has sailed.
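For illustration only, the free-function design this comment wishes for could look roughly like the sketch below. The name `join` and its signature are hypothetical; nothing like it exists as a Python builtin:

```python
from typing import Iterable


def join(items: Iterable[str], sep: str = "") -> str:
    """Hypothetical free function: iterable first, optional separator second."""
    # Delegates to the real str.join, so any iterable of strings works.
    return sep.join(items)


print(join(["Hello", "world"], "-"))
print(join(("a", "b", "c")))
```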
Jun 11, 2018 at 6:08
This was discussed in the "String methods... finally" thread in the Python-Dev archive, and was accepted by Guido. The thread began in June 1999, and str.join was included in Python 1.6, which was released in September 2000 (and supported Unicode). Python 2.0 (which supported str methods including join) was released in October 2000.
separator.join(items)
items.join(separator)
items.reduce(separator)
join as a built-in function
lists and tuples, but all sequences/iterables.
items.reduce(separator) is difficult for newcomers.
items.join(separator) introduces unexpected dependency from sequences to str/unicode.
join() as a free-standing built-in function would support only specific data types. So using a built-in namespace is not good. If join() were to support many data types, creating an optimized implementation would be difficult: if implemented using the __add__ method then it would be O(n²).
(separator) should not be omitted. Explicit is better than implicit.
Here are some additional thoughts (my own, and my friend's):
iterable class (which is mentioned in another comment).
Guido's decision is recorded in a historical mail, deciding on separator.join(items):
Funny, but it does seem right! Barry, go for it...
--Guido van Rossum
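To make the O(n²) concern about an `__add__`-based join concrete, here is a rough sketch of such a type-agnostic join. `generic_join` is a hypothetical helper written for this illustration, not anything from the thread or the standard library:

```python
from functools import reduce


def generic_join(sep, items):
    """Hypothetical join that relies only on the + operator (__add__)."""
    items = list(items)
    if not items:
        return sep[:0]  # empty value of the separator's own type
    # Every + builds a brand-new object and copies what was accumulated
    # so far, so total copying grows quadratically with the number of items.
    return reduce(lambda acc, item: acc + sep + item, items)


print(generic_join(", ", ["a", "b", "c"]))
print(generic_join(b"-", [b"a", b"b"]))
```

By contrast, a method that lives on the separator's type knows what it is building and can size the result once.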
string.join(sep, seq) or similar. 🤷♂️
str.join(",", ["a", "b", "c"]) returns "a,b,c".
Nov 1, 2023 at 10:33
I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join were a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join bytes and str together, so the way they have it now makes sense.
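A short sketch of the point above, using made-up sample values: each separator type joins only items of its own type, and mixing the two fails.

```python
text = "-".join(["a", "b"])        # str separator with str items
data = b"-".join([b"a", b"b"])     # bytes separator with bytes items

try:
    "-".join([b"a", b"b"])         # str separator with bytes items
    mixed_ok = True
except TypeError:
    mixed_ok = False               # the two types cannot be mixed

print(text, data, mixed_ok)
```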
Why is it string.join(list) instead of list.join(string)?
This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?
What if you have a tuple of strings? If this were a list method, you would have to convert every such iterable of strings to a list before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let's roll our own list join method:
class OurList(list):
    def join(self, s):
        return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see we have to add an extra step to use our list method, instead of just using the builtin string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join operation is still semantically a "string" operation, so it makes more sense to have it on the str object than on miscellaneous iterables.
Think of it as the natural orthogonal operation to split.
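A small sketch of that symmetry, reusing the sample string from earlier in the thread: splitting on an explicit separator and joining with the same separator gives back the original string.

```python
line = "welcome_to_stack_overflow"

parts = line.split("_")      # str -> list of str
rebuilt = "_".join(parts)    # list of str -> str

print(parts)
print(rebuilt == line)
```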
I understand why it is applicable to anything iterable and so can't easily be implemented just on list.
For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.
The "-" in "-".join(my_list) declares that you are converting to a string by joining the elements of a list. It's result-oriented. (Just for easy memorization and understanding.)
I made an exhaustive cheatsheet of methods_of_string for your reference.
string_methods_44 = {
    'convert': ['join', 'split', 'rsplit', 'splitlines', 'partition', 'rpartition'],
    'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
    'search': ['endswith', 'startswith', 'count', 'index', 'find', 'rindex', 'rfind'],
    'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric', 'isidentifier',
                  'islower', 'istitle', 'isupper', 'isprintable', 'isspace'],
    'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
             'center', 'ljust', 'rjust', 'zfill', 'expandtabs', 'casefold'],
    'encode': ['translate', 'maketrans', 'encode'],
    'format': ['format', 'format_map'],
}
Primarily because the result of a someString.join() is a string.
The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
The variables my_list and "-" are both objects. Specifically, they're instances of the classes list and str, respectively. The join function belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
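One way to see that join belongs to the class str is to call it through the class, passing the separator explicitly as the first argument. A minimal sketch, where `my_list` holds sample values:

```python
my_list = ["Hello", "world"]

via_instance = "-".join(my_list)     # the usual method-call spelling
via_class = str.join("-", my_list)   # same method, looked up on the class

print(via_instance, via_class)
```

Both spellings call the same method; the first simply binds "-" as the object the method acts on.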
You can join more than just lists and tuples. You can join almost any iterable, and iterables include generators, maps, filters, etc.
>>> '-'.join(chr(x) for x in range(48, 55))
'0-1-2-3-4-5-6'
>>> '-'.join(map(str, (1, 10, 100)))
'1-10-100'
And the beauty of using generators, maps, filters etc is that they cost little memory, and are created almost instantaneously.
Just another reason why it's conceptually:
str.join(<iterable>)
It's efficient to grant only str this ability, instead of granting join to all the iterables: list, tuple, set, dict, generator, map, filter, all of which have only object as a common parent.
Of course, range() and zip() are also iterables, but they never yield str items, so they cannot be passed directly to str.join():
>>> '-'.join(range(48, 55))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
iter.join()
)
I 100% agree with your issue. If we boil down all the answers and comments here, the explanation comes down to "historical reasons".
str.join isn't just confusing or not-nice looking, it's impractical in real-world code. It defeats readable function or method chaining because the separator is rarely (ever?) the result of some previous computation. In my experience, it's always a constant, hard-coded value like ", ".
I clean up my code, so that it reads in one direction, using toolz.functoolz:
>>> from toolz.functoolz import curry, pipe
>>> join = curry(str.join)
>>>
>>> a = ["one", "two", "three"]
>>> pipe(
...     a,
...     join("; ")
... )
'one; two; three'
I'll have several other functions in the pipe as well. The result is that it reads easily in just one direction, from beginning to end as a chain of functions. Currying map helps a lot.
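If adding a dependency is not an option, a similar left-to-right style can be approximated with the standard library alone. This is only a sketch: the `pipe` helper below is a hypothetical stand-in written for this example, not toolz's implementation.

```python
from functools import partial, reduce


def pipe(value, *funcs):
    """Hypothetical stand-in for a pipe helper: apply funcs left to right."""
    return reduce(lambda acc, func: func(acc), funcs, value)


join_with_semicolon = partial(str.join, "; ")   # fixes the separator up front
upper_all = partial(map, str.upper)             # a pre-configured map step

result = pipe(["one", "two", "three"], upper_all, join_with_semicolon)
print(result)
```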
- declares that you are joining a list and converting to a string. It's result oriented.
str.split() returns a non-string and makes quite a bit of sense. It seems like the same logic should be ok here, right? (Just talking about the conceptual problem of a non-string output)