How to break a Python list into a list of sublists with a given batch size
If you want to split a Python list into several smaller sublists (chunks) of a fixed size, this blog post shows you five ways to do it.
- Problem: Split a sample list into sublists with a batch size of 4 items in each sublist
- Method 1: Create List of Tuples Using itertools.batched
- Method 2: Create List of Lists Using itertools.batched
- Method 3: Create Sublists Using List Comprehension and Slicing
- Method 4: Create Sublists Using For Loop and list.append()
- Method 5: Create Sublists Using numpy
- Benchmarking All 5 Methods
- Conclusion
Problem: Split a sample list into sublists with a batch size of 4 items in each sublist
Let us create a list nums with 17 random integers from 1 to 100. We will split this list into sublists of size batch_size.
import random
nums = [random.randint(1, 100) for _ in range(17)]
print(nums)
You will see output similar to this, although your numbers will almost certainly differ.
[91, 84, 83, 83, 22, 94, 25, 72, 85, 12, 36, 69, 45, 96, 62, 51, 46]
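If you want the same numbers on every run (for example, to compare your output with someone else's), one option is to seed a dedicated random.Random instance. A minimal sketch; the helper name make_nums and the seed value 42 are arbitrary choices for illustration:

```python
import random

def make_nums(seed: int, count: int = 17) -> list:
    """Return `count` random integers from 1 to 100, reproducible for a given seed."""
    # A separate generator, so the global random state is left untouched.
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(count)]

first = make_nums(42)
second = make_nums(42)
print(first == second)  # True: same seed, same numbers
```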
We will set batch_size to 4.
batch_size = 4
We will create a new variable chunks that holds the list of sublists, each with up to batch_size items. Since 17 is not a multiple of 4, the last sublist will be shorter.
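It helps to know what result to expect before looking at the methods. With 17 items and a batch size of 4, divmod(17, 4) gives 4 full chunks and a remainder of 1, so we expect five chunks with sizes 4, 4, 4, 4, and 1. A small sketch of that arithmetic (the function name expected_chunk_sizes is just for illustration):

```python
def expected_chunk_sizes(length: int, batch_size: int) -> list:
    """Sizes of the chunks produced when splitting `length` items into batches."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    full, remainder = divmod(length, batch_size)
    # Every full batch has batch_size items; a nonzero remainder forms one short batch.
    return [batch_size] * full + ([remainder] if remainder else [])

print(expected_chunk_sizes(17, 4))  # [4, 4, 4, 4, 1]
print(expected_chunk_sizes(16, 4))  # [4, 4, 4, 4]
```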
Method 1: Create List of Tuples Using itertools.batched
Python 3.12 introduced the batched() function in the very useful itertools module. You can read more about it in the itertools section of the Python documentation.
itertools.batched(iterable, n) accepts any iterable (here, our list) and a batch size n, and returns an itertools.batched object: an iterator that yields tuples of n items each, where the last tuple may be shorter. We convert the iterator to a list by calling list() on it.
import random
from itertools import batched
nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4
chunks = list(batched(nums, batch_size))
print('List:', nums)
print('Chunks:', chunks)
Output:
List: [39, 52, 13, 3, 61, 14, 42, 3, 76, 17, 8, 6, 9, 46, 6, 4, 73]
Chunks: [(39, 52, 13, 3), (61, 14, 42, 3), (76, 17, 8, 6), (9, 46, 6, 4), (73,)]
Each chunk is a tuple, not a list. If you want a list of lists, use Method 2, which converts each tuple to a list with a list comprehension.
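Methods 1 and 2 need Python 3.12 or later. On older versions, the itertools documentation describes batched() with roughly equivalent pure-Python code built on itertools.islice. The sketch below follows that idea and only defines the fallback when the import fails; treat it as an approximation of the built-in (for example, it lacks the strict parameter added in Python 3.13).

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Yield tuples of up to n items from iterable (fallback for Python < 3.12)."""
        if n < 1:
            raise ValueError('n must be at least one')
        iterator = iter(iterable)
        # islice takes the next n items; an empty tuple means the input is exhausted.
        while batch := tuple(islice(iterator, n)):
            yield batch

nums = [91, 84, 83, 83, 22, 94, 25, 72, 85, 12, 36, 69, 45, 96, 62, 51, 46]
print(list(batched(nums, 4)))
# [(91, 84, 83, 83), (22, 94, 25, 72), (85, 12, 36, 69), (45, 96, 62, 51), (46,)]
```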
Method 2: Create List of Lists Using itertools.batched
In this method, we will convert each chunk from a tuple to a list using a list comprehension.
import random
from itertools import batched
nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4
chunks = [list(chunk) for chunk in batched(nums, batch_size)]
print('List:', nums)
print('Chunks:', chunks)
Output:
List: [67, 24, 22, 53, 86, 19, 28, 24, 92, 42, 11, 83, 32, 99, 47, 94, 70]
Chunks: [[67, 24, 22, 53], [86, 19, 28, 24], [92, 42, 11, 83], [32, 99, 47, 94], [70]]
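Both itertools methods leave the last chunk short. If your downstream code needs every chunk to hold exactly batch_size items, one option (not one of the five methods benchmarked in this post) is to pad the final chunk. This sketch uses itertools.zip_longest, similar to the grouper recipe in the itertools documentation; the fill value None and the name padded_chunks are assumptions for illustration.

```python
from itertools import zip_longest

def padded_chunks(items, batch_size, fillvalue=None):
    """Split items into lists of exactly batch_size, padding the last one."""
    # The same iterator repeated batch_size times: zip_longest pulls one item
    # from it per position, so each output tuple holds consecutive items.
    iterators = [iter(items)] * batch_size
    return [list(group) for group in zip_longest(*iterators, fillvalue=fillvalue)]

print(padded_chunks([1, 2, 3, 4, 5, 6, 7], 3))
# [[1, 2, 3], [4, 5, 6], [7, None, None]]
```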
Method 3: Create Sublists Using List Comprehension and Slicing
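In this method, we will use a list comprehension that slices batch_size (here 4) items at a time. A slice that runs past the end of a list simply stops at the end, so the last slice holds whatever items remain.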
import random
nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4
chunks = [nums[i:i+batch_size] for i in range(0, len(nums), batch_size)]
print('List:', nums)
print('Chunks:', chunks)
Output:
List: [24, 26, 17, 24, 35, 95, 37, 39, 58, 67, 52, 9, 28, 54, 9, 67, 10]
Chunks: [[24, 26, 17, 24], [35, 95, 37, 39], [58, 67, 52, 9], [28, 54, 9, 67], [10]]
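Because this approach relies only on len() and slicing, it works for any sequence, not just lists: slicing a string gives substrings, and slicing a tuple gives tuples. A small reusable sketch; the function name chunk_sequence and the batch_size check are illustrative additions:

```python
def chunk_sequence(seq, batch_size):
    """Return slices of seq, each with up to batch_size items."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    # A slice that runs past the end is truncated, so the last chunk may be shorter.
    return [seq[i:i + batch_size] for i in range(0, len(seq), batch_size)]

print(chunk_sequence('abcdefg', 3))        # ['abc', 'def', 'g']
print(chunk_sequence((1, 2, 3, 4, 5), 2))  # [(1, 2), (3, 4), (5,)]
```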
Method 4: Create Sublists Using For Loop and list.append()
In this method, we will use a regular for loop, slice batch_size items at a time, and append each slice to a new list.
import random
nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4
chunks = []
for i in range(0, len(nums), batch_size):
    chunks.append(nums[i:i+batch_size])
print('List:', nums)
print('Chunks:', chunks)
Output:
List: [85, 17, 51, 39, 68, 19, 11, 5, 88, 34, 10, 28, 43, 2, 93, 39, 14]
Chunks: [[85, 17, 51, 39], [68, 19, 11, 5], [88, 34, 10, 28], [43, 2, 93, 39], [14]]
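This loop builds the whole list of chunks before you use any of them. If you only need to process one chunk at a time, a generator function with the same loop can yield each slice instead, so you never hold the full list of chunks (as long as you don't store them). A minimal sketch; iter_chunks is a hypothetical name:

```python
def iter_chunks(items, batch_size):
    """Yield successive slices of items, each with up to batch_size elements."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Process each chunk as it is produced, e.g. print it with its sum.
for chunk in iter_chunks([85, 17, 51, 39, 68, 19, 11, 5, 88], 4):
    print(chunk, sum(chunk))
```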
Method 5: Create Sublists Using numpy
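In this method, we will use the numpy module and call numpy.array_split(). Note that when its second argument is an integer, array_split() treats it as the number of sections, not the number of items per section, so np.array_split(nums, 4) would give four chunks with sizes 5, 4, 4, 4. To get chunks of batch_size items, we instead pass the split indices 4, 8, 12, 16. array_split() returns a list of NumPy arrays, so we call .tolist() on each one and end up with a list of lists of regular Python integers.
import random
import numpy as np
nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4
# Split before indices batch_size, 2*batch_size, ... (here 4, 8, 12, 16)
split_points = range(batch_size, len(nums), batch_size)
chunks = [x.tolist() for x in np.array_split(nums, split_points)]
print('List:', nums)
print('Chunks:', chunks)
Output:
List: [48, 19, 8, 5, 8, 89, 18, 58, 91, 47, 6, 56, 61, 79, 32, 52, 98]
Chunks: [[48, 19, 8, 5], [8, 89, 18, 58], [91, 47, 6, 56], [61, 79, 32, 52], [98]]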
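numpy.array_split() interprets an integer second argument as a number of sections and a sequence as split indices, which is easy to mix up. This small sketch compares both forms on 17 items, plus reshape(), which only works when the length is an exact multiple of the batch size:

```python
import numpy as np

arr = np.arange(17)
batch_size = 4

# An integer second argument means "this many sections", not "this many items".
sections = np.array_split(arr, batch_size)
section_sizes = [len(s) for s in sections]  # [5, 4, 4, 4]

# A sequence of indices means "split before each of these positions".
split_points = range(batch_size, len(arr), batch_size)  # 4, 8, 12, 16
chunks = np.array_split(arr, split_points)
chunk_sizes = [len(c) for c in chunks]  # [4, 4, 4, 4, 1]

# If the length is an exact multiple of batch_size, reshape gives a 2-D array.
grid = np.arange(16).reshape(-1, batch_size)  # shape (4, 4)

print(section_sizes, chunk_sizes, grid.shape)
```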
Benchmarking All 5 Methods
- For our benchmarking, we will use time.perf_counter().
- We will create a list nums of 10 million random numbers from 1 through 1,000.
- We will use a larger batch size of 100 items per sublist.
- We will call time.perf_counter() at the start and end of each program and take the difference, which is the elapsed time in seconds.
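Because perf_counter() is called before the random numbers are generated, each measurement in the programs that follow covers both building nums and chunking it. To see how the total divides between the two, you could take an extra reading in between. A sketch using Method 3 on a smaller list of 1 million numbers so it runs quickly; the variable names are illustrative:

```python
import random
from time import perf_counter

n = 1_000_000
batch_size = 100

t0 = perf_counter()
nums = [random.randint(1, 1000) for _ in range(n)]
t1 = perf_counter()
chunks = [nums[i:i + batch_size] for i in range(0, len(nums), batch_size)]
t2 = perf_counter()

# perf_counter() returns seconds; multiply by 1000 for milliseconds.
generation_s = t1 - t0
chunking_s = t2 - t1
print(f'Generating nums: {generation_s:.3f} s')
print(f'Chunking nums:   {chunking_s:.3f} s ({chunking_s * 1000:.1f} ms)')
```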
Method 1: Benchmarking
import random
from itertools import batched
from time import perf_counter
start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100
nums = [random.randint(1, i_max) for _ in range(n)]
chunks = list(batched(nums, batch_size))
end_time = perf_counter()
print('Method 1 # Time taken:', end_time - start_time, 'seconds')
Output:
Method 1 # Time taken: 5.685045983991586 seconds
Method 2: Benchmarking
import random
from itertools import batched
from time import perf_counter
start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100
nums = [random.randint(1, i_max) for _ in range(n)]
chunks = [list(chunk) for chunk in batched(nums, batch_size)]
end_time = perf_counter()
print('Method 2 # Time taken:', end_time - start_time, 'seconds')
Output:
Method 2 # Time taken: 6.131341928004986 seconds
Method 3: Benchmarking
import random
from time import perf_counter
start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100
nums = [random.randint(1, i_max) for _ in range(n)]
chunks = [nums[i:i+batch_size] for i in range(0, len(nums), batch_size)]
end_time = perf_counter()
print('Method 3 # Time taken:', end_time - start_time, 'seconds')
Output:
Method 3 # Time taken: 6.003923358017346 seconds
Method 4: Benchmarking
import random
from time import perf_counter
start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100
nums = [random.randint(1, i_max) for _ in range(n)]
chunks = []
for i in range(0, len(nums), batch_size):
    chunks.append(nums[i:i+batch_size])
end_time = perf_counter()
print('Method 4 # Time taken:', end_time - start_time, 'seconds')
Output:
Method 4 # Time taken: 5.763842954998836 seconds
Method 5: Benchmarking
import random
import numpy as np
from time import perf_counter
start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100
nums = [random.randint(1, i_max) for _ in range(n)]
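# Pass split indices (100, 200, ...) so each chunk has batch_size items
chunks = [x.tolist() for x in np.array_split(nums, range(batch_size, len(nums), batch_size))]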
end_time = perf_counter()
print('Method 5 # Time taken:', end_time - start_time, 'seconds')
Output:
Method 5 # Time taken: 6.788155929010827 seconds
Conclusion
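This was a close competition, and I ran each benchmark several times. The results in this blog post are from the final run. Your results may vary. Keep in mind that each timed region also includes generating the 10 million random numbers, which likely accounts for most of each measurement, so the differences between the methods are small relative to the total.
The fastest appears to be Method 1, itertools.batched(), which returns a list of tuples.
The slowest appears to be Method 5, which breaks the list into chunks using numpy.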
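To compare the methods without the cost of generating the data, the standard-library timeit module can time just the chunking step and repeat it several times. The sketch below covers Methods 1, 3, and 4 (Method 1 is skipped on Python versions older than 3.12); the list size, repeat count, and function names are arbitrary choices, and it reports the best of several runs rather than a single run.

```python
import random
import timeit

def method_3(nums, batch_size):
    # List comprehension with slicing.
    return [nums[i:i + batch_size] for i in range(0, len(nums), batch_size)]

def method_4(nums, batch_size):
    # Regular for loop with list.append().
    chunks = []
    for i in range(0, len(nums), batch_size):
        chunks.append(nums[i:i + batch_size])
    return chunks

methods = {'Method 3': method_3, 'Method 4': method_4}

try:
    from itertools import batched  # Python 3.12+

    def method_1(nums, batch_size):
        # itertools.batched, returning a list of tuples.
        return list(batched(nums, batch_size))

    methods['Method 1'] = method_1
except ImportError:
    pass

def best_time(func, nums, batch_size, repeat=5):
    """Best elapsed time in seconds for a single call, over `repeat` runs."""
    timer = timeit.Timer(lambda: func(nums, batch_size))
    return min(timer.repeat(repeat=repeat, number=1))

nums = [random.randint(1, 1000) for _ in range(1_000_000)]  # generated once, not timed
batch_size = 100
for name, func in sorted(methods.items()):
    print(f'{name}: {best_time(func, nums, batch_size):.4f} seconds')
```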
If you have any input to add, please do so in the comments, or you can send me an email. Thanks for reading.
If you have any questions, please contact me at arulbOsutkNiqlzziyties@gNqmaizl.bkcom. You can also post questions in our Facebook group. Thank you.