How to Break a Python List into Sublists or Slices

Published August 27, 2024

How to break a Python list into a list of sublists with a given batch size

In Python, if you want to slice a list into several smaller sublists or slices, this blog post shows you many ways to do it.

Problem: Split a sample list into sublists with a batch size of 4 items in each sublist

Let us create a list nums with 17 random numbers from 1 to 100. We will split this list into sublists with batch size batch_size.

import random
nums = [random.randint(1, 100) for _ in range(17)]
print(nums)

We will get an output similar to this, but probably not the same numbers.

[91, 84, 83, 83, 22, 94, 25, 72, 85, 12, 36, 69, 45, 96, 62, 51, 46]

We will declare batch_size with value 4.

batch_size = 4

We will create a new variable chunks that contains a list of sublists with batch size batch_size.

Method 1: Create List of Tuples Using itertools.batched

Python 3.12 introduced the batched() function in the very useful itertools module. You can read more about it in the official Python documentation for itertools.

itertools.batched() accepts an iterable (here, our list) and a batch size as parameters and returns an itertools.batched object, which is an iterator that yields tuples of up to batch_size items each. The last tuple may be shorter than the others. We will convert the iterator to a list by calling the list() function on it.

import random
from itertools import batched

nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4

chunks = list(batched(nums, batch_size))

print('List:', nums)
print('Chunks:', chunks)

Output:

List: [39, 52, 13, 3, 61, 14, 42, 3, 76, 17, 8, 6, 9, 46, 6, 4, 73]
Chunks: [(39, 52, 13, 3), (61, 14, 42, 3), (76, 17, 8, 6), (9, 46, 6, 4), (73,)]

Each sublist is a tuple and not a list. If you want a list of lists, you can use Method 2, which converts each tuple to a list using a list comprehension.
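Since batched() is only available from Python 3.12 onwards, older interpreters need a substitute. The following is a minimal sketch of one, built on itertools.islice; the name batched_fallback is hypothetical and not part of the standard library, and the input uses fixed numbers instead of random ones so the result is reproducible.

```python
from itertools import islice

def batched_fallback(iterable, n):
    """Yield tuples of up to n items (hypothetical stand-in for itertools.batched)."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls at most n items per pass; an empty tuple means the input is exhausted.
    while batch := tuple(islice(iterator, n)):
        yield batch

nums = list(range(1, 18))  # 17 fixed items
chunks = list(batched_fallback(nums, 4))
print('Chunks:', chunks)
```

Like batched(), this sketch yields tuples and leaves the last one shorter when the items do not divide evenly.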

Method 2: Create List of Lists Using itertools.batched

In this method, we will convert each chunk from tuple to list using list comprehension.

import random
from itertools import batched

nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4

chunks = [list(chunk) for chunk in batched(nums, batch_size)]

print('List:', nums)
print('Chunks:', chunks)

Output:

List: [67, 24, 22, 53, 86, 19, 28, 24, 92, 42, 11, 83, 32, 99, 47, 94, 70]
Chunks: [[67, 24, 22, 53], [86, 19, 28, 24], [92, 42, 11, 83], [32, 99, 47, 94], [70]]
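Whether tuples or lists are the better choice depends on what happens to the chunks afterwards. As a small illustration (using two hard-coded chunks rather than the random data above), lists can be modified in place while tuples cannot:

```python
# Chunks shaped like Method 1's result (tuples) ...
tuple_chunks = [(39, 52, 13, 3), (73,)]
# ... and the same data after the Method 2 conversion (lists).
list_chunks = [list(chunk) for chunk in tuple_chunks]

list_chunks[-1].append(0)        # lists can grow in place
print('Lists:', list_chunks)

try:
    tuple_chunks[-1].append(0)   # tuples have no append() method
except AttributeError as exc:
    print('Tuples are immutable:', exc)
```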

Method 3: Create Sublists Using List Comprehension and Slicing

In this method, we will use list comprehension and slice 4 items at a time.

import random

nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4

chunks = [nums[i:i+batch_size] for i in range(0, len(nums), batch_size)]

print('List:', nums)
print('Chunks:', chunks)

Output:

List: [24, 26, 17, 24, 35, 95, 37, 39, 58, 67, 52, 9, 28, 54, 9, 67, 10]
Chunks: [[24, 26, 17, 24], [35, 95, 37, 39], [58, 67, 52, 9], [28, 54, 9, 67], [10]]
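The slicing approach can be wrapped in a small helper, which also makes the arithmetic easy to check: 17 items with a batch size of 4 give ceil(17 / 4) = 5 chunks, and the last chunk holds 17 % 4 = 1 item. The sketch below assumes a hypothetical helper name, chunk_list, and uses fixed numbers so the checks are reproducible.

```python
import math

def chunk_list(items, batch_size):
    """Hypothetical helper wrapping the list comprehension from Method 3."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

nums = list(range(17))
batch_size = 4
chunks = chunk_list(nums, batch_size)

expected_chunks = math.ceil(len(nums) / batch_size)       # 5
last_chunk_size = len(nums) % batch_size or batch_size    # 1

print('Number of chunks:', len(chunks), 'expected:', expected_chunks)
print('Last chunk size:', len(chunks[-1]), 'expected:', last_chunk_size)
```

The explicit check on batch_size is a design choice: without it, a negative value would silently return an empty list.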

Method 4: Create Sublists Using For Loop and list.append()

In this method, we will use a regular for loop, create a slice with 4 items at a time, and append each slice to a new list.

import random

nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4

chunks = []

for i in range(0, len(nums), batch_size):
    chunks.append(nums[i:i+batch_size])

print('List:', nums)
print('Chunks:', chunks)

Output:

List: [85, 17, 51, 39, 68, 19, 11, 5, 88, 34, 10, 28, 43, 2, 93, 39, 14]
Chunks: [[85, 17, 51, 39], [68, 19, 11, 5], [88, 34, 10, 28], [43, 2, 93, 39], [14]]
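If the chunks are only needed one at a time, the same loop can be turned into a generator so that no list of chunks is built up front. This is a sketch of that variation, not one of the five methods in this post; iter_chunks is a hypothetical name, and the input is a fixed list so the result is reproducible.

```python
def iter_chunks(items, batch_size):
    """Hypothetical helper: yield one slice at a time instead of appending to a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

nums = list(range(1, 18))  # 17 fixed items
# Each chunk is produced only when the comprehension asks for the next one.
totals = [sum(chunk) for chunk in iter_chunks(nums, 4)]
print('Totals per chunk:', totals)
```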

Method 5: Create Sublists Using numpy

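In this method, we will use the numpy module and call the numpy.array_split() function, which returns a list of numpy arrays. We will then apply the list() function to each array so that we end up with a list of chunks. Note that the second argument of array_split() is the number of sections to create, not the number of items per section. Passing batch_size (4) therefore splits the 17 numbers into 4 chunks of sizes 5, 4, 4 and 4, as the output below shows, rather than into chunks of 4 items each. To get chunks of batch_size items, pass a list of split indices such as list(range(batch_size, len(nums), batch_size)) as the second argument instead.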

import random
import numpy as np

nums = [random.randint(1, 100) for _ in range(17)]
batch_size = 4

chunks = [list(x) for x in np.array_split(nums, batch_size)]

print('List:', nums)
print('Chunks:', chunks)

Output:

List: [48, 19, 8, 5, 8, 89, 18, 58, 91, 47, 6, 56, 61, 79, 32, 52, 98]
Chunks: [[48, 19, 8, 5, 8], [89, 18, 58, 91], [47, 6, 56, 61], [79, 32, 52, 98]]
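Because np.array_split() interprets an integer second argument as a number of sections, getting chunks of a fixed size requires passing split positions instead. The following is a minimal sketch of that variation, assuming numpy is installed; it uses fixed numbers so the result is reproducible, and calls tolist() so that the chunks contain plain Python integers.

```python
import numpy as np

nums = list(range(1, 18))  # 17 fixed items
batch_size = 4

# Positions at which to cut: every batch_size items.
split_points = list(range(batch_size, len(nums), batch_size))  # [4, 8, 12, 16]

chunks = [chunk.tolist() for chunk in np.array_split(nums, split_points)]
print('Chunks:', chunks)
```

With these split points, every chunk has 4 items except the last, which holds the single leftover item.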

âČ Benchmarking All These 5 Methods

  1. For our benchmarking, we will use time.perf_counter(), which returns a time in fractional seconds.
  2. We will create a list nums with 10 million random numbers from 1 through 1,000.
  3. We will use a larger batch size of 100 items per sublist.
  4. We will call time.perf_counter() at the start and end of the program and find the difference, so each measurement includes the time spent generating the list.
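To time only the splitting step, the list can be built once before the timer starts. The sketch below is an alternative setup under stated assumptions, not the benchmark used in this post: it uses the standard timeit module, a smaller list of 100,000 numbers so it runs quickly, and hypothetical function names. It includes itertools.batched only when the interpreter provides it.

```python
import random
from timeit import timeit

n = 100_000        # smaller than the post's 10 million so the sketch runs quickly
batch_size = 100
nums = [random.randint(1, 1000) for _ in range(n)]  # built once, outside the timed code

def slice_comprehension():
    # Same technique as Method 3
    return [nums[i:i + batch_size] for i in range(0, len(nums), batch_size)]

def loop_append():
    # Same technique as Method 4
    chunks = []
    for i in range(0, len(nums), batch_size):
        chunks.append(nums[i:i + batch_size])
    return chunks

candidates = {'Method 3': slice_comprehension, 'Method 4': loop_append}

try:
    from itertools import batched  # available from Python 3.12
    candidates['Method 1'] = lambda: list(batched(nums, batch_size))
except ImportError:
    pass  # older Python: skip Method 1

# Run each candidate 20 times and record the total elapsed seconds.
timings = {name: timeit(func, number=20) for name, func in candidates.items()}

for name, seconds in sorted(timings.items()):
    print(f'{name}: {seconds:.4f} seconds for 20 runs')
```

Repeating each call several times and timing only the function under test reduces the influence of list generation and one-off noise on the comparison.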

Method 1: Benchmarking

import random
from itertools import batched
from time import perf_counter

start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100

nums = [random.randint(1, i_max) for _ in range(n)]
chunks = list(batched(nums, batch_size))

end_time = perf_counter()
print('Method 1 # Time taken:', end_time - start_time, 'seconds')

Output:

Method 1 # Time taken: 5.685045983991586 seconds

Method 2: Benchmarking

import random
from itertools import batched
from time import perf_counter

start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100

nums = [random.randint(1, i_max) for _ in range(n)]
chunks = [list(chunk) for chunk in batched(nums, batch_size)]

end_time = perf_counter()
print('Method 2 # Time taken:', end_time - start_time, 'seconds')

Output:

Method 2 # Time taken: 6.131341928004986 seconds

Method 3: Benchmarking

import random
from time import perf_counter

start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100

nums = [random.randint(1, i_max) for _ in range(n)]
chunks = [nums[i:i+batch_size] for i in range(0, len(nums), batch_size)]

end_time = perf_counter()
print('Method 3 # Time taken:', end_time - start_time, 'seconds')

Output:

Method 3 # Time taken: 6.003923358017346 seconds

Method 4: Benchmarking

import random
from time import perf_counter

start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100

nums = [random.randint(1, i_max) for _ in range(n)]

chunks = []

for i in range(0, len(nums), batch_size):
    chunks.append(nums[i:i+batch_size])

end_time = perf_counter()
print('Method 4 # Time taken:', end_time - start_time, 'seconds')

Output:

Method 4 # Time taken: 5.763842954998836 seconds

Method 5: Benchmarking

import random
import numpy as np
from time import perf_counter

start_time = perf_counter()
n = 10_000_000
i_max = 1000
batch_size = 100

nums = [random.randint(1, i_max) for _ in range(n)]
chunks = [list(x) for x in np.array_split(nums, batch_size)]

end_time = perf_counter()
print('Method 5 # Time taken:', end_time - start_time, 'seconds')

Output:

Method 5 # Time taken: 6.788155929010827 seconds

Conclusion

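This was a close competition, and I ran each benchmark several times. The results in this blog post are those of the final run. Your results may vary. Keep in mind that each measurement also includes the time taken to generate the 10 million random numbers, which is the same work in all five programs, so part of every total is unrelated to the splitting method itself.

The fastest appears to be Method 1, which uses itertools.batched() and produces a list of tuples.

The slowest appears to be Method 5, which breaks the list into chunks using numpy. Note that np.array_split(nums, batch_size) treats its second argument as the number of sections, so in this benchmark it splits the list into 100 chunks of 100,000 items each rather than into chunks of 100 items, and its timing is not a like-for-like comparison with the other methods.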

If you have any input to add, please do so in the comments, or you can send me an email. Thanks for reading.



Last Updated: August 27, 2024.     This post was originally written on August 27, 2024.