- Python High Performance(Second Edition)
- Gabriele Lanaro
- 491字
- 2021-07-09 21:01:56
Sets
Sets are unordered collections of elements, with the additional restriction that the elements must be unique. The main use-cases where sets are a good choice are membership tests (testing if an element is present in the collection) and, unsurprisingly, set operations such as union, difference, and intersection.
In Python, sets are implemented using a hash-based algorithm just like dictionaries; therefore, the time complexities for addition, deletion, and test for membership scale as O(1) with the size of the collection.
Sets contain only unique elements. An immediate use case of sets is the removal of duplicates from a collection, which can be accomplished by simply passing the collection through the set constructor, as follows:
# create a list that contains duplicates
x = list(range(1000)) + list(range(500))
# the set *x_unique* will contain only
# the unique elements in x
x_unique = set(x)
The time complexity for removing duplicates is O(N), as it requires to read the input and put each element in the set.
Sets expose a number of operations like union, intersection, and difference. The union of two sets is a new set containing all the elements of both the sets; the intersection is a new set that contains only the elements in common between the two sets, and the difference is a new set containing the element of the first set that are not contained in the second set. The time complexities for these operations are shown in the following table. Note that since we have two different input sizes, we will use the letter S to indicate the size of the first set (called s), and T to indicate the size of the second set (called t):

An application of set operations are, for example, Boolean queries. Going back to the inverted index example of the previous subsection, we may want to support queries that include multiple terms. For example, we may want to search for all the documents that contain the words cat and table. This kind of a query can be efficiently computed by taking the intersection between the set of documents containing cat and the set of documents containing table.
In order to efficiently support those operations, we can change our indexing code so that each term is associated to a set of documents (rather than a list). After applying this change, calculating more advanced queries is a matter of applying the right set operation. In the following code, we show the inverted index based on sets and the query using set operations:
# Building an index using sets
index = {}
for i, doc in enumerate(docs):
# We iterate over each term in the document
for word in doc.split():
# We build a set containing the indices
# where the term appears
if word not in index:
index[word] = {i}
else:
index[word].add(i)
# Querying the documents containing both "cat" and "table"
index['cat'].intersection(index['table'])
- Mastering Entity Framework Core 2.0
- Delphi程序設計基礎:教程、實驗、習題
- 網店設計看這本就夠了
- C#實踐教程(第2版)
- Selenium Testing Tools Cookbook(Second Edition)
- Bootstrap 4 Cookbook
- IBM Cognos Business Intelligence 10.1 Dashboarding cookbook
- Mastering AWS Security
- Android應用開發深入學習實錄
- Java Web開發基礎與案例教程
- Pandas入門與實戰應用:基于Python的數據分析與處理
- Unity3D高級編程:主程手記
- LabVIEW案例實戰
- 大學計算機基礎
- AVR單片機C語言應用100例