- Python High Performance(Second Edition)
- Gabriele Lanaro
- 491字
- 2021-07-09 21:01:56
Sets
Sets are unordered collections of elements, with the additional restriction that the elements must be unique. The main use-cases where sets are a good choice are membership tests (testing if an element is present in the collection) and, unsurprisingly, set operations such as union, difference, and intersection.
In Python, sets are implemented using a hash-based algorithm just like dictionaries; therefore, the time complexities for addition, deletion, and test for membership scale as O(1) with the size of the collection.
Sets contain only unique elements. An immediate use case of sets is the removal of duplicates from a collection, which can be accomplished by simply passing the collection through the set constructor, as follows:
# create a list that contains duplicates
x = list(range(1000)) + list(range(500))
# the set *x_unique* will contain only
# the unique elements in x
x_unique = set(x)
The time complexity for removing duplicates is O(N), as it requires to read the input and put each element in the set.
Sets expose a number of operations like union, intersection, and difference. The union of two sets is a new set containing all the elements of both the sets; the intersection is a new set that contains only the elements in common between the two sets, and the difference is a new set containing the element of the first set that are not contained in the second set. The time complexities for these operations are shown in the following table. Note that since we have two different input sizes, we will use the letter S to indicate the size of the first set (called s), and T to indicate the size of the second set (called t):

An application of set operations are, for example, Boolean queries. Going back to the inverted index example of the previous subsection, we may want to support queries that include multiple terms. For example, we may want to search for all the documents that contain the words cat and table. This kind of a query can be efficiently computed by taking the intersection between the set of documents containing cat and the set of documents containing table.
In order to efficiently support those operations, we can change our indexing code so that each term is associated to a set of documents (rather than a list). After applying this change, calculating more advanced queries is a matter of applying the right set operation. In the following code, we show the inverted index based on sets and the query using set operations:
# Building an index using sets
index = {}
for i, doc in enumerate(docs):
# We iterate over each term in the document
for word in doc.split():
# We build a set containing the indices
# where the term appears
if word not in index:
index[word] = {i}
else:
index[word].add(i)
# Querying the documents containing both "cat" and "table"
index['cat'].intersection(index['table'])
- 深入理解Bootstrap
- Java系統(tǒng)分析與架構(gòu)設(shè)計
- Mastering ServiceStack
- Learn Type:Driven Development
- 劍指Offer(專項突破版):數(shù)據(jù)結(jié)構(gòu)與算法名企面試題精講
- Web Scraping with Python
- NLTK基礎(chǔ)教程:用NLTK和Python庫構(gòu)建機(jī)器學(xué)習(xí)應(yīng)用
- Oracle 12c中文版數(shù)據(jù)庫管理、應(yīng)用與開發(fā)實(shí)踐教程 (清華電腦學(xué)堂)
- Practical Game Design
- Java EE 7 Development with NetBeans 8
- Angular開發(fā)入門與實(shí)戰(zhàn)
- Access 2010數(shù)據(jù)庫應(yīng)用技術(shù)(第2版)
- Linux C編程:一站式學(xué)習(xí)
- PHP從入門到精通(第4版)(軟件開發(fā)視頻大講堂)
- Learning jQuery(Fourth Edition)