Many applications require more robust storage than text files, which is why they use databases to store data. There are many kinds of databases, but two broad categories stand out: relational databases, which support a standard declarative language called SQL, and so-called NoSQL databases, which can often work without a predefined schema and in which a data instance is more properly described as a document than as a row.
MongoDB is a kind of NoSQL database that stores data as documents, which are grouped together in collections. Documents are expressed as JSON objects. MongoDB is fast and scalable when storing data and flexible when querying it. To use MongoDB in Python, we need to import the pymongo package and open a connection to the database by passing a hostname and port. We assume that a MongoDB instance is running on the default host (localhost) and port (27017):
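As a minimal sketch of the connection step, assuming a recent PyMongo release: the helper names `mongo_uri` and `connect` and the two-second timeout are illustrative assumptions, not part of the original text.

```python
def mongo_uri(host="localhost", port=27017):
    """Build a connection URI; the defaults are MongoDB's default host and port."""
    return f"mongodb://{host}:{port}"


def connect(host="localhost", port=27017, timeout_ms=2000):
    """Return a MongoClient for the given host and port (hypothetical helper)."""
    # Imported here so that this module loads even where pymongo is not installed.
    from pymongo import MongoClient

    # serverSelectionTimeoutMS makes later operations fail quickly
    # if no server answers, instead of waiting for the default timeout.
    return MongoClient(mongo_uri(host, port), serverSelectionTimeoutMS=timeout_ms)


print(mongo_uri())  # mongodb://localhost:27017
```

With a server running, `conn = connect()` followed by `conn.list_database_names()` returns the names of the databases on that instance. Older PyMongo releases spelled this call `database_names()`.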
Listing the databases on this instance shows only one, named 'local'. If the databases and collections we point to do not exist, MongoDB will create them as necessary:
>>> db = conn.db
>>> db
Database(MongoClient('localhost', 27017), 'db')
Each database contains groups of documents, called collections, which play a role similar to tables in a relational database. To list all existing collections in a database, we use the collection_names() function (list_collection_names() in recent PyMongo releases):
The DataFrame df_ex2 is transposed and converted to a JSON string, which is then loaded into a dictionary. The insert() function receives the dictionary created from df_ex2 and saves it to the collection.
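The conversion described above can be sketched as follows. This is an illustration under assumptions: `frame_to_documents`, `rows_to_documents` and `insert_documents` are hypothetical helper names, the sample row is taken from the query output shown in this section, and `insert_many()` is used because `insert()` is no longer available in PyMongo 4.

```python
import json


def frame_to_documents(df):
    """Turn each DataFrame row into a dict keyed by column label (as a string).

    df.T.to_json() yields {"row_label": {"column_label": value, ...}, ...};
    the inner dicts are the documents to store.
    """
    return list(json.loads(df.T.to_json()).values())


def rows_to_documents(rows):
    """Pandas-free variant: key each value by its column position.

    This mirrors frame_to_documents for a frame whose columns are
    labelled 0, 1, 2, ... as in the output shown in this section.
    """
    return [{str(i): value for i, value in enumerate(row)} for row in rows]


def insert_documents(collection, documents):
    """Save documents to a PyMongo collection and return the generated _id values."""
    return collection.insert_many(documents).inserted_ids


docs = rows_to_documents([["Vinh", 39, 3, "male", "vl"]])
print(docs)  # [{'0': 'Vinh', '1': 39, '2': 3, '3': 'male', '4': 'vl'}]
```

Because the keys are the column labels as strings, later queries address fields as '0', '1', and so on.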
If we want to list all data inside the collection, we can execute the following commands:
If we want to query the collection with some conditions, we can use the find() function and pass in a dictionary describing the documents we want to retrieve. The result is a Cursor object, which supports the iterator protocol:
>>> cur = collection.find({'3' : 'male'})
>>> type(cur)
pymongo.cursor.Cursor
>>> result = pd.DataFrame(list(cur))
>>> result
       0   1  2     3    4                       _id
0   Vinh  39  3  male   vl  557da218f21c761d7c176a40
1  Nghia  26  3  male   dn  557da218f21c761d7c176a41
2   Hung  42  3  male   tn  557da218f21c761d7c176a44
3    Nam   7  1  male  hcm  557da218f21c761d7c176a45
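To make the filter semantics concrete, here is a pure-Python stand-in (not PyMongo code) for how a simple equality query such as {'3': 'male'} selects documents: a document matches when every key in the query has an equal value in the document. The function name `matches` is hypothetical, and the sketch ignores query operators, arrays and nested fields. The sample documents reuse rows from this section's output.

```python
def matches(document, query):
    """Return True if every key/value pair in query equals the document's value."""
    return all(document.get(key) == value for key, value in query.items())


people = [
    {"0": "Vinh", "1": 39, "2": 3, "3": "male", "4": "vl"},
    {"0": "Hong", "1": 28, "2": 4, "3": "female", "4": "dn"},
    {"0": "Nam", "1": 7, "2": 1, "3": "male", "4": "hcm"},
]

# Equivalent in spirit to collection.find({'3': 'male'}) on these three documents.
males = [doc["0"] for doc in people if matches(doc, {"3": "male"})]
print(males)  # ['Vinh', 'Nam']
```

A query with several keys narrows the result further, since all of its conditions must hold at once.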
Sometimes, we want to delete data in MongoDB. All we need to do is pass a query to the remove() method on the collection:
>>> # before removing data
>>> pd.DataFrame(list(collection.find()))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male   tn  557da218f21c761d7c176a44
5    Nam   7  1    male  hcm  557da218f21c761d7c176a45
6    Mai  11  1  female  hcm  557da218f21c761d7c176a46
>>> # after removing records which have '2' column as 1 and '3' column as 'male'
>>> collection.remove({'2': 1, '3': 'male'})
{'n': 1, 'ok': 1}
>>> cur_all = collection.find()
>>> pd.DataFrame(list(cur_all))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male   tn  557da218f21c761d7c176a44
5    Mai  11  1  female  hcm  557da218f21c761d7c176a46
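The remove() method was dropped in PyMongo 4. The sketch below shows the same deletion with delete_many(), which is available from PyMongo 3 onward. The wrapper name `delete_matching` is an assumption for illustration, and `collection` is expected to be a PyMongo collection such as the one used above.

```python
def delete_matching(collection, query):
    """Delete every document matching query and return how many were removed."""
    result = collection.delete_many(query)
    # deleted_count plays the role of the 'n' field in the old remove() reply.
    return result.deleted_count
```

For the data above, `delete_matching(collection, {'2': 1, '3': 'male'})` would return 1, since only one document matches both conditions. To remove at most one matching document, `delete_one()` takes the same kind of query.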
We learned step by step how to insert, query and delete data in a collection. Now, we will show how to update existing data in a collection in MongoDB:
>>> doc = collection.find_one({'1' : 42})
>>> doc['4'] = 'hcm'
>>> collection.save(doc)
ObjectId('557da218f21c761d7c176a44')
>>> pd.DataFrame(list(collection.find()))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male  hcm  557da218f21c761d7c176a44
5    Mai  11  1  female  hcm  557da218f21c761d7c176a46
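The save() method is likewise unavailable in PyMongo 4. A hedged equivalent of the update above uses update_one() with the $set operator, which changes only the named fields instead of rewriting the whole document. The helper names `set_fields` and `update_first` are hypothetical.

```python
def set_fields(changes):
    """Build an update document that sets the given fields and leaves others alone."""
    return {"$set": dict(changes)}


def update_first(collection, query, changes):
    """Apply changes to the first document matching query; return the modified count."""
    result = collection.update_one(query, set_fields(changes))
    return result.modified_count


update_doc = set_fields({"4": "hcm"})
print(update_doc)  # {'$set': {'4': 'hcm'}}
```

Against the collection above, `update_first(collection, {'1': 42}, {'4': 'hcm'})` would target the same document that find_one({'1': 42}) returned.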
The following table shows methods that provide shortcuts to manipulate documents in MongoDB: