
gsutil for Google Cloud Storage

gsutil provides options to manage files, folders, and buckets in Google Cloud Storage. The first step in moving your data to Google Cloud and Google BigQuery is to export the data and upload it to Google Cloud Storage in one of the following ways:

  • Manually via the browser, if the data is small
  • Using gsutil, which comes with the Google Cloud SDK, to automate basic scenarios
  • Using the Google Cloud Storage API to perform advanced automation

Before using the gsutil command, make sure that the project and credentials configured in the Google Cloud SDK point to the project and account that you intend to use, by typing the following command:

gcloud info

We will now look at the features available with gsutil:

  • To see the list of options provided by gsutil, type the following command:
gsutil help
  • The available commands section shows the command-line switches available in the gsutil command to perform various operations, as shown in the following screenshot:
  • The Additional help topics section provides a brief overview of some of the concepts, guidelines, and techniques used to work with Google Cloud Storage, as shown in the following screenshot:
  • To learn about a command-line switch or a help topic, type the following command. For example, the following command shows the list of options available for the cp switch, which is used to copy files from local storage to Google Cloud and vice versa:
gsutil help cp 
  • The following command will display information about how to implement secure practices and the security features of Google Cloud Storage:

gsutil help security
  • Use the following command to get the version of gsutil installed on your system. To update gsutil to the latest version, use the update option, as shown in the second line:
gsutil version
gsutil update
  • The following are some of the common gsutil options used in the day-to-day uploading, downloading, and management of files to and from Google Cloud Storage. The following command lists all the buckets in the project saved in the default configuration. To list buckets from another project, use the -p projectID switch:
gsutil ls
  • To list objects within a bucket, use the ls option as shown in the first line of the following code. To list only files of a specified type, add a wildcard character to the filter, as shown in the second line. To list all the buckets with extensive details such as the region of the bucket, the type of bucket, or access to the bucket, use the -L switch, as shown in the third line. To see the complete list of options available for the ls option, use the command in the fourth line:
gsutil ls gs://mybucketname
gsutil ls gs://mybucketname/*.csv
gsutil ls -L gs://mybucketname
gsutil help ls
  • To create a bucket, use the mb option and specify the class of the bucket, the location in which the bucket is to be created, the project in which the bucket is supposed to be created, and the bucket name:
gsutil mb -c classname -p projectID -l region-name gs://bucketname 
  • The following are the values for classname: multi_regional, regional, nearline, and coldline. The values for the regional and multi-regional locations can be found here: https://cloud.google.com/storage/docs/bucket-locations.
  • To manage the buckets and objects in Google Cloud Storage better, use the lifecycle option to set the life cycle of a bucket. The following command sets the life cycle of the bucket so that its objects are automatically deleted 30 days after creation:
gsutil lifecycle set 30-day-removal.json gs://bucketname
  • The content of 30-day-removal.json will be the following. The age condition specifies the number of days since the object was created:
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
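The policy file need not be written by hand; a small script can generate it with the retention period as a parameter. This is a sketch only: the default age, the output filename pattern, and the final echo are illustrative choices:

```shell
#!/bin/sh
# Generate a lifecycle policy that deletes objects older than a given age.
# The default of 30 days and the output filename are illustrative.
AGE_DAYS=${1:-30}
cat > "${AGE_DAYS}-day-removal.json" <<EOF
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": ${AGE_DAYS}}
    }
  ]
}
EOF
echo "wrote ${AGE_DAYS}-day-removal.json"
```

The generated file is then applied with gsutil lifecycle set 30-day-removal.json gs://bucketname, exactly as shown above.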
  • To change the storage class of the bucket using the lifecycle option, create a JSON file with the following content and use this file with the lifecycle option:
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 90}
    }
  ]
}
  • To remove all lifecycle options from a bucket, save an empty JSON file with just {} in it and use the lifecycle set option. To see the list of lifecycle options set for a bucket, use the following command:
gsutil lifecycle get gs://bucketname
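Clearing the rules can be scripted in the same way; the filename below is an illustrative choice, as only the {} content matters:

```shell
#!/bin/sh
# An empty policy ({}) removes all lifecycle rules from a bucket when it is
# applied with `gsutil lifecycle set`. The filename is arbitrary.
printf '{}\n' > no-lifecycle.json
echo "apply with: gsutil lifecycle set no-lifecycle.json gs://bucketname"
```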

Before creating a bucket on Google Cloud, define the life cycle of the bucket and the files inside it. Then create maintenance scripts: one to change the storage class of the bucket to nearline or coldline one year after creation, one to check the size of the buckets and send out alerts if a limit is exceeded, and one to delete unwanted files. For information on life cycle management, refer to this document: https://cloud.google.com/storage/docs/managing-lifecycles.
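Much of this maintenance can be expressed declaratively in a single lifecycle policy file rather than in scripts. For instance, a policy that moves objects to nearline after one year and deletes them a year later (both thresholds here are illustrative, not a recommendation) would look like this:

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 365}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 730}
    }
  ]
}
```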

  • To create a folder inside a bucket or to upload a file from your local system to a GCS bucket, use the cp command. The first of the following commands copies the localdir1 local directory from the user's computer to the GCS bucket. To copy just a file, remove the -r switch and specify the filename and the target bucket (or a folder inside the bucket), as shown in the second line:
gsutil cp -r localdir1 gs://bucketname
gsutil cp employeedetails.csv gs://bucketname
  • To download files and folders from the GCS bucket to the local directory, use the following command:
gsutil cp -r gs://bucketname localdir2
  • If the source folder has a huge number of files, use the -m option with the cp command to upload or download the files in parallel.
  • To merge multiple files into one file on GCS, use the compose command. This is helpful for merging files uploaded to GCS daily into one file at the end of the month, which makes the files easier to manage on GCS. The following command merges the contents of file1.csv and file2.csv into fullrecords.csv. The fullrecords.csv file will be created if it does not exist:
gsutil compose gs://bucketname/file1.csv gs://bucketname/file2.csv gs://bucketname/fullrecords.csv

There are a few limitations to the compose option. These limitations are documented here: https://cloud.google.com/storage/docs/gsutil/commands/compose.
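One documented limit is that a single compose call accepts at most 32 source objects, so merging a full month of daily files needs batching, carrying the partial result forward between calls. The sketch below illustrates this; the bucket, the object names, and the RUN=echo dry-run wrapper (which prints the gsutil commands instead of executing them) are all illustrative assumptions:

```shell
#!/bin/sh
# Compose many daily files into one monthly file, 32 sources per call.
# RUN=echo prints the gsutil commands instead of running them; set RUN=""
# to execute them for real. Bucket and object names are hypothetical.
RUN=echo
BUCKET=gs://bucketname
TARGET=$BUCKET/sales-2017-08.csv
batch=""
count=0
for day in 01 02 03; do                 # extend the list to the whole month
  batch="$batch $BUCKET/sales-2017-08-$day.csv"
  count=$((count + 1))
  if [ "$count" -eq 32 ]; then          # flush a full batch of 32 sources
    $RUN gsutil compose $batch "$TARGET"
    batch="$TARGET"                     # carry the merged result forward
    count=1
  fi
done
$RUN gsutil compose $batch "$TARGET"
```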

  • To check the size of a bucket, folder, or object in GCS, use the du command. The first line of the following code displays the size of every object in every bucket in the default project. The second line displays the objects that match the wildcard search, with their sizes in bytes. Use the -h switch to show the sizes in a human-readable format, and the -c switch to print the total size at the end of the list. To exclude files based on a wildcard, use the -e option, as shown in the third line, which excludes any .txt files in the bucket from the listing and the total size:
gsutil du -h -c
gsutil du gs://bucketname/files-2017*.csv
gsutil du -e gs://bucketname/*.txt

Use this option to monitor the size and growth of your GCS buckets on a daily, weekly, or monthly basis; this will help you to track your billing increases.
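A monitoring script along these lines only needs to parse the final line of gsutil du -c output, which carries the byte total. In this sketch, the sample variable stands in for real gsutil du -c gs://bucketname output so that the parsing logic can run on its own; the 1 GiB threshold and the file names are illustrative assumptions:

```shell
#!/bin/sh
# Alert when a bucket's total size crosses a threshold. In real use, replace
# the sample with: total=$(gsutil du -c gs://bucketname | awk '/total/ {print $1}')
THRESHOLD=1073741824   # 1 GiB in bytes; pick a limit that fits your budget
sample="524288000   gs://bucketname/files-2017-01.csv
734003200   gs://bucketname/files-2017-02.csv
1258291200  total"
total=$(printf '%s\n' "$sample" | awk '/total/ {print $1}')
if [ "$total" -gt "$THRESHOLD" ]; then
  echo "ALERT: usage is $total bytes, over the $THRESHOLD byte limit"
fi
```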

  • To manage versioning of objects inside a bucket, use the versioning command in gsutil. The first command shows whether versioning is set for the given bucket, and the second command turns versioning on for the bucket. The third command turns off versioning of objects in the specified bucket:
gsutil versioning get gs://bucketname
gsutil versioning set on gs://bucketname
gsutil versioning set off gs://bucketname
  • To sync folders from your local network to GCS buckets, use the rsync option. This is an option that I have used in almost all projects. The first of the following commands copies all the files in localDir1 to the target GCS bucket. To also copy subfolders and the files inside them, use the -r option, as shown in the second line. The rsync option can be used both to upload files to GCS buckets and to download files from GCS buckets to local folders; the first location in the command is the source and the second is the destination:
gsutil rsync localDir1 gs://bucketname
gsutil rsync -r localDir1 gs://bucketname

The preceding commands will not remove a file from the destination location if it is removed from the source location. Use the -d option to remove any files that are present in the destination folder but not in the source folder. This option should be used with caution, as any deleted files cannot be recovered.
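Because of that risk, it is worth knowing that gsutil rsync also supports a -n (dry run) switch, which reports what would be copied or deleted without doing it; previewing a -d sync this way before the real run is a sensible safeguard. In the sketch below, GSUTIL="echo gsutil" makes the commands print instead of execute, and the paths are illustrative:

```shell
#!/bin/sh
# Preview a destructive sync before running it for real. With GSUTIL set to
# "echo gsutil" the commands are only printed; set GSUTIL=gsutil (and real
# paths) to perform the sync against an actual bucket.
GSUTIL="echo gsutil"
$GSUTIL rsync -r -n -d localDir1 gs://bucketname   # dry run: report changes only
$GSUTIL rsync -r -d localDir1 gs://bucketname      # real sync, deletes extras
```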

There are many other commands available in gsutil. A complete list of the commands and their options can be found here: https://cloud.google.com/storage/docs/gsutil/commands/acl. In the left navigation, under gsutil commands, you can see all available commands in gsutil.
