官术网_书友最值得收藏!

Importing data as text

Tabular modeling natively supports the import of files with extensions of text (.txt), comma separated values (.csv), and tab separated values (.tab). Once a file to import has been defined, the user can specify the delimiter (column separator), and the importing interface scans the file to estimate the underlying data types for each column. Finally, the interface finishes the import process by loading the data according to this specification.

Getting ready

A text file to simulate the output file for the sales record header has been created and is available from the online resources for this chapter. This file includes the sales_order_id, customer_id, employee_id, currency_id, customer_po_id, sales_territory_id, order_dt (order date), due_dt (due date), ship_dt (ship date), and sales_amount fields.

The file will be imported as a table into PowerPivot (and the tabular model). A screenshot of the file is shown as follows:

Getting ready

How to do It…

Most data imports can be managed from the PowerPivot window:

  1. Launch the PowerPivot window.
  2. In the Home tab, click on the From Text button from the Get External Data grouping.
  3. Call the Sales Header TXT connection.
  4. Ensure that the file type Comma Separated Files (.csv) is selected from the drop-down box.
  5. Navigate to the resources listed in Chapter 2, Importing Data and select the 02_master.csv file.
  6. Click on Open.
  7. A new window will open showing the file structure. Ensure that Comma (,) is selected from the Column Separator drop-down list, and that the Use first row as column headers checkbox is ticked.
  8. Confirm the import by clicking on Finish, as shown in the following screenshot:
    How to do It…
  9. The data will load (click on Close to exit the Table Import Wizard window).

    Tip

    The Table Import Wizard displays the data that will be imported in columns and rows. Generally, it may be said that this grid is shared across all types of imports. You can choose not to import a particular column by deselecting the column from the header field selection (note that in the preceding screenshot, all columns are selected). Additionally, the amount of data to be imported can be filtered by selecting the drop-down arrow next to a field and applying a filter through the graphical interface. It is considered best practice to import only those columns which are needed in the model, because the extra columns or rows will increase the size of the model.

How it works…

Once the file name has been provided, the Table Import Wizard window uses a text driver to scan the first 200 rows of the file and determine the data type of each column. The text driver then uses this definition to import all the data for the file. The columns order_dt, due_dt, and ship_dt have been defined as dates, even though the file did not explicitly specify a date type (after all it is text).

Once the wizard has determined the data types for columns, it will use these types for the entire load. This may create issues where the value within the file does not conform to the data type specifications. In such a situation, the specific value will be discarded from the load. The row will be imported even if all the values are discarded. This type of situation can occur when a numeric value is expected, and a text value is found in the file, or the expected date format for the import cannot be derived from the underlying date value. For example, when the column format is DD-MM-YY, and the value follows the format MM-DD-YY.

The wizard can be defined for different delimiters. Tab, Comma, Semicolon, Space, Colon, and Vertical Bar (Pipe) are supported. This is regardless of the file extension (so you could import a .csv file with a Tab delimiter). However, the importer only allows you to import files that have extensions of .csv, .tab, or .txt. Additionally, you can specify if the first row of the file contains headings (and these are also imported).

There's more...

Although you can specify standard delimiters, the file may contain an unsupported character as its delimiter, and the data format chosen by the wizard may simply be the wrong type for the file. To overcome these issues (or the issues they may create), the text driver defaults (which are defined by the Wizard) can be overridden, so that the input file can be fully defined as load time. This is done by using a defining file titled schema.ini. This file needs to be saved in the same location as the file being imported. This offers the modeler full control over the specification of the import file. Consider the previous text import file with the following changes;

  • The delimiter is now a tilde (~)
  • The date format is MM-DD-YYYY (note that the local for the machine used in this book specified DD-MM-YYYY)

Repeat the recipe using the 02_master_tilde.csv file ensuring that the schema.ini file exists in the same directory. When the Wizard loads, an information box at the bottom of the window will indicate the presence of a schema.ini file, and this file will be used to specify import settings, as shown in the following screenshot:

There's more...

The schema.ini file is a text file and can therefore be examined in notepad—a sample of our file is shown in the next screenshot. In order for it to be used, it must reside in the same folder as the file(s) being imported. The following are some universal notes about the structure of the file that should be mentioned:

  • The name of the file for which the schema is to be applied is specified as the first line. Note that the import is specified for a 02_master_tilde.csv file
  • The date format is specified (see the row DateTimeFormat) and is applied to each date field
  • The delimiter is specified as a tilde
  • Each column (by number) can specify a name different from the file's header row
    There's more...

Naturally, there can only be one schema.ini file in a directory. However, the same file can be used to specify the format of individual files. This is achieved by simply extending the next specification with the name of the file, as has been done for 02_master_tilde.csv.

Note

Further information about the format of the schema.ini file can be found at http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx.

主站蜘蛛池模板: 鹤峰县| 类乌齐县| 和田县| 扎鲁特旗| 嫩江县| 教育| 桃园县| 城口县| 建德市| 乳源| 永兴县| 偏关县| 宾阳县| 石家庄市| 天水市| 海口市| 宕昌县| 寿光市| 格尔木市| 华安县| 剑阁县| 陕西省| 台南县| 连云港市| 靖西县| 高尔夫| 柏乡县| 苏尼特右旗| 玉溪市| 台中市| 阿克苏市| 河南省| 禹城市| 扶绥县| 周宁县| 缙云县| 定兴县| 阳曲县| 宣威市| 昭觉县| 江门市|