- Java Data Science Cookbook
- Rushdi Shams
- 2021-07-09 18:44:26
Parsing Tab Separated Value (TSV) file using Univocity
Unlike CSV files, Tab Separated Value (TSV) files contain data that is separated by tab delimiters. This recipe shows you how to retrieve data points from TSV files.
Getting ready
In order to perform this recipe, we will require the following:
- Download the Univocity JAR file from http://oss.sonatype.org/content/repositories/releases/com/univocity/univocity-parsers/2.2.1/univocity-parsers-2.2.1.jar. Include the JAR file in your Eclipse project as an external library.
- Create a TSV file from the following data using Notepad, with a single tab character between fields. The extension of the file should be .tsv. Save the file as C:/testTSV.tsv:
Year	Make	Model	Description	Price
1997	Ford	E350	ac, abs, moon	3000.00
1999	Chevy	Venture	"Extended Edition"	4900.00
1996	Jeep	Grand Cherokee	MUST SELL!\nair, moon roof, loaded	4799.00
1999	Chevy	Venture	"Extended Edition, Very Large"	5000.00
Venture	"Extended Edition"	4900.00
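If you would rather not type the file by hand, the sample data can also be written from Java. The following is a minimal sketch using only the standard library; the class name `SampleTsvWriter` and the default file name are hypothetical and not part of the recipe. It writes the header and the four complete rows only, and it assumes the Jeep description contains a literal `\n` escape (a backslash followed by the letter n).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the recipe): writes the sample data as a TSV file.
class SampleTsvWriter {

    // One inner array per row; fields are joined with a tab when written.
    static final String[][] ROWS = {
        {"Year", "Make", "Model", "Description", "Price"},
        {"1997", "Ford", "E350", "ac, abs, moon", "3000.00"},
        {"1999", "Chevy", "Venture", "\"Extended Edition\"", "4900.00"},
        // Assumption: the description holds a literal backslash-n escape.
        {"1996", "Jeep", "Grand Cherokee", "MUST SELL!\\nair, moon roof, loaded", "4799.00"},
        {"1999", "Chevy", "Venture", "\"Extended Edition, Very Large\"", "5000.00"}
    };

    // Writes every row to the target path, ending each line with "\n"
    // so the file matches the line separator configured in the recipe.
    static Path write(Path target) throws IOException {
        StringBuilder content = new StringBuilder();
        for (String[] row : ROWS) {
            content.append(String.join("\t", row)).append("\n");
        }
        return Files.write(target, content.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Pass a path such as C:/testTSV.tsv as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "testTSV.tsv");
        System.out.println("Wrote " + write(target).toAbsolutePath());
    }
}
```

Writing the line endings explicitly keeps the file independent of the platform default, which on Windows would otherwise be a carriage return plus newline.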
How to do it...
- Create a method named parseTsv(String) that takes the name of the file as a String argument:
public void parseTsv(String fileName){
- Create a settings object for the parser:
TsvParserSettings settings = new TsvParserSettings();
- The line separator for the TSV file in this recipe is a newline character, or \n. To set this character as the line separator, modify the settings:
settings.getFormat().setLineSeparator("\n");
- Using these settings, create a TSV parser:
TsvParser parser = new TsvParser(settings);
- Parse all rows of the TSV file at once as follows:
List<String[]> allRows = parser.parseAll(new File(fileName));
- Iterate over the list object to print/process the rows as follows:
for (int i = 0; i < allRows.size(); i++){
    System.out.println(Arrays.asList(allRows.get(i)));
}
- Finally, close the method:
}
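Once the rows are in a list, individual data points are easier to retrieve by column name than by index. The following is a minimal sketch using only the standard library; the class `RowsToRecords` and its method are hypothetical helpers, not part of Univocity. It assumes the header line comes back as the first row of the list, as is the case when header extraction is not enabled in the parser settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Univocity): turns parsed rows into
// one map per data row, keyed by the column names in the first row.
class RowsToRecords {

    static List<Map<String, String>> toRecords(List<String[]> allRows) {
        List<Map<String, String>> records = new ArrayList<>();
        if (allRows.isEmpty()) {
            return records;
        }
        // Assumption: the first row holds the column names.
        String[] header = allRows.get(0);
        for (String[] row : allRows.subList(1, allRows.size())) {
            // LinkedHashMap keeps the columns in file order.
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                // Rows shorter than the header get null for the missing columns.
                record.put(header[i], i < row.length ? row[i] : null);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two hard-coded rows stand in for the list returned by the parser.
        List<String[]> allRows = Arrays.asList(
            new String[] {"Year", "Make", "Model", "Description", "Price"},
            new String[] {"1997", "Ford", "E350", "ac, abs, moon", "3000.00"});
        for (Map<String, String> record : toRecords(allRows)) {
            System.out.println(record.get("Make") + " " + record.get("Price"));
        }
    }
}
```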
The full method with the driver method in a class will look like the following:
import java.io.File;
import java.util.Arrays;
import java.util.List;

import com.univocity.parsers.tsv.TsvParser;
import com.univocity.parsers.tsv.TsvParserSettings;

public class TestTsv {
    public void parseTsv(String fileName){
        // Configure the parser to treat "\n" as the line separator
        TsvParserSettings settings = new TsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        TsvParser parser = new TsvParser(settings);
        // Read every row of the file into memory at once
        List<String[]> allRows = parser.parseAll(new File(fileName));
        for (int i = 0; i < allRows.size(); i++){
            System.out.println(Arrays.asList(allRows.get(i)));
        }
    }

    // Driver method; the path is the file saved in the Getting ready section
    public static void main(String[] args){
        TestTsv test = new TestTsv();
        test.parseTsv("C:/testTSV.tsv");
    }
}
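To make the shape of the result concrete, here is a rough standard-library sketch of what reading TSV involves: split the content into lines, split each line on tabs, and decode backslash escapes inside each field. This is not Univocity's implementation, and the class `NaiveTsvReader` is hypothetical. The escape set handled here (`\n`, `\t`, `\r`, `\\`) follows the common TSV convention and is an assumption about the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; a real project should use a parser library.
class NaiveTsvReader {

    // Splits content on "\n" into lines and each line on tabs into fields.
    static List<String[]> parse(String content) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // The -1 limit keeps trailing empty fields.
            String[] fields = line.split("\t", -1);
            for (int i = 0; i < fields.length; i++) {
                fields[i] = unescape(fields[i]);
            }
            rows.add(fields);
        }
        return rows;
    }

    // Decodes \n, \t, \r and \\ (assumed escape set); other sequences are kept as-is.
    static String unescape(String field) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\' && i + 1 < field.length()) {
                char next = field.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    case 'r': out.append('\r'); break;
                    case '\\': out.append('\\'); break;
                    default: out.append(c).append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String content = "Year\tMake\n1997\tFord\n";
        for (String[] row : parse(content)) {
            System.out.println(Arrays.asList(row));
        }
    }
}
```

The result has the same `List<String[]>` shape as the list the recipe iterates over, which is why the printing loop works on one array per row.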