Reading input data

Being able to read data is the most important skill for a data scientist, and this data is usually in text format, be it TXT, CSV, or some other format. In the Java I/O API, the subclasses of the Reader class deal with reading text files.

Suppose we have a text.txt file with some sentences (which may or may not make sense):

  • My dog also likes eating sausage
  • The motor accepts beside a surplus
  • Every capable slash succeeds with a worldwide blame
  • The continued task coughs around the guilty kiss

If you need to read the whole file as a list of strings, the usual Java I/O way of doing this is using BufferedReader:

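// imports needed by this snippet
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

List<String> lines = new ArrayList<>();

// try-with-resources closes all three resources automatically,
// in reverse order of declaration, so no explicit close() is needed
try (InputStream is = new FileInputStream("data/text.txt");
     InputStreamReader isReader = new InputStreamReader(is, StandardCharsets.UTF_8);
     BufferedReader reader = new BufferedReader(isReader)) {
    String line;
    // readLine() returns null once the end of the file is reached
    while ((line = reader.readLine()) != null) {
        lines.add(line);
    }
}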

It is important to specify the character encoding: this way, the Reader knows how to translate the sequence of bytes into a proper String object. Apart from UTF-8, there are UTF-16, ISO-8859-1 (an ASCII-based encoding for Western European languages, also known as Latin-1), and many others.
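To make the role of the encoding concrete, here is a minimal sketch that decodes the same bytes with two different charsets. The class name EncodingDemo and its two helper methods are hypothetical names chosen for this illustration; only String, getBytes, and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original text.
class EncodingDemo {

    // Decodes the bytes assuming they are UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes assuming they are ISO-8859-1 (Latin-1).
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), written as an escape so the
        // example does not depend on the encoding of the source file.
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // UTF-8 needs two bytes for e-acute, so four characters become five bytes.
        System.out.println(bytes.length);

        // Decoding with the matching charset restores the original text.
        System.out.println(decodeUtf8(bytes).equals("caf\u00e9"));

        // Decoding with the wrong charset yields five characters instead of four.
        System.out.println(decodeLatin1(bytes).length());
    }
}
```

With the wrong charset, each of the two UTF-8 bytes of the accented letter is treated as a separate character, so the word comes back as "cafÃ©". This is the kind of silent corruption that passing the encoding explicitly helps avoid.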

There is a shortcut to get BufferedReader for a file directly:

Path path = Paths.get("data/text.txt");
try (BufferedReader reader = Files.newBufferedReader(path,
        StandardCharsets.UTF_8)) {
    // read line-by-line
}
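The "read line-by-line" placeholder could be filled in along the following lines. This is a sketch rather than a definitive implementation: the class name LineByLine and the helper readLines are hypothetical, and main writes a temporary file as a stand-in for data/text.txt so that the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original text.
class LineByLine {

    // Reads every line of a UTF-8 text file into a list.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null at the end of the file
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for data/text.txt (an assumption of this sketch).
        Path path = Files.createTempFile("text", ".txt");
        Files.write(path,
                Arrays.asList("My dog also likes eating sausage",
                              "The motor accepts beside a surplus"),
                StandardCharsets.UTF_8);

        for (String line : readLines(path)) {
            System.out.println(line);
        }
        Files.delete(path);
    }
}
```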

Even with this shortcut, the code is quite verbose for such a simple task as reading a list of lines from a file. You can wrap it in a helper function, or instead use the helper methods that the Files class of the Java NIO API provides to make this task easier:

Path path = Paths.get("data/text.txt"); 
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
System.out.println(lines);
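Files.readAllLines loads the whole file into memory at once. When that is undesirable, the same Files class also offers Files.lines (Java 8 and later), which returns a lazily populated Stream of lines. The sketch below is one possible use of it; the class name LazyLines and the helper countLinesContaining are hypothetical, and a temporary file stands in for data/text.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original text.
class LazyLines {

    // Counts the lines that contain the given word (case-sensitive),
    // reading the file lazily instead of loading it all at once.
    static long countLinesContaining(Path path, String word) throws IOException {
        // The stream holds an open file handle, so it must be closed.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(word)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for data/text.txt (an assumption of this sketch).
        Path path = Files.createTempFile("text", ".txt");
        Files.write(path,
                Arrays.asList("My dog also likes eating sausage",
                              "The motor accepts beside a surplus",
                              "Every capable slash succeeds with a worldwide blame",
                              "The continued task coughs around the guilty kiss"),
                StandardCharsets.UTF_8);

        System.out.println(countLinesContaining(path, "The")); // prints 2
        Files.delete(path);
    }
}
```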

The Java NIO shortcuts work only for files. Later, we will talk about shortcuts that work for any InputStream objects, not just files.
