Being able to read data is one of the most important skills for a data scientist, and this data usually comes in a text format, be it TXT, CSV, or something else. In the Java I/O API, subclasses of the Reader class deal with reading text files.
Suppose we have a text.txt file with some sentences (which may or may not make sense):
My dog also likes eating sausage
The motor accepts beside a surplus
Every capable slash succeeds with a worldwide blame
The continued task coughs around the guilty kiss
If you need to read the whole file as a list of strings, the usual Java I/O approach is to use BufferedReader:
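import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

List<String> lines = new ArrayList<>();
// try-with-resources closes all three resources automatically, so no explicit close() is needed
try (InputStream is = new FileInputStream("data/text.txt");
     InputStreamReader isReader = new InputStreamReader(is, StandardCharsets.UTF_8);
     BufferedReader reader = new BufferedReader(isReader)) {
    String line;
    // readLine() returns null once the end of the file is reached
    while ((line = reader.readLine()) != null) {
        lines.add(line);
    }
}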
It is important to specify the character encoding, because it tells the Reader how to translate the sequence of bytes into proper String objects. Apart from UTF-8, there are UTF-16, ISO-8859-1 (also known as Latin-1, an ASCII-based encoding for Western European languages), and many others.
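To see why the encoding matters, the sketch below decodes the same bytes with two charsets. The class EncodingDemo and its decode method are illustrative names, not from the original text; the decoding itself uses the standard String(byte[], Charset) constructor, which performs the same kind of byte-to-character conversion that an InputStreamReader applies to its stream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Converts raw bytes to a String with the given charset; an InputStreamReader
    // performs the same kind of decoding on the bytes it reads from its stream.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented final e, written as a Unicode escape
        // so that the encoding of this source file does not matter
        String original = "caf\u00e9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5: the accented letter takes two bytes in UTF-8

        // Decoding with the charset that produced the bytes restores the text
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(original)); // true

        // Decoding as ISO-8859-1 maps each byte to one character, so the accented
        // letter turns into two unrelated characters
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());         // 5
        System.out.println(wrong.equals(original)); // false
    }
}
```

A file written in UTF-8 but read as ISO-8859-1 fails silently in exactly this way: no exception is thrown, and the text simply comes out garbled.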
There is also a shortcut for getting a BufferedReader for a file directly.
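A minimal sketch of one such shortcut, assuming Java 7 or later: Files.newBufferedReader(Path, Charset) from java.nio.file opens the file and returns a buffered, decoding reader in a single call, replacing the FileInputStream, InputStreamReader, and BufferedReader chain. The class ShortcutReader, the method readLines, and the temporary demo file are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShortcutReader {
    // One call replaces the FileInputStream -> InputStreamReader -> BufferedReader chain
    public static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null at the end of the file
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo input: a temporary file holding two of the sample sentences
        Path tmp = Files.createTempFile("text", ".txt");
        Files.write(tmp, Arrays.asList(
                "My dog also likes eating sausage",
                "The motor accepts beside a surplus"), StandardCharsets.UTF_8);
        System.out.println(readLines(tmp));
        Files.delete(tmp);
    }
}
```

The reading loop itself is unchanged; only the construction of the reader gets shorter.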
Even with this shortcut, this is still quite verbose for a task as simple as reading a list of lines from a file. You can wrap it in a helper function, or use the Java NIO API instead, which provides helper methods that make this task easier.
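As a sketch assuming Java 8 or later, the NIO class java.nio.file.Files offers two such helpers: readAllLines loads the whole file into a List<String> in one call, and lines returns a lazy Stream<String> that suits files too large to hold in memory. The class NioReading, its two methods, and the word filter are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class NioReading {
    // Reads every line into memory with a single call (java.nio.file.Files, Java 7+)
    public static List<String> readAll(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Reads lines lazily as a Stream (Java 8+). The stream keeps the file open,
    // so try-with-resources is used to close it.
    public static long countLinesContaining(Path path, String word) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(word)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo input: a temporary copy of the four sample sentences
        Path path = Files.createTempFile("text", ".txt");
        Files.write(path, Arrays.asList(
                "My dog also likes eating sausage",
                "The motor accepts beside a surplus",
                "Every capable slash succeeds with a worldwide blame",
                "The continued task coughs around the guilty kiss"), StandardCharsets.UTF_8);

        System.out.println(readAll(path).size());              // 4
        System.out.println(countLinesContaining(path, "the")); // 1 (matching is case-sensitive)
        Files.delete(path);
    }
}
```

readAllLines is the closest replacement for the BufferedReader loop; Files.lines is the better fit when a single pass over the data is enough and holding the whole file in memory is undesirable.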