- Learning Apache Thrift
- Krzysztof Rakowski
- 2021-07-23 14:55:39
An introduction to Apache Thrift
You probably know Facebook, the popular social network. Started in 2004 as a side project by a Harvard student, Mark Zuckerberg, the small website gained huge popularity and attracted more and more users. In its early years it grew rapidly in terms of traffic, systems, and network structure. Its engineering culture allowed choosing any solution that was deemed optimal for a given task, without any constraints or standards. This led to a situation in which there were lots of different services but no reliable way to connect them. Describing Apache Thrift, Facebook's engineers stated in the white paper (you can download it from https://thrift.apache.org/static/files/thrift-20070401.pdf):
"(...) we were presented with the challenge of building a transparent, high-performance bridge across many programming languages."
They tested solutions available on the market and came to the conclusion that none of them fulfilled the requirements of high performance, flexibility, and simplicity. The result of their work was Thrift, a piece of software that was later open sourced and handed over to the Apache Software Foundation.
Apache Thrift's simplicity comes from the fact that the code for different programming languages is generated automatically from a single file written in the interface definition language (IDL). In other similar solutions, data has to be prepared before it is transferred in order to meet the limitations of the method of transport; not all structures are easily transferred. In most cases, only simple data types such as strings and integers are directly transferable. Due to this, a developer has to translate every structure more complex than that to a text form in a process called serialization. This has to be done on both ends (deserialization being the reverse process), which requires extra work, testing, and debugging. In the case of Apache Thrift, the developer can use the data types native to their programming language of choice, using the methods dedicated to this language. All serialization and deserialization is done by Apache Thrift itself and is not visible to the developer. This architecture allows programmers to focus on the actual services without having to care about how the data is going to be transferred from one application to another.
Let's have a quick glance at the pillars of Apache Thrift. Some of the topics will be covered in much more detail in Chapter 4, Understanding How Apache Thrift Works, so here are just the basics that you will need to understand our first code examples.
Supported programming languages
Before starting any work with Apache Thrift, you should probably check whether it supports the programming language that you use. Of course, there is a great chance that it does—most of the popular languages are supported. The complete list for version 0.9.3 is as follows:
- ActionScript 3
- C++
- C#
- D
- Delphi
- Erlang
- Haskell
- Java
- JavaScript
- Node.js
- Objective-C/Cocoa
- OCaml
- Perl
- PHP
- Python
- Ruby
- Smalltalk
Note
Note that Apache Thrift is still in a pre-1.0 version, so some of the languages may not be fully supported. It is best to check the current status of support for your favorite programming language on the Apache Thrift website (https://thrift.apache.org/docs/features) or in the source code.
If your language of choice is on the list (especially if it is a popular one), you can be sure that you will be able to generate all the code necessary to work with Apache Thrift.
Data types
One of the basic features of every programming language is its data types. Although the basic ones, such as integer or string, may be very similar across languages, this is not necessarily true for the rest of them. Some languages (for example, C++) are statically typed. This means that the type of a variable has to be known at compile time. Thus, it has to be defined in the source code when the program is written. After that, the variable can only be of this type. For example, consider the following line from C++:
int x = 42;
It initializes the variable x, which is an integer. This variable has to stay an integer throughout the execution of the program. If you later try to assign a value of some other type to it, the compiler will report an error. Let's take a look at the following example:
int main() {
    int x = 42;
    // this line will produce compilation error
    x = "forty two";
    return 0;
}
If you try to compile this simple code, you will end up with the following compile error:
$ g++ -o example example.cpp
example.cpp: In function 'int main()':
example.cpp:4:6: error: invalid conversion from 'const char*' to 'int' [-fpermissive]
 x = "forty two";
     ^
Other languages are dynamically typed, that is, the type of a variable is checked at runtime, and in the source code the same variable may hold a value of any type at any time. Consider this example from PHP:
<?php
if (rand(0, 1) == 1) {
    $x = 42;
} else {
    $x = "forty two";
}
// var_dump() prints the type of the specified variable and its value
var_dump($x);
Depending on the random outcome of the condition, the value of the variable may be either an integer or a string. Run the script as follows:
$ php -f example.php
The result of running this program would be either string(9) "forty two" or int(42).
As you can see, both values are permitted, as the PHP interpreter determines the type of the variable at runtime.
The language allows that and, moreover, lets you assign values of different types to the same variable later on.
Without Apache Thrift, a developer would have to serialize the variables manually. This means that before the variables are transferred, they have to be mapped to the most basic data types that are understood by every programming language (most probably, integers and strings of characters). After the transmission, those serialized variables have to be translated back to the structures available in the programming language at the receiving end.
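To make that manual effort concrete, here is a minimal Python sketch of such a round trip. The User record, its field names, and the pipe-separated text format are hypothetical and chosen only for illustration; they are not part of Apache Thrift.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical record used only for this illustration.
    user_id: int
    name: str
    tags: list


def serialize(user: User) -> str:
    # Flatten the structure by hand into a single string of basic values.
    return "|".join([str(user.user_id), user.name, ",".join(user.tags)])


def deserialize(payload: str) -> User:
    # The receiver must know the exact field order and both separators.
    raw_id, name, raw_tags = payload.split("|")
    return User(int(raw_id), name, raw_tags.split(",") if raw_tags else [])


original = User(7, "alice", ["admin", "dev"])
wire = serialize(original)
restored = deserialize(wire)
print(wire)
print(restored == original)
```

Even this tiny format is fragile: a name containing the | character would break it, which is exactly the kind of extra work, testing, and debugging that hand-written serialization brings.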
Apache Thrift does all that dirty work for the developer. It provides its own data types that are then mapped to the ones native to the given programming language, thereby allowing the developer to focus on creating the application, not the communication interface.
Transports
Transports are a part of Apache Thrift's network stack. They allow you to transmit data over different channels, for example, HTTP, sockets, or files. Decoupling the transport layer lets you easily choose the transport that best fits your solution without many changes in the code.
The choice of transport should be dictated by the architecture of your solution.
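The idea of a decoupled transport layer can be sketched in a few lines of Python. The Transport interface and the two implementations below are hypothetical and heavily simplified; they are not Apache Thrift's actual classes, only an illustration of why application code does not need to change when the channel does.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical minimal transport interface (not Thrift's actual API)."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read_all(self) -> bytes: ...


class MemoryTransport(Transport):
    def __init__(self) -> None:
        self._buffer = bytearray()

    def write(self, data: bytes) -> None:
        self._buffer.extend(data)

    def read_all(self) -> bytes:
        return bytes(self._buffer)


class FileTransport(Transport):
    def __init__(self, path: str) -> None:
        self._path = path

    def write(self, data: bytes) -> None:
        with open(self._path, "ab") as handle:
            handle.write(data)

    def read_all(self) -> bytes:
        with open(self._path, "rb") as handle:
            return handle.read()


def send_greeting(transport: Transport) -> None:
    # Application code depends only on the interface, not on the channel.
    transport.write(b"hello")


memory = MemoryTransport()
send_greeting(memory)
memory_bytes = memory.read_all()

fd, temp_path = tempfile.mkstemp()
os.close(fd)
file_transport = FileTransport(temp_path)
send_greeting(file_transport)
file_bytes = file_transport.read_all()
os.remove(temp_path)  # clean up the temporary file

print(memory_bytes, file_bytes)
```

The send_greeting function is identical for both channels; only the object passed to it differs.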
Protocols
Protocols prepare data to be transmitted over transports. This process is called serialization (when sending data) and deserialization (when receiving data). There are different protocols that can be used: JSON, binary, plain text, and so on. This means that, depending on what data you want to transfer, you can use different methods of serialization. For example, if you expect to transmit images or other binary data, choosing the binary protocol is the best option, as there would be almost zero overhead. If you chose JSON for this purpose, the binary data would have to be converted to text, thereby increasing the payload by a third or more.
The choice of protocol should be dictated by the data you wish to transfer using Apache Thrift.
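The size penalty of a text protocol for binary data can be checked with a short Python sketch. It assumes the bytes are embedded in JSON as Base64 text, which is a common approach; the field name image and the 3000-byte random payload are arbitrary stand-ins for a real image.

```python
import base64
import json
import os

payload = os.urandom(3000)  # stand-in for image bytes

# JSON cannot carry raw bytes, so they are encoded as Base64 text first
# (an assumption of this sketch).
as_json = json.dumps({"image": base64.b64encode(payload).decode("ascii")})

binary_size = len(payload)
json_size = len(as_json.encode("utf-8"))
overhead = json_size / binary_size - 1

print(f"binary: {binary_size} bytes, JSON: {json_size} bytes, overhead: {overhead:.1%}")
```

Base64 alone turns every 3 bytes into 4 characters, and the surrounding JSON syntax adds a little more on top of that.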
Versioning
Versioning is an approach for managing changes in the service's API (and in the software in general). As software is being developed, it changes. Sometimes the changes are minuscule, and sometimes great. They often manifest themselves as modifications of the methods or parameters exposed by the API.
When developing client and server software, you shouldn't assume that clients will be updated to the newest version instantly. This is usually not possible, even if you have total control of the environment. It is therefore wise to allow the older versions of the client to work with the newer versions of the server.
Changes in APIs, libraries, and other externally available components pose a big challenge for developers, leading to problems often referred to as dependency hell: different applications are compatible with different versions of the same library or API, which makes those dependencies difficult to manage. To alleviate this inconvenience, most software developers adopt a convention of marking the version of the application with numbers according to the template MAJOR.MINOR.PATCH, where PATCH means minuscule changes (for example, fixing some bugs), MINOR is a larger change that is backward-compatible with the previous versions, and MAJOR means a major release that might break compatibility with the previous versions of the software.
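As an illustration of how this convention can be used in practice, the following Python sketch checks whether an installed version should satisfy code written against a required one. The function names and the exact rule (same MAJOR, and MINOR.PATCH not older) are assumptions of this sketch, not something prescribed by Apache Thrift.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    # Expects exactly three dot-separated integers, e.g. "1.4.0".
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def can_use(required: str, installed: str) -> bool:
    """Return True if `installed` should work for code written against `required`."""
    need, have = parse_version(required), parse_version(installed)
    # Same MAJOR, and at least the MINOR.PATCH the code was written against.
    return have.major == need.major and have[1:] >= need[1:]


print(can_use("1.2.3", "1.4.0"))
print(can_use("1.2.3", "2.0.0"))
```

Under this rule a newer MINOR or PATCH release is accepted, while a different MAJOR release is rejected because it might break compatibility.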
Apache Thrift features soft versioning. This means that there are no formal requirements as to how the changes between subsequent versions should be handled or announced. However, it delivers a set of tools that allows users to easily keep backward compatibility with the new versions of the service. This is achieved by the following properties:
- The method's arguments are numbered. You can add or remove them. As long as the same number is not reused, the new versions of methods may function without the removed arguments. The numbers of existing arguments shouldn't be changed.
- You can set default values for the arguments. If an older version of the client calls a method without a newly added argument, the service doesn't receive any value for it and uses the default value instead. This is useful when you want to add some fields.
- While manipulating fields is relatively easy, you shouldn't rename methods or services. Renaming makes them unavailable to the older clients.
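The effect of numbered arguments and default values can be modelled with a short Python sketch. This is a simplified model, not Apache Thrift's actual wire format or generated code: calls are represented as dictionaries keyed by field number, and the ADD_ARGS_V2 table with its scale argument is hypothetical.

```python
# Hypothetical argument table of a newer server version:
# field number -> (name, default value). Field 3 was added after the first release.
ADD_ARGS_V2 = {
    1: ("a", 0),
    2: ("b", 0),
    3: ("scale", 1),
}


def decode_args(wire_fields: dict, spec: dict) -> dict:
    """Map numbered wire fields to named arguments."""
    # Start from the defaults so that fields an old client never sends are set.
    args = {name: default for name, default in spec.values()}
    for field_id, value in wire_fields.items():
        if field_id in spec:  # unknown numbers (from newer peers) are skipped
            args[spec[field_id][0]] = value
    return args


old_client_call = {1: 2, 2: 3}              # knows nothing about field 3
new_client_call = {1: 2, 2: 3, 3: 10}
future_client_call = {1: 2, 2: 3, 4: "x"}   # sends a field this server lacks

print(decode_args(old_client_call, ADD_ARGS_V2))
print(decode_args(new_client_call, ADD_ARGS_V2))
print(decode_args(future_client_call, ADD_ARGS_V2))
```

Because fields are matched by number rather than by position or name, the old call still decodes (with the default filled in) and the unknown field is simply ignored. Reusing number 3 for a different argument would silently break this, which is why numbers must never be reused.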
Security
Security is essential to every service. Although you definitely need to take extra care when exposing services to the Internet, it is also important when they are available in private networks.
Apache Thrift allows you to use TSSLTransportFactory to utilize RSA key pairs, providing security for the connection.
Another way of securing your Apache Thrift connection (although a little bit more complicated) is tunneling it over SSH.
We will discuss this in detail in Chapter 8, Advanced Usage of Apache Thrift.
Interface description language
Apache Thrift's core feature is its own IDL, which shapes its simplicity and usability. It will look familiar at first sight to anyone who has programmed in contemporary programming languages. Using the IDL, you are able to define the service and all the variables that it uses in one file. It is an unambiguous description of what the service will look like, without going into the implementation details.
Let's consider a very simple service, which allows you to add two integers:
namespace py thrift.example1
namespace php thrift.example1

service AddService {
    i32 add(1: i32 a, 2: i32 b),
}
This example code defines the AddService service, which contains the add method. This method takes two 32-bit signed integers (i32) as parameters and also returns such an integer as a result. We will want to have the code generated for Python and PHP, but of course Apache Thrift is able to do it for a far greater spectrum of languages.
Now Thrift's magic begins: if you save this code to a file (let's say, example1.thrift) and run the following commands, you will get the client and server code for this service in the desired languages (Python and PHP in this example) in the newly created folders gen-py and gen-php:
$ thrift --gen py example1.thrift
$ thrift --gen php example1.thrift
In the simplest solution, it is enough to fill in the code of the add method, and voilà, you have a fully functional client and server.
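As a rough idea of what filling in the add method could look like on the Python side, here is a minimal sketch of a handler class. The class name AddServiceHandler and the range check are assumptions made for illustration; the wiring of the handler into the generated server code is deliberately left out.

```python
class AddServiceHandler:
    """Hypothetical handler implementing the add method from example1.thrift."""

    def add(self, a: int, b: int) -> int:
        result = a + b
        # i32 is a 32-bit signed integer, while Python integers are unbounded,
        # so this sketch rejects results that would not fit (illustrative choice).
        if not -2**31 <= result <= 2**31 - 1:
            raise OverflowError("result does not fit in i32")
        return result


handler = AddServiceHandler()
print(handler.add(2, 3))
```

The handler contains only the business logic; everything related to serialization and transport stays in the generated code and the Thrift library.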
This example is, of course, oversimplified, but it shows the major advantage of Apache Thrift: the ability to define a service in one place and then instantly generate the service and the corresponding client code, without the need to write code in every language from scratch. It is a great tool not only for final solutions, but also for rapid prototyping in different programming languages.
To see how much work Apache Thrift just spared you, examine the generated files that are saved in the gen-py and gen-php folders.
IDL is a very powerful tool. It has a lot of options and gives you a great deal of flexibility. We will discuss it in greater detail in Chapter 4, Understanding How Apache Thrift Works.