
The implications of proprietary versus open source tools

Over the past 20 years, there has been a significant change in the technology that enables teams and individuals to design, develop, and deploy analytical applications. The change has been in the evolution and refinement of open source software and related platforms.

Twenty years ago, there were only proprietary software offerings from SAS, SPSS, Statistica, Minitab, and others. An entire generation, or possibly two generations, of psychologists, social scientists, mathematicians, and analysts grew up using these software systems in undergraduate- and graduate-level academic programs. When entering the business, research, and governmental workforces, those people brought their favorite tools with them.

More recently, open source systems like KNIME, RapidMiner, and others have offered community versions for free. In addition to the many open source tools and community versions available, the rise and evolution of the R and Python languages have provided a rich toolset for people to build advanced analytics applications without purchasing expensive proprietary software.

Most people stop the discussion at the point where the community version of an open source tool has no license fee. That misses the point. As Shawn Rogers, the coauthor of my first book, Analytics: How to Win with Intelligence, is fond of saying, "Open source is free like a puppy." Yes, it is great to get the puppy, and it is free at that moment, but many expenses come along with the free bundle of joy. In almost all cases, if you are going to use open source software for production purposes, you will need to buy support, the enterprise version, management software, an integrated development environment, or other software and/or services to make the environment effective, efficient, productive, secure, and collaborative.

The truly important element of this market characteristic, for the purpose of our discussion, is that the split between proprietary and open source software also divides the people you will be looking to hire by age.

From my informal research and observation over the past 5 to 7 years, most data scientists who are over 40 years old will predominantly want to use proprietary software. The vast majority of data scientists under 30 years old will use open source tools coupled with R or Python.

Data scientists are typically allowed to use the tools that they feel most comfortable with. Combined with the observation above, this means that to have a collaborative team, you want the majority of the team to use the same tools, so that they can share code, approaches, and methodologies.

In practical and simplistic terms, this means that you will end up with either an older team using proprietary tools or a younger team using open source tools supplemented with R and/or Python. Again, as with the evolution of the team, you can let this develop organically, but doing so will cost you and your team productivity and will bring team conflict and other management headaches.

Pick one approach or the other. Do not mix and match.
