Python & R vs. SPSS & SAS with remarks on Power BI & Tableau

June 2021, Mohsin Raza & Paul van Puijenbroek, DPulse

We compare four widely used statistical analysis tools: two open source languages (Python and R) and two commercial packages (SPSS and SAS). The world of data is moving very fast, so it is high time to take stock of where Python, R, SPSS and SAS stand. We also look at Power BI and Tableau, applications for dashboarding and reporting, and note where they fit in this discussion.

Short description of each of the languages

R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data scientists for developing statistical software and for data analysis. R is open source and has a huge community, which makes it strong on both the explanatory and the predictive side of data analytics.

Python is also an open source language, with an ever-growing community. Python was conceived in the late 1980s at Centrum Wiskunde & Informatica (CWI) in the Netherlands. Its implementation began in December 1989, and its first release came in 1991. Python has become particularly strong in machine learning and is well suited to advanced analytics.

SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated. SAS delivers analytics solutions that include data mining, statistical analysis, forecasting and econometrics, optimization and simulation, and text analytics.

SPSS Statistics is a software package used for interactive, or batched, statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. SPSS offers advanced statistical analysis, a vast library of machine learning algorithms, text analysis, open source extensibility, integration with big data and seamless deployment into applications.

Comparison of the packages in recent years

Table 1 summarizes our comparison of the four technologies on six dimensions: Methods and Techniques, Ease of Learning, Graphical Capabilities, Customer Service Support & Community, Deep Learning Support, and Cost.

Criterion | SAS & SPSS | R | Python
Methods & Techniques | Medium | High | High
Ease of Learning | High | Medium | Medium
Graphical Capabilities | Low | High | Medium
Customer Service Support & Community | Medium | High | Medium
Deep Learning Support | Low | Medium | High
Cost | High | Low | Low

Table 1: Comparison of R & Python vs. SAS & SPSS

We use three categories, “High”, “Medium” and “Low”, to rate their relative strengths; for Cost, “High” means more expensive. Appendix 1 explains the scores.

Some big companies using the packages

R: Facebook, Google, Twitter, Microsoft, Uber, Airbnb, IBM, ANZ, Stack Overflow, AstraZeneca

Python: Google, Facebook, Instagram, Spotify, Quora, Netflix, Dropbox, Reddit

SAS: Honda, Apple, Bank of America, DSM, HP, Lenovo, Rabobank, VISA, AEGON

SPSS: eBay, KPMG, Cognizant, IBM, Accenture, Baylor Scott & White Health, Weill Cornell Medicine, BMO Financial Group, The American Red Cross

These lists suggest that more “traditional” companies still tend to work with SPSS or SAS, while “younger” companies work more with R and Python.

Some remarks about Power BI and Tableau

Because many companies use applications such as Power BI and Tableau for their dashboards and reporting, and may also use them for predictive analytics, we briefly outline where they stand in the data science ecosystem. Power BI and Tableau focus on Business Intelligence and offer only limited capabilities in advanced analytics. To compensate, both support integration with the R and Python ecosystems, although this integration is itself limited. These limitations are explained further in Appendix 2.

The fact that these vendors integrate R and Python indicates that they recognize R and Python as the most powerful tools for predictive analytics and statistical analysis, including the visualization of the results, now and probably for many years to come. This is thanks to their open source code and the huge communities that continuously contribute to their improvement.

Stack Overflow is the most widely used platform on which developers ask questions about tooling. Looking at how often the Python, R and SAS tags appear (SPSS is not included), a clear trend emerges, as shown in Figure 1.

Figure 1: Stack Overflow Trends, 19/06/2021

Figure 1 shows, for each month since 2008, the percentage of all questions asked on Stack Overflow that carry the Python, R or SAS tag.

Given its popularity among developers and data scientists, Stack Overflow is a useful indicator of the popularity of a particular technology. Figure 1 shows that about 16% of the questions are tagged Python and almost 4% R, and both shares are still growing. Questions about SAS are rare, close to 0%. This clearly shows the popularity of R and Python over SAS.
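To make the metric concrete: a tag's share for a month is the number of questions carrying that tag divided by all questions asked that month. The minimal sketch below computes such shares, assuming each question is represented as a set of tag names; the function name and the sample month are hypothetical and are not real Stack Overflow data or an official API.

```python
from collections import Counter


def tag_shares(questions, tags):
    """Return the percentage of questions that carry each tag.

    `questions` is a list of tag sets, one per question (hypothetical input format).
    A question with several tags counts towards each of them, so shares need not sum to 100.
    """
    total = len(questions)
    counts = Counter(tag for q in questions for tag in set(q) if tag in tags)
    return {tag: 100 * counts[tag] / total if total else 0.0 for tag in tags}


# Hypothetical month with five questions (illustrative only, not real figures)
month = [
    {"python", "pandas"},
    {"r", "ggplot2"},
    {"python"},
    {"javascript"},
    {"python", "r"},
]
print(tag_shares(month, ["python", "r", "sas"]))
```

Because questions can carry several tags, the Python and R shares in Figure 1 can overlap.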

The open source community is also growing thanks to large organizations, such as Netflix, Airbnb and Google, that develop programs and packages and publish them as open source. Recently, an important trend has emerged: the call for explainability in predictive machine learning models. This is another argument for using R and Python. R remains very strong here, and interest in explainability within the Python community is also growing. Both environments help make transparent what is happening inside the algorithms. 'Explainable Artificial Intelligence', or XAI, is an emerging term in the field for a reason.
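As an illustration of what explainability tooling does, here is a minimal pure-Python sketch of permutation feature importance, one common model-agnostic technique (not a method the article prescribes): shuffle one input column and measure how much the model's accuracy drops. In practice you would use a library implementation such as scikit-learn's `sklearn.inspection.permutation_importance`; the toy model, data and function names below are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, keeping all other columns intact
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "black box": uses feature 0 and ignores feature 1
def model(row):
    return int(row[0] > 0.5)


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
print(permutation_importance(model, X, y))
```

Shuffling feature 0 breaks the model's predictions, so its importance is large, while shuffling the ignored feature 1 changes nothing and yields an importance of zero.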

Conclusion

In recent years there has been a clear decrease in the use of SAS and SPSS and an increase in the use of R and especially Python. The communities behind this open source tooling make these languages strong, and that strength in turn attracts an even bigger community. The range of possibilities is therefore growing rapidly, which, combined with the developments in deep learning, gives R and Python a competitive edge that SAS and SPSS cannot match. Looking ahead, we expect this edge to grow as new techniques such as deep learning become more important.

References

  1. https://insights.stackoverflow.com/trends?tags=r%2Cpython%2Csas

  2. https://techvidvan.com/tutorials/sas-vs-r-vs-python/

  3. https://www.educba.com/sas-vs-r-vs-python/

  4. https://www.analyticsvidhya.com/blog/2017/09/sas-vs-vs-python-tool-learn/

  5. https://cmotions.nl/python-r-vs-spss-sas/

  6. https://www.tableau.com/learn/whitepapers/using-r-and-tableau

  7. https://docs.microsoft.com/en-us/power-bi/connect-data/service-r-packages-support#r-scripts-that-are-not-supported-in-power-bi

  8. https://docs.microsoft.com/en-us/power-bi/create-reports/desktop-r-visuals#known-limitations


Appendix 1: Explanation of the scores

Methods & Techniques

All four ecosystems offer the basic and most commonly needed functions. Thanks to their open nature, R and Python get the latest features quickly, whereas SAS and SPSS update their capabilities only with new version releases. Because R has long been widely used in academia, new techniques are often implemented in it quickly. The Python community also responds quickly to new developments. This matters particularly for the most advanced recent developments, which arrive in the form of packages and algorithms.

R and Python clearly dominate here, because large communities help develop packages and algorithms for them. SAS and SPSS are lagging behind.

Ease of Learning

SAS and SPSS provide a good, stable GUI with a point-and-click interface, so users do not necessarily have to write code. In terms of resources, tutorials are available on the websites of various universities, and both vendors provide comprehensive documentation.

R has the steepest learning curve of the four technologies listed here, because it requires you to learn and understand coding. Python is known in the programming world for its simplicity, and this holds for data analysis as well. Although no GUI for Python is widespread yet, Python notebooks (such as Jupyter) are likely to become more and more mainstream.
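As a small illustration of that simplicity, the sketch below computes the kind of summary an SPSS user might obtain from a point-and-click descriptives dialog, using only Python's standard `statistics` module. The scores are made-up example values.

```python
import statistics

# Made-up example values (hypothetical exam scores)
scores = [72, 85, 90, 66, 85, 78, 94, 81]

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation (n - 1 denominator)
    "min": min(scores),
    "max": max(scores),
}

# Print one aligned line per statistic; ":g" formats ints and floats compactly
for name, value in summary.items():
    print(f"{name:>6}: {value:g}")
```

The trade-off is that these steps have to be written as code rather than clicked, which is exactly the learning-curve difference described above.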

Graphical Capabilities

SAS and SPSS have decent graphical capabilities, but they are merely functional. Customizing plots is difficult and requires you to understand the intricacies of the packages.

R and Python both have highly advanced graphical capabilities, provided by numerous packages.

This one also goes to R and Python. Not only do the graphs look a lot slicker when using, for example, ggplot2 or D3, but R and Python also let you develop interactive tools.

Customer Service Support & Community

SAS and SPSS have dedicated customer support that helps with issues around installation and usage. However, because of the cost of the software, their communities are not that large.

R does not have a dedicated customer service team, but it does have a massive community, with people from almost all industries and from all over the world, who can help with almost any issue. Python is also open source and therefore also has a large community.

In practice, the R and Python communities tend to provide better and much faster help. The size of these communities not only speeds up development, but also ensures better support via dedicated platforms such as Stack Overflow.

Deep Learning Support

Deep learning is a collection of techniques that have proven over the past few years to be among the most promising algorithms. Deep learning support in SAS and SPSS is still in an early phase, and much work remains. Python, on the other hand, has advanced greatly in this field and has numerous packages such as TensorFlow and Keras. R has recently added support for those packages.

Cost

SAS and SPSS are commercial software. They are expensive and still beyond the reach of many professionals and organizations. R and Python, on the other hand, are completely free.

This one also goes to R and Python. Both are open source and therefore free to use. Nevertheless, they have a longer learning curve than the GUIs of SPSS and SAS, which often makes training more expensive.

Appendix 2: Limitations of Power BI’s and Tableau’s Integration with R

Power BI:

Tableau: