Newly released: the ten algorithms most commonly used by data scientists

Introduction: This article is based on a KDnuggets survey that ranks the algorithms most commonly used by data scientists and describes how their usage changed from 2011 to 2016.

Based on the survey, KDnuggets identified the ten algorithms most commonly used by data scientists:

1. Regression

2. Clustering

3. Decision Trees/Rules

4. Visualization

5. k-Nearest Neighbors

6. PCA (Principal Component Analysis)

7. Statistics

8. Random Forests

9. Time Series/Sequences

10. Text Mining

On average, respondents reported using 8.1 algorithms, a substantial increase over a similar survey in 2011.

Compared with the 2011 survey, the most popular algorithms are regression, clustering, decision trees/rules, and visualization. The algorithms with the largest relative increase in usage, measured as (pct2016/pct2011 - 1), are:

Boosting, from 23.5% in 2011 to 32.8% in 2016, an increase of 40%

Text mining, from 27.7% in 2011 to 35.9% in 2016, an increase of 30%

Visualization, from 38.3% in 2011 to 48.7% in 2016, an increase of 27%

Time series, from 29.6% in 2011 to 37.0% in 2016, an increase of 25%

Anomaly/deviation detection, from 16.4% in 2011 to 19.5% in 2016, an increase of 19%

Ensemble methods, from 28.3% in 2011 to 33.6% in 2016, an increase of 19%

Support vector machines, from 28.6% in 2011 to 33.6% in 2016, an increase of 18%

Regression, from 57.9% in 2011 to 67.1% in 2016, an increase of 16%
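The ranking uses relative change, pct2016/pct2011 - 1, rather than the difference in percentage points, so boosting (+9.3 points) ranks above visualization (+10.4 points). As a minimal sketch, the Python snippet below recomputes the changes from the shares listed above; the helper name `relative_change` is illustrative, not part of the survey. Because the published shares are rounded to one decimal place, a recomputed value can differ from the reported one by about a point (for example, 33.6/28.6 - 1 for support vector machines is about 17.5%).

```python
def relative_change(pct_2011: float, pct_2016: float) -> float:
    """Relative change in usage share, computed as pct2016 / pct2011 - 1."""
    return pct_2016 / pct_2011 - 1


# (2011 share, 2016 share) in % of respondents, copied from the list above.
shares = {
    "Boosting": (23.5, 32.8),
    "Text mining": (27.7, 35.9),
    "Visualization": (38.3, 48.7),
    "Time series": (29.6, 37.0),
    "Anomaly/deviation detection": (16.4, 19.5),
    "Ensemble methods": (28.3, 33.6),
    "Support vector machines": (28.6, 33.6),
    "Regression": (57.9, 67.1),
}

# Rank by relative change, largest increase first.
ranked = sorted(shares.items(), key=lambda kv: relative_change(*kv[1]), reverse=True)
for name, (p2011, p2016) in ranked:
    change = relative_change(p2011, p2016)
    print(f"{name}: {p2011}% -> {p2016}%, change {change:+.0%}")
```

The same function applies to the declines reported below; for association rules, 15.3/28.6 - 1 is about -47%.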

The most popular algorithms newly added to the 2016 survey are:

K-nearest neighbors, 46%

Principal component analysis, 43%

Random forests, 38%

Optimization, 24%

Deep learning (neural networks), 19%

Singular value decomposition, 16%

The largest declines are:

Association rules, from 28.6% in 2011 to 15.3% in 2016, a decrease of 47%

Uplift modeling, from 4.8% in 2011 to 3.1% in 2016, a decrease of 36%

Factor analysis, from 18.6% in 2011 to 14.2% in 2016, a decrease of 24%

Survival analysis, from 9.3% in 2011 to 7.9% in 2016, a decrease of 15%

Algorithm usage across sectors

Almost all respondents use supervised learning algorithms.

Data scientists in government and industry use a wider variety of algorithms than students or academic researchers, and industry data scientists are more likely to use meta-algorithms.

Next, we break down usage of the 10 most popular algorithms, plus deep learning, by employment type.

To make these differences easier to see, we measured how each algorithm's usage within a given employment type deviates from its average usage across all respondents:

Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1
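As a minimal sketch, the bias measure can be written as a short Python function. The per-type usage figures are not reproduced in this article, so the numbers in the example are hypothetical: a bias of +0.2 means the algorithm is used 20% more often within that employment type than across all respondents.

```python
def bias(usage_in_type: float, usage_overall: float) -> float:
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1.

    usage_in_type: share of respondents of one employment type using the algorithm.
    usage_overall: share of all respondents using the algorithm.
    """
    if usage_overall <= 0:
        raise ValueError("overall usage must be positive")
    return usage_in_type / usage_overall - 1


# Hypothetical shares for illustration only, not survey results.
print(f"{bias(0.60, 0.50):+.0%}")  # +20%: over-represented in this employment type
print(f"{bias(0.40, 0.50):+.0%}")  # -20%: under-represented in this employment type
print(f"{bias(0.50, 0.50):+.0%}")  # +0%: same as the overall average
```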

We noticed the following:

Industry data scientists are more likely to use regression, visualization, statistics, random forests, and time series.

Government/nonprofit data scientists are more likely to use visualization, principal component analysis, and time series.

Academic researchers are more likely to use principal component analysis and deep learning.

Students generally use fewer algorithms, but they do more text mining and deep learning.

The respondents came mainly from the following regions:

United States/Canada, 40%

Europe, 32%

Asia, 18%

Latin America, 5.0%

Africa/Middle East, 3.4%

Australia/New Zealand, 2.2%

As in the 2011 survey, we grouped industry and government respondents together, grouped academic researchers and students into a second group, and computed each algorithm's "affinity" for industry/government as:

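Affinity(Alg) = [N(Alg, Ind_Gov) / N(Alg, Aca_Stu)] / [N(Ind_Gov) / N(Aca_Stu)] - 1

where N(Alg, G) is the number of respondents in group G who use the algorithm and N(G) is the number of respondents in group G.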

An affinity of 0 means the algorithm is equally common, relative to group size, among industry/government and among academic researchers/students. The higher the affinity, the more commonly the algorithm is used in industry, i.e., the more "industrial" it is; the lower (more negative) the affinity, the more "academic" it is.
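Rearranging the formula as [N(Alg, Ind_Gov) / N(Ind_Gov)] / [N(Alg, Aca_Stu) / N(Aca_Stu)] - 1 shows that affinity + 1 is the ratio of the algorithm's per-respondent usage rates in the two groups. A minimal Python sketch follows; the group sizes and user counts in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
def affinity(n_alg_ind_gov: int, n_alg_aca_stu: int,
             n_ind_gov: int, n_aca_stu: int) -> float:
    """Industry/government affinity of one algorithm.

    [N(Alg, Ind_Gov) / N(Alg, Aca_Stu)] / [N(Ind_Gov) / N(Aca_Stu)] - 1
    n_alg_*: respondents in each group who use the algorithm.
    n_ind_gov, n_aca_stu: total respondents in each group.
    """
    if n_alg_aca_stu == 0 or n_aca_stu == 0:
        raise ValueError("academic/student counts must be non-zero")
    return (n_alg_ind_gov / n_alg_aca_stu) / (n_ind_gov / n_aca_stu) - 1


# Hypothetical groups: 600 industry/government and 200 academic/student respondents.
print(affinity(300, 100, 600, 200))  # 0.0: same usage rate (50%) in both groups
print(affinity(300, 50, 600, 200))   # 1.0: 50% vs 25%, twice as common in industry
print(affinity(150, 100, 600, 200))  # -0.5: 25% vs 50%, more "academic"
```

On this reading, an affinity of 2.01 corresponds to roughly three times the per-respondent usage rate in industry/government compared with academia.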

The most "industrial" algorithms are:

Uplift modeling, 2.01

Anomaly detection, 1.61

Survival analysis, 1.39

Factor analysis, 0.83

Time series/sequences, 0.69

Association rules, 0.5

Uplift modeling is once again the most "industrial" algorithm, but surprisingly its usage is very low: only 3.1%, one of the lowest usage rates of any algorithm in this survey.

The most "academic" algorithms are:

Neural networks (regular), -0.35

Naive Bayes, -0.35

Support vector machines (SVM), -0.24

Deep learning, -0.19

EM (expectation maximization), -0.17

The following figure shows all the algorithms and their industry/academia affinity:

Algorithms most commonly used by data scientists: industry vs. academia

Summary of the 2016 survey of algorithms used by data scientists

The columns of the summary table are:

N: rank by usage

Algorithm: algorithm name

Type: S = supervised, U = unsupervised, M = meta, Z = other

%: share of survey respondents who use the algorithm

Change: relative change in usage, (pct2016/pct2011 - 1)

Industry Affinity: industry affinity, as defined above

This article was compiled by Lei Feng Network (search for the "Lei Feng Net" WeChat public account). Reprinting without permission is prohibited.

Via KDnuggets

