Newly released - Ten most commonly used algorithms for data scientists

Introduction: This article is based on KDnuggets' survey of the top ten algorithms, which ranks the algorithms most commonly used by data scientists and describes how their usage changed between 2011 and 2016.

Based on the survey, KDnuggets identified the ten algorithms most commonly used by data scientists:

1. Regression

2. Clustering

3. Decision Trees/Rules

4. Visualization

5. k-Nearest Neighbors

6. PCA (Principal Component Analysis)

7. Statistics

8. Random Forests

9. Time series/Sequence analysis

10. Text Mining

Respondents used an average of 8.1 algorithms each, a substantial increase over the similar survey in 2011.

Compared with the 2011 survey, the most popular algorithms are regression, clustering, decision trees, and visualization. The largest relative increases, measured as (pct2016/pct2011 - 1), were:

Boosting, from 23.5% in 2011 to 32.8% in 2016, an increase of 40%

Text mining, from 27.7% in 2011 to 35.9% in 2016, an increase of 30%

Visualization, from 38.3% in 2011 to 48.7% in 2016, an increase of 27%

Time series, from 29.6% in 2011 to 37.0% in 2016, an increase of 25%

Anomaly/deviation detection, from 16.4% in 2011 to 19.5% in 2016, an increase of 19%

Ensemble methods, from 28.3% in 2011 to 33.6% in 2016, an increase of 19%

Support vector machines, from 28.6% in 2011 to 33.6% in 2016, an increase of 18%

Regression, from 57.9% in 2011 to 67.1% in 2016, an increase of 16%
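
The relative change (pct2016/pct2011 - 1) can be recomputed from the published shares. The following is a minimal sketch, not the survey authors' code: the function name `relative_change` and the dictionary layout are our own choices, and only the first four entries of the list are included. Because the published shares are already rounded to one decimal place, recomputed values may differ from the printed percentages by about a point.

```python
def relative_change(pct_2011: float, pct_2016: float) -> float:
    """Return pct2016 / pct2011 - 1 as a fraction (0.40 means +40%)."""
    if pct_2011 <= 0:
        raise ValueError("pct_2011 must be positive")
    return pct_2016 / pct_2011 - 1


# Usage shares (percent of respondents) copied from the list above:
# name -> (share in 2011, share in 2016)
usage = {
    "Boosting": (23.5, 32.8),
    "Text mining": (27.7, 35.9),
    "Visualization": (38.3, 48.7),
    "Time series": (29.6, 37.0),
}

changes = {name: relative_change(old, new) for name, (old, new) in usage.items()}

# Print the algorithms from largest to smallest relative increase.
for name, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {change:+.1%}")
```

Note that this measures the change in the share of respondents relative to the 2011 share, not the difference in percentage points: Boosting gained 9.3 points, which is about 40% of its 2011 share.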

In addition, the most popular among the algorithms newly added to the survey in 2016 are:

K-nearest neighbors, 46%

Principal component analysis, 43%

Random forests, 38%

Optimization, 24%

Neural networks - Deep Learning, 19%

Singular value decomposition, 16%

The most significant declines are:

Association rules, from 28.6% in 2011 to 15.3% in 2016, a decrease of 47%

Uplift modeling, from 4.8% in 2011 to 3.1% in 2016, a decrease of 36%

Factor analysis, from 18.6% in 2011 to 14.2% in 2016, a decrease of 24%

Survival analysis, from 9.3% in 2011 to 7.9% in 2016, a decrease of 15%

Algorithm usage by employment type

We noticed that almost everyone uses supervised learning algorithms.

Government and industry data scientists use more types of algorithms than students or academic researchers, and industry data scientists are more likely to use meta-algorithms.

Next, we break down the 10 most popular algorithms, plus deep learning, by employment type.

To make these differences easier to see, we define a bias measure that compares an algorithm's usage within a particular employment type to its usage across all respondents:

Bias(Alg,Type) = Usage(Alg,Type) / Usage(Alg,All) - 1
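
As an illustration of the bias formula, here is a minimal sketch. The usage shares below are HYPOTHETICAL values chosen to make the arithmetic easy to follow; they are not figures from the survey, and the names `bias`, `usage_all`, and `usage_by_type` are our own.

```python
def bias(usage_in_type: float, usage_overall: float) -> float:
    """Bias(Alg, Type) = Usage(Alg, Type) / Usage(Alg, All) - 1."""
    if usage_overall <= 0:
        raise ValueError("usage_overall must be positive")
    return usage_in_type / usage_overall - 1


# HYPOTHETICAL shares (fraction of respondents using each algorithm).
# Illustrative only; not taken from the survey.
usage_all = {"Regression": 0.60, "Deep Learning": 0.20}
usage_by_type = {
    "Industry": {"Regression": 0.66, "Deep Learning": 0.15},
    "Student": {"Regression": 0.48, "Deep Learning": 0.25},
}

# Bias of every (employment type, algorithm) pair.
bias_table = {
    emp_type: {alg: bias(share, usage_all[alg]) for alg, share in shares.items()}
    for emp_type, shares in usage_by_type.items()
}

for emp_type, row in bias_table.items():
    for alg, value in row.items():
        print(f"{emp_type:<9} {alg:<14} {value:+.0%}")
```

A positive bias means the group uses the algorithm more than respondents overall, a negative bias means it uses it less, and 0 means no difference.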

We observed that:

Industry data scientists are more likely to use regression, visualization, statistics, random forests, and time series

Government/nonprofit respondents are more likely to use visualization, principal component analysis, and time series

Academic researchers are more likely to use principal component analysis and deep learning

Students generally use fewer algorithms, but do more text mining and deep learning

Survey respondents came from the following regions:

United States/Canada, 40%

Europe, 32%

Asia, 18%

Latin America, 5.0%

Africa/Middle East, 3.4%

Australia/New Zealand, 2.2%

As in the 2011 survey, we combined industry/government respondents into one group and academic researchers/students into a second group, and computed each algorithm's "affinity" to industry/government as:

Affinity(Alg) = [N(Alg,Ind_Gov) / N(Alg,Aca_Stu)] / [N(Ind_Gov) / N(Aca_Stu)] - 1

An affinity of 0 means the algorithm is used equally by industry/government and by academic researchers/students. The higher the industry/government (IG) affinity, the more "industrial" the algorithm; the lower it is, the more "academic".
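
The affinity calculation can be sketched as follows. The respondent counts are HYPOTHETICAL and chosen only to show how positive, zero, and negative affinities arise; the function name `affinity` and the placeholder algorithm names are our own, not part of the survey.

```python
def affinity(n_alg_ig: int, n_alg_as: int, n_ig: int, n_as: int) -> float:
    """Industry/government affinity of one algorithm.

    n_alg_ig: industry/government respondents who use the algorithm
    n_alg_as: academic/student respondents who use the algorithm
    n_ig:     all industry/government respondents
    n_as:     all academic/student respondents
    """
    if min(n_alg_as, n_ig, n_as) <= 0:
        raise ValueError("n_alg_as, n_ig and n_as must be positive")
    return (n_alg_ig / n_alg_as) / (n_ig / n_as) - 1


# HYPOTHETICAL group sizes and user counts; not survey data.
N_IG, N_AS = 600, 300
users = {
    "algorithm_a": (120, 20),   # used mostly by industry/government
    "algorithm_b": (200, 100),  # used in proportion to the group sizes
    "algorithm_c": (130, 100),  # used relatively more by academics/students
}

for name, (n_alg_ig, n_alg_as) in users.items():
    print(f"{name}: {affinity(n_alg_ig, n_alg_as, N_IG, N_AS):+.2f}")
```

Dividing by N(Ind_Gov) / N(Aca_Stu) corrects for the two groups having different sizes, so an algorithm used by the same fraction of each group gets an affinity of 0 regardless of how many respondents each group has.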

The most "industrial" algorithms are:

Uplift modeling, 2.01

Anomaly detection, 1.61

Survival analysis, 1.39

Factor analysis, 0.83

Time series/Sequences, 0.69

Association rules, 0.5

Uplift modeling is once again the most "industrial" algorithm, but surprisingly its usage is very low: only 3.1%, almost the lowest of any algorithm in this survey.

The most "academic" algorithms are:

Neural networks - regular, -0.35

Naive Bayes, -0.35

SVM, -0.24

Deep Learning, -0.19

EM, -0.17

The following figure shows all the algorithms and their industry/academia affinity:

[Figure: Algorithms most commonly used by data scientists, industry vs. academia]

Summary of the 2016 survey of algorithms used by data scientists

The columns of the summary table are:

N: rank by usage

Algorithm: algorithm name

Type: S - Supervised, U - Unsupervised, M - Meta, Z - Other

%: share of survey respondents using the algorithm

Change: relative change in usage, (%2016 / %2011 - 1)

Industry Affinity: industry affinity, as defined above

This article was compiled by Lei Feng Network (search for the "Lei Feng Net" public account). Reproduction without permission is prohibited.

Via KDnuggets

