Data Science & Data Mining Toolkit with Python (Yury Kashnitsky, Alexander Krot)
Program
We designed the program to cover only the core skills needed in real work. There is no lengthy theory here, only practically important material.
- Alexander Krot will teach a course on applied problems in data analysis, covering Text Mining and Graph Theory and touching on machine learning with big data
- Yury Kashnitsky, a lecturer at the Higher School of Economics who explains complex things in simple language, will first introduce the main tools a beginning Data Scientist needs, and then teach a machine learning course that gives the skills required to build predictive models
- This is followed by an exclusive master class by Stanislav Semenov, currently ranked 3rd in the Kaggle world ranking, on applying strategies to competition problems. Stanislav will talk about less common techniques such as stacking, blending, and classifier ensembles, and will work through several non-trivial problems
Since last year, more than 500 people have written to us with questions about machine learning and data analysis. In response, we launched MLClass.ru and brought together many machine learning specialists there. We train specialists and try to place them in jobs at leading companies.
- Lesson 1: Introduction to Python and development tools (September 23)
- Lesson 2: Python language basics (September 27)
- Lesson 3: Data structures I (September 30)
- Lesson 4: Data structures II (October 4)
- Lesson 5: Functions. Recursion (October 7)
- Lesson 1: Basics of statistics (October 11)
- Lesson 2: Introduction to linear algebra (October 14)
- Lesson 3: Machine learning in Python I (October 18)
- Lesson 4: Machine learning in Python II (October 21)
- Lesson 5: Machine learning in Python III (October 25)
- Lesson 6: Machine learning in Python IV (October 28)
1. Data Science Toolkit
MLClass 1.mp4 [383m 212k 360]
MLClass 2.mp4 [426m 596k 277]
MLClass 3.mp4 [447m 605k 109]
MLClass 4.mp4 [454m 326k 983]
MLClass 5.mp4 [372m 9k 833]
Notebooks
jupyter_notebooks
README.md [2k 306]
img
anaconda.png [11k 274]
dir_tree.png [39k 703]
for_cycle.png [65k 968]
git_add.png [65k 692]
git_branch_develop.png [139k 370]
git_checkout_file.png [123k 806]
git_conflict.png [47k 85]
git_conflict_resolved.png [39k 786]
git_push.png [86k 432]
github_commits.png [60k 628]
github_new_repo.png [75k 299]
ipython_ex.png [174k 221]
link.png [29k 337]
mccme_task.png [78k 104]
mlclass_logo.jpg [18k 298]
operations.png [182k 512]
operations_priority.png [97k 652]
pycharm_screen.png [235k 904]
qsort_tree.gif [6k 667]
qsort-recur1.png [47k 73]
qsort-recur2.png [3k 143]
smart_git.png [192k 580]
task_5B.png [2k 893]
while_cycle.png [64k 458]
img_html
adjlist.png [28k 141]
adjMat.png [8k 850]
anaconda.png [11k 274]
digraph.png [42k 41]
dijkstra.png [60k 897]
dir_tree.png [39k 703]
fig_2_1_2_1.png [165k 728]
fig_2_1_2_2.gif [3k 160]
fig_2_1_2_3.png [20k 804]
fig_2_1_2_4.png [45k 329]
fig_2_1_2_5.png [102k 880]
for_cycle.png [65k 968]
git_add.png [65k 692]
git_branch_develop.png [139k 370]
git_checkout_file.png [123k 806]
git_conflict.png [47k 85]
git_conflict_resolved.png [39k 786]
git_push.png [86k 432]
github_commits.png [60k 628]
github_new_repo.png [75k 299]
heap1.png [33k 875]
heapadd.jpg [116k 900]
hotpotato.png [49k 135]
ipython_decision_tree.slides.html [539k 908]
ipython_ex.png [174k 221]
lesson1_github.html [207k 82]
lesson1_python_intro_tools.html [226k 86]
lesson1_python_intro_tools.slides.html [235k 113]
lesson2_part1_data_types.html [197k 61]
lesson2_part1_variables_strings_numbers.slides.html [250k 730]
lesson2_part2_conditions.slides.html [228k 570]
lesson2_part2_numbers.html [214k 407]
lesson2_part3_strings.html [220k 911]
lesson2_part3_while_input.slides.html [256k 329]
lesson2_part4_conditions.html [218k 616]
lesson2_part5_while_for.html [247k 487]
lesson2_tasks.html [219k 636]
lesson3_part1_lists_tuples.html [329k 579]
lesson3_part2_lists_tuples.slides.html [338k 286]
lesson3_part2_search_sort.html [218k 598]
lesson3_part3_string_algo.html [227k 254]
lesson3_part4_dictionaries.html [317k 608]
lesson3_part5_reading_file_to_dict.html [199k 727]
lesson3_part6_sets.html [207k 667]
lesson3_tasks.html [235k 130]
lesson4_part1_data_structures.html [290k 354]
lesson4_part2_graph_algo.html [394k 582]
lesson4_tasks.html [233k 122]
lesson5_part1_functions.html [270k 341]
lesson5_part2_recursion.html [220k 926]
lesson5_tasks.html [226k 274]
link.png [29k 337]
mccme_task.png [78k 104]
mlclass_logo.jpg [18k 298]
namequeue.png [29k 509]
operations.png [182k 512]
operations_priority.png [97k 652]
pycharm_screen.png [235k 904]
qsort_tree.gif [6k 667]
qsort-recur1.png [47k 73]
qsort-recur2.png [3k 143]
smalltree.png [21k 722]
smart_git.png [192k 580]
task_5B.png [2k 893]
treedef1.png [27k 195]
treedef2.png [36k 534]
while_cycle.png [64k 458]
ipython_demonstration
fig_2_1_2_1.png [165k 728]
fig_2_1_2_2.gif [3k 160]
fig_2_1_2_3.png [20k 804]
fig_2_1_2_4.png [45k 329]
fig_2_1_2_5.png [102k 880]
ipython_decision_tree.html [534k 989]
ipython_decision_tree.ipynb [327k 993]
mlclass_logo.jpg [18k 298]
python_lesson1_tools
lesson1_optional_github.ipynb [12k 178]
lesson1_part1_python_intro_tools.ipynb [23k 958]
python_lesson2_python_basics
lesson2_part1_data_types.ipynb [6k 420]
lesson2_part2_numbers.ipynb [10k 365]
lesson2_part3_strings.ipynb [19k 544]
lesson2_part4_conditions.ipynb [9k 924]
lesson2_part5_while_for.ipynb [28k 5]
lesson2_tasks.ipynb [12k 687]
python_lesson3_data_structures1
credit_sample.txt [180]
lesson3_part1_lists_tuples.ipynb [51k 687]
lesson3_part2_search_sort.ipynb [13k 878]
lesson3_part3_string_algo.ipynb [16k 424]
lesson3_part4_dictionaries.ipynb [64k 873]
lesson3_part5_reading_file_to_dict.ipynb [5k 143]
lesson3_part6_sets.ipynb [8k 921]
lesson3_tasks.ipynb [16k 655]
python_lesson4_data_structures2
input.txt [787]
lesson4_part1_data_structures.ipynb [52k 639]
lesson4_part2_graph_algo.ipynb [164k 65]
lesson4_tasks.ipynb [18k 841]
python_lesson5_func_recursion
lesson5_part1_functions.ipynb [48k 517]
lesson5_part2_recursion.ipynb [16k 3]
lesson5_tasks.ipynb [15k 953]
tasks
lesson2_tasks
2A_3443_power_of_two.py [16]
2B_factorial.py [98]
2C_hypo.py [66]
2D_100A.py [17]
2E_3501_max_of_2_integers.py [59]
2F_which_is_greater.py [100]
2G_squared.py [28]
2H_hypo.py [80]
2I_max_of_three.py [226]
2J_trian_exists.py [153]
2K_ladja.py [146]
2L_root10.py [60]
2M_3504_leap_year.py [124]
2N_3513_horse.py [276]
lesson3_tasks
3A_range.py [71]
3B_sum_squares.py [72]
3C_factorial.py [67]
3D_n_choose_k.py [172]
3E_penguins.py [205]
3F_choco.py [137]
3G_linear.py [151]
3H_cows.py [275]
3I_diofant.py [182]
3J_magic_numbers.py [68]
3K_stairs.py [97]
3L_three_comparisons.py [158]
3M_metro.py [774]
3N_sum_factorials.py [202]
lesson4_tasks
4A_only_even.py [71]
4B_simple_word_count.py [43]
4C_swap.py [56]
4D_del_fragment.py [59]
4E_swap_neighbours.py [159]
4F_reverse_fragment.py [85]
4G_insert_char_delimiter.py [53]
4H_swap_min_max.py [157]
4I_num_unique.py [48]
4J_num_same.py [116]
4K_occured_before.py [142]
4L_num_unique_words.py [195]
4M_boxes.py [350]
4N_polyglots.py [450]
4O_file_word_count.py [362]
input.txt [787]
lesson5_tasks
5A_min4.py [153]
5B_in_square.py [161]
5C_power.py [151]
5D_prime.py [267]
5E_combinations.py [183]
5F_recur_sum.py [200]
5G_reverse.py [428]
IPython notebooks in PDF
ipython_decision_tree.pdf [1m 178k 900]
lesson1_github.pdf [505k 771]
lesson1_python_intro_tools.pdf [709k 477]
lesson2_part1_data_types.pdf [139k 885]
lesson2_part2_numbers.pdf [303k 227]
lesson2_part3_strings.pdf [209k 453]
lesson2_part4_conditions.pdf [215k 135]
lesson2_part5_while_for.pdf [354k 449]
lesson2_tasks.pdf [348k 859]
lesson3_part1_lists_tuples.pdf [353k 183]
lesson3_part2_lists_tuples slides.pdf [369k 722]
lesson3_part2_search_sort.pdf [197k 129]
lesson3_part3_string_algo.pdf [222k 970]
lesson3_part4_dictionaries.pdf [425k 161]
lesson3_part5_reading_file_to_dict.pdf [146k 16]
lesson3_part6_sets.pdf [139k 266]
lesson3_tasks.pdf [806k 62]
lesson4_part1_data_structures.pdf [1m 172k 358]
lesson4_part2_graph_algo.pdf [1m 249k 789]
lesson4_tasks.pdf [638k 473]
lesson5_part1_functions.pdf [262k 437]
lesson5_part2_recursion.pdf [231k 751]
lesson5_tasks.pdf [925k 67]
2. Data Mining with Python
Machine learning with Python 1.mp4 [508m 843k 559]
Machine learning with Python 2.mp4 [474m 657k 401]
Machine learning with Python 3.mp4 [460m 236k 363]
Machine learning with Python 4.mp4 [518m 647k 120]
Machine learning with Python 5_1.mp4 [346m 176k 451]
Machine learning with Python 5_2.mp4 [184m 718k 734]
Machine learning with Python 5_3.mp4 [61m 546k 33]
Machine learning with Python 6.mp4 [434m 108k 593]
jupyter_notebooks
data
beauty.csv [32k 368]
car_insurance_test.csv [3k 273]
car_insurance_test_labels.csv [503]
car_insurance_train.csv [28k 980]
ex2data1.txt [3k 775]
ex2data2.txt [2k 233]
girls.csv [17k 63]
hostel_factors.csv [2k 873]
microchip_tests.txt [2k 233]
nba_2013.csv [72k 21]
pima-indians-diabetes.data [23k 279]
rf_prediction.csv [504]
sample_submission.csv [503]
samsung_test.txt [26m 458k 166]
samsung_test_labels.txt [5k 894]
samsung_train_labels.txt [14k 704]
test_input.txt [34]
titanic_test.csv [28k 629]
titanic_train.csv [61k 194]
tree_prediction.csv [504]
img
anaconda.png [11k 274]
bagging.png [123k 500]
boosting_overfitting.png [109k 479]
classifiers.png [557k 953]
confusion_matrix.png [72k 50]
contingency.png [50k 163]
decision_tree1.png [165k 728]
decision_tree2.gif [3k 160]
decision_tree3.png [20k 804]
decision_tree4.png [45k 329]
decision_tree5.png [102k 880]
dir_tree.png [39k 703]
first_tree.gif [125k 389]
forest.png [51k 216]
gboost_cv-test_acc_car.png [30k 625]
girl1.jpg [166k 868]
girl2.jpg [78k 296]
girl3.jpg [144k 495]
girl4.jpg [73k 414]
girl5.jpg [139k 728]
girl6.jpg [115k 881]
girl7.jpg [125k 748]
git_add.png [65k 692]
git_branch_develop.png [139k 370]
git_checkout_file.png [123k 806]
git_conflict.png [47k 85]
git_conflict_resolved.png [39k 786]
git_push.png [86k 432]
github_commits.png [60k 628]
github_new_repo.png [75k 299]
ipython_ex.png [174k 221]
ipython-logo.jpg [3k 378]
kernel_trick.jpeg [48k 514]
kfold.jpg [9k 815]
kNN.png [140k 453]
knn_cv-test_acc_car_insurance.png [17k 67]
linalg_task.png [25k 516]
linalg_task2.png [26k 915]
linalg_task3.png [46k 385]
locally_best_tree.gif [8k 146]
logit.png [29k 386]
matplotlib-logo.png [91k 776]
mlclass_logo.jpg [18k 298]
mlclass_logo2.jpg [21k 339]
motivation.png [711k 966]
numpy-logo.png [6k 48]
outlier_detection.png [105k 440]
pandas-logo.png [9k 239]
plot_pca_3d_1.png [30k 408]
plot_pca_3d_2.png [29k 681]
prime-sieve.png [31k 218]
pycharm_screen.png [235k 904]
ROC.jpg [133k 428]
scikit-learn-flow-chart.jpg [200k 518]
scikit-learn-logo.png [13k 662]
scipy-logo.png [1k 439]
smart_git.png [192k 580]
svm_linear2.png [16k 190]
svm_linear3.png [13k 195]
SVM_optimize.png [20k 537]
tree-partition.png [46k 812]
tree-simple.png [35k 888]
trigonometry.png [125k 489]
ml_lesson1_intro
first_tree.dot [4k 900]
first_tree.pdf [22k 455]
lesson1_part1_intro.ipynb [8k 581]
lesson1_part2_decision_trees.ipynb [328k 108]
lesson1_part3_kaggle_inclass.ipynb [385k 831]
locally_best_tree.dot [275]
tree_prediction.csv [504]
ml_lesson2_tools
hw2_pandas_titanic.ipynb [34k 257]
lesson2_linalg_task.ipynb [13k 152]
lesson2_optional_github.ipynb [12k 64]
lesson2_part1_numpy.ipynb [245k 908]
lesson2_part2_scipy.ipynb [85k 787]
lesson2_part3_pandas.ipynb [262k 207]
lesson2_part4_matplotlib.ipynb [257k 718]
lesson2_part5_Seaborn.ipynb [184k 866]
rf_titanic.ipynb [52k 985]
Titanic_pandas_english.ipynb [633k 433]
ml_lesson3_classification
hw2_pandas_Titanic_solution.ipynb [169k 316]
lesson3_part1_scikit-learn_overview.ipynb [358k 521]
lesson3_part2_feature_extraction_Titanic.ipynb [110k 840]
lesson3_part3_feature_importance.ipynb [50k 348]
lesson3_part4_k_nearest_neighbors.ipynb [68k 962]
lesson3_part5_logistic_regression.ipynb [108k 449]
lesson3_part6_classification_metrics.ipynb [69k 944]
lesson3_part7_SVM_kernel_trick.ipynb [549k 610]
lesson4_part4_kNNlearning_curve.ipynb [66k 810]
Untitled.ipynb [29k 77]
ml_lesson4_ensembles_regularization
lesson4_AdaBoost_validation.ipynb [40k 129]
lesson4_hw.ipynb [9k 648]
lesson4_kNN_validation.ipynb [130k 841]
lesson4_logit_validation.ipynb [115k 0]
lesson4_part1_bagging.ipynb [110k 625]
lesson4_part2_random_forest.ipynb [90k 844]
lesson4_part3_boosting.ipynb [144k 477]
lesson4_part4_ensemble_comparison.ipynb [178k 286]
lesson4_part5_overfitting_validation.ipynb [289k 933]
lesson4_part6_regularization.ipynb [242k 282]
load_car_insurance_data.py [1k 953]
__pycache__
load_car_insurance_data.cpython-34.pyc [1k 467]
ml_lesson5_unsupervised
hw5_clustering_samsung_solution.ipynb [170k 714]
lesson5_part1_kmeans.ipynb [310k 189]
lesson5_part2_PCA.ipynb [1m 304k 877]
lesson5_part3_outlier_detection.ipynb [115k 505]
lesson5_part4_habr_girls.ipynb [248k 601]
lesson5_part5_clustering_metrics.ipynb [13k 398]
ml_lesson6_classes
best_boston.pkl [110k 575]
kaggle_otto_semenov.ipynb [1m 531k 621]
lesson6_part1_lasagne_otto.ipynb [13k 589]
lesson6_part2_xgboost_example.ipynb [10k 893]
lesson6_part2_xgboost_scikit_gboost.ipynb [105k 363]
lesson6_part3_kaggle_ensembles.ipynb [6k 856]
lesson6_part4_kaggle_titanic_blending_auc.ipynb [350k 910]
lesson6_part5_titanic_blending_f1_score.ipynb [188k 847]
lesson6_part6_kaggle_titanic_stacking_auc.ipynb [20k 916]
lesson6_part7_custom_estimator.ipynb [6k 359]
lesson6_part8_knn_custom_metric.ipynb [8k 172]
load_car_insurance_with_region.py [2k 167]
load_car_insurance_with_region.pyc [1k 770]
load_titanic_with_features.py [4k 816]
load_titanic_with_features.pyc [3k 734]
__pycache__
load_titanic_with_features.cpython-34.pyc [3k 396]
other_notebooks
2_1_5_logit.ipynb [6k 613]
An introduction to Machine Learning with Scikit-Learn.ipynb [38k 497]
beeline_tselikov.ipynb [171k 65]
dataschool_logit.ipynb [90k 242]
lesson3_titanic_tutorial_eng.ipynb [439k 173]
lesson6_part8_custom_estimator_car_insurance.ipynb [21k 819]
lesson6_part8_custom_estimator2_car_insurance.ipynb [21k 819]
scikit-learn-validation.ipynb [284k 332]
Titanic_pandas_english.ipynb [633k 433]
output
adaboost_car_insurance.csv [504]
car_insurance_myblackbox.csv [504]
knn_car_insurance.csv [504]
knn_car_insurance_custom_metric.csv [504]
lasagne_otto.csv [22m 939k 883]
titanic.knn.csv [2k 839]
titanic_knn.csv [2k 839]
titanic_knn_f1.csv [2k 839]
titanic_knn_lin_svc_mix.csv [2k 839]
titanic_knn_lin_svc_mix_f1.csv [2k 839]
titanic_lin_svc.csv [2k 839]
titanic_lin_svc_f1.csv [2k 839]
titanic_logit_poly.csv [2k 839]
titanic_myblackbox.csv [2k 839]
titanic_results-rf_tutorial_07799.csv [2k 839]
titanic_rf_prediction.csv [2k 839]
titanic_rf_prediction_with_title.csv [2k 839]
titanic_scaled_logit_poly.csv [2k 839]
titanic_stacking.csv [2k 839]
titanic_xgb_submission.csv [2k 839]
submissions
gboost_car_insurance 2.csv [504]
gboost_car_insurance.csv [504]
knn_car_insurance 2.csv [504]
knn_car_insurance.csv [504]
rf_cv_prediction.csv [504]
rf_prediction.csv [504]
tree_prediction.csv [504]
scripts
load_titanic_with_features.py [4k 816]
load_titanic_with_features.pyc [3k 822]
3. Kaggle Tips & Tricks
Kaggle_Tips_and_Tricks_1.mp4 [213m 837k 262]
Kaggle_Tips_and_Tricks_2.mp4 [343m 12k 168]
Kaggle_Tips_and_Tricks_3.mp4 [332m 812k 445]
Kaggle_Tips_and_Tricks_4.mp4 [294m 56k 95]
Kaggle_Tips_and_Tricks_5.mp4 [324m 137k 596]
Kaggle Tips and Tricks
1
1.pdf [1m 16k 689]
2
2.pdf [503k 519]
greek.ipynb [854k 176]
3
3.pdf [863k 266]
otto.ipynb [1m 529k 336]
4
4.pdf [793k 142]
axa.ipynb [4m 612k 221]
5
5.pdf [442k 371]
cat.ipynb [503k 993]
4. Applied Areas of Data Analysis
Lesson_1._Practical_Data_Science.mp4 [357m 687k 852]
Lesson_2._Practical_Data_Science.mp4 [364m 565k 309]
Lesson_3._Practical_Data_Science.mp4 [413m 115k 228]
Lesson_4._Practical_Data_Science.mp4 [418m 581k 286]
Practical Data Science
Lesson 1
DSCourse.pdf [3m 759k 520]
Kaggle-Word2Vec.html [303k 98]
Kaggle-Word2Vec.ipynb [92k 256]
texts.pdf [454k 305]
TopicModelling2.html [351k 79]
TopicModelling2.ipynb [131k 882]
Lesson 2
d3js VS shiny.ipynb [6k 447]
DSCourse.pdf [3m 759k 520]
graph_tool.ipynb [6k 746]
graphs.pdf [146k 651]
igraph.ipynb [7k 824]
networkx.ipynb [18k 76]
Lesson 3_4
Basics.html [213k 829]
Basics.ipynb [22k 955]
DocumentSimilarity.html [331k 524]
DocumentSimilarity.ipynb [128k 465]
DSCourse.pdf [3m 759k 520]
features_selection.pdf [330k 640]
ibm_meetup.pdf [506k 83]
Krot_graphs_viz_2015.pdf [233k 710]
Krot_graphs4_2015.pdf [91k 303]
Krot_likes_2015.pdf [628k 436]
Krot_PCA_2015.pdf [297k 943]
Link_prediction.pdf [345k 899]
MLLib.html [238k 197]
MLLib.ipynb [39k 371]
networkx.html [227k 139]
networkx.ipynb [12k 506]
Total size: 7.69 GB.