Data Science

In a nutshell, data science is ‘data-driven’ science: analyzing data and drawing insights from it to make decision-making simpler. A data scientist has to examine data from all angles, reshape it as needed, and remove unwanted data from the mixed pool, just as a sculptor carves a beautiful idol out of a piece of wood. Adding machine learning models takes this to another level, so using ML models becomes a crucial part of deriving useful insights from raw data.

What a data scientist does

A data scientist collects, analyzes, and interprets large volumes of data, in many cases to improve a company’s operations. Data scientists develop statistical models that analyze data and detect patterns, trends, and relationships in data sets. This information can be used to predict consumer behavior or to identify business and operational risks. The data scientist is often a storyteller, presenting data insights to decision makers in a way that is understandable and applicable to problem-solving. The Harvard Business Review published an article in 2012 describing the role of the data scientist as the “sexiest job of the 21st century.”

PREREQUISITES

  • Basic programming knowledge (preferably Python)
  • Basic mathematical knowledge; knowledge of statistics preferred.
  • Notebook environment: Google Colab or Jupyter Notebook. In later stages you will need an IDE such as VS Code or PyCharm; the Anaconda distribution is preferred.
  • A drive to come up with, as well as solve, real-life problems will help you a lot in this field.

KEY RESOURCES

Kaggle is the best resource. It is a one-stop shop for data science aspirants, offering:

  • Tutorials
  • Solved problems and proper documentation
  • Free datasets
  • Competitions

Dataquest.io and KDnuggets are also great places to explore.

YouTubers - Ken Jee (if you can start 66 Days of Data, it's one of the best ways to learn), Krish Naik, codebasics, ritvikmath

The best way to learn data science is to take up a real-world problem or project (even if a solution already exists), reverse engineer it, and understand why each of the steps listed below is needed. You can start with these:

Project 1: House Prices Regression

Project 2: Titanic Classification

Project 3: Deep Learning Number Recognition

Main Topics

  • Data Cleaning, Data Visualization

    • Libraries like NumPy, Pandas, Matplotlib, Seaborn, SciPy
    • Tools like Plotly, Tableau
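
As a rough illustration of what data cleaning with Pandas and NumPy can look like, here is a minimal sketch. The table, its column names (area_sqft, bedrooms, price), and the helper clean() are all invented for this example and do not come from any real dataset; which cleaning steps are appropriate depends on the problem at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: all values are invented for illustration only.
raw = pd.DataFrame(
    {
        "area_sqft": [850.0, 1200.0, np.nan, 1200.0, 2000.0],
        "bedrooms": [2, 3, 3, 3, 4],
        "price": [100.0, 150.0, 130.0, 150.0, np.nan],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with no duplicate rows and no missing values."""
    out = df.drop_duplicates()
    # Rows without a target value cannot be used for training, so drop them.
    out = out.dropna(subset=["price"])
    # Fill missing feature values with the median of the remaining rows.
    out = out.assign(area_sqft=out["area_sqft"].fillna(out["area_sqft"].median()))
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
print(cleaned.describe())  # quick numeric summary before any plotting
```

From here, a natural next step would be to plot the cleaned columns with Matplotlib or Seaborn to look for outliers and relationships.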

Courses

Applied Data Science with Python Specialization, Data Analysis Using NumPy and Pandas

YouTube

Data Analysis with Python - Full Course for Beginners, https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS, https://www.youtube.com/playlist?list=PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_

Machine Learning

Please visit the existing learning path for ML. As you come across each model, try to understand the math behind it. After exploring it, you can move on to the deep learning path.
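
As one small example of looking at the math behind a model, the sketch below fits a straight line by least squares with NumPy. The data is synthetic (generated from y = 2x + 1 with no noise, purely as an assumption for illustration), so the recovered coefficients can be checked by eye.

```python
import numpy as np

# Synthetic data generated from y = 2x + 1 (no noise), for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of ones for the intercept, then the feature itself.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution w that minimizes ||X @ w - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = w

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Linear regression is only one model; the same habit of writing out the objective and solving it on toy data carries over to others.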

Statistics

Courses - edX course, How to Learn Statistics for Data Science As A Self Starter

YouTube - StatQuest is an amazing channel to check out; ritvikmath also has awesome intuitive explanations.

Data Sources

UCL data, Kaggle datasets, Google Dataset Search

YouTube course - Web Scraping with Python - Beautiful Soup Crash Course
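
To give a feel for what web scraping with Beautiful Soup involves, here is a minimal sketch that parses an HTML table into a list of records. The HTML string is a made-up stand-in for a downloaded page (no real site is contacted), and the table id and column names are assumptions for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a page fetched from the web.
html = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Banana</td><td>0.50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="prices")

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    item, price = (td.get_text(strip=True) for td in tr.find_all("td"))
    rows.append({"item": item, "price": float(price)})

print(rows)
```

In a real project the HTML would come from an HTTP request, and it is worth checking a site's terms of use before scraping it.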

Model Deployment

  • Django - a Python backend framework for deploying models and creating web APIs; generally used for larger-scale projects than Flask and offers more options.
  • Flask - also a Python backend framework, but quicker to grasp than Django.
  • Streamlit - a comparatively easier framework to learn.
  • Heroku - a general-purpose hosting platform as a service.
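
As a minimal sketch of serving a model behind a web API with Flask, the example below exposes a single prediction endpoint. The /predict route, the area_sqft field, and predict_price() are hypothetical names chosen for illustration; predict_price() is a fixed formula standing in for a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(area_sqft: float) -> float:
    """Hypothetical stand-in for a trained model's prediction."""
    return 50.0 + 0.1 * area_sqft


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    try:
        area = float(payload["area_sqft"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="area_sqft must be a number"), 400
    return jsonify(prediction=predict_price(area))


# Exercise the endpoint in-process with Flask's test client (no server needed).
with app.test_client() as client:
    response = client.post("/predict", json={"area_sqft": 1000})
    print(response.status_code, response.get_json())
```

To serve it for real, one option is to save the file as app.py and start Flask's development server (for example with `flask --app app run` in recent Flask versions); a production deployment would need a proper WSGI server and a real model loaded from disk.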

Blogs

https://towardsdatascience.com/
https://medium.com/kaggle-blog
https://ryanswanstrom.com/datascience101/

Curated by Niranjan Neelakantan, TLF Python facilitator