There are primarily two roles to choose from when getting started in Big Data Science:
1) Big Data Engineer
2) Data Scientist
Big Data Engineer: This profile covers data engineering work such as ETL and analytics, using tools like Hadoop, NoSQL databases, Spark, Tez, and the rest of the tech stack. (Of course, no one can learn everything.)
Data Scientist: This profile covers the in-depth study of data. The primary areas are core concepts like Linear Algebra, Probability, Statistics, and Machine Learning. The goal is to dig deep, play around with the data, look at it from various dimensions, and try different approaches to extract meaning from it, whether as trends or anything else.
You also need to be able to produce sharp and meaningful visualizations.
E.g. in Python, Matplotlib (which scikit-learn's plotting helpers build on) provides an API for plotting mathematical functions and basic statistical charts.
You also need to become familiar with the latest visualization tools that can depict the data in the best possible way. In short, it is an ART.
Both of these areas are core. You can certainly get a job in either one on its own, but if you really want to enjoy exploring big data using core math, you need to become familiar with both.
NOTE: Learn one scripting language very well (Python preferred).
For novices: learn Hadoop and its tech stack gradually:
- Hadoop - HDFS, MapReduce
- Pig
- Hive
- Oozie
- Sqoop
- Flume
- Spark
- Tez
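To make the MapReduce item in the list above a little more concrete, here is a minimal sketch of the map, shuffle, and reduce phases written in plain Python. It runs in memory on a single machine and does not use Hadoop's API at all; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on HDFS.
    sample = ["big data is big", "data science needs data"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in parallel on different machines and the framework handles the shuffle, but the shape of the computation is the same.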
At the same time, gradually learn the core Data Science subjects:
- Linear Algebra
- Statistics
- Probability
Then move on to Machine Learning.
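As a rough idea of what practicing these core subjects can look like with nothing but the Python standard library, here is a small sketch with one tiny exercise per subject. The helper names (dot, summarize, probability_of_sum) and the sample numbers are made up for illustration; real work would normally use libraries such as NumPy.

```python
import statistics
from fractions import Fraction
from itertools import product


def dot(u, v):
    """Linear algebra: dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def summarize(values):
    """Statistics: mean, median, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def probability_of_sum(target, sides=6):
    """Probability: chance that two fair dice add up to `target`,
    found by enumerating all equally likely outcomes."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))


if __name__ == "__main__":
    print(dot([1, 2, 3], [4, 5, 6]))
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
    print(probability_of_sum(7))
```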
Once you have a good understanding of, and hands-on experience with, the topics above, broaden your skills by combining the two tracks, e.g. Predictive Data Analytics.
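As a very small taste of predictive analytics, the sketch below fits a straight line to past observations with ordinary least squares and uses it to predict the next value. It is only one simple technique among many, and the function names (fit_line, predict) and the month/sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: month number versus units sold.
    months = [1, 2, 3, 4]
    sales = [3, 5, 7, 9]
    model = fit_line(months, sales)
    print(model)
    print(predict(model, 5))
```

On big data the same idea applies, except that the sums are computed in a distributed way (for example with Spark) instead of in a single Python loop.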
Applying Data Science to Big Data is valuable for many important sectors, such as:
Business: Business Analysts already exist, but their analysis usually does not go nearly as deep as the techniques mentioned above. You can become extremely valuable if you apply these skills to business analysis. To do that, you need to learn which areas the business focuses on most, and direct your analytics toward the areas that can benefit the company in the fastest and most meaningful way. In short, you need to understand how the business works.
Government: Such in-depth analysis can be used effectively to support infrastructure development.
There are many more uses, such as finding trends, statistics, hidden patterns, etc.
Scientific Labs: Here you also need domain knowledge, e.g. physics, chemistry, astronomy, etc., whichever is relevant.
History, Geography, and many more: wherever there is Big Data, you have a great chance to apply these skills and produce meaningful results.
But for all of these applications, you need basic domain knowledge, because without it you cannot make sound intuitive decisions. There are points in the process where you add or remove things based on your own judgment, and that is where domain knowledge matters.