The following is my attempt to summarize the first chapter of the book, Python Data Analytics by Fabio Nelli.
– E.C. De Dios
According to Merriam-Webster, data is “factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.” I usually just think of it as anything that can be recorded or measured.
In the book, Fabio makes the distinction that “data actually are not information” and that “information is actually the result of processing.” He then proclaims that data analysis is the “process of extracting information from raw data.”
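To make the distinction concrete, here is a minimal sketch in Python. It is my own illustration, not an example from the book, and the temperature readings are made up. The raw list is the data; the average and the warmest day are the information that comes out of processing it.

```python
from statistics import mean

# Hypothetical raw data: one temperature reading per day, in degrees Celsius.
raw_readings = [21.5, 22.0, 19.5, 23.0, 24.0]

# Processing the raw numbers yields information we can reason about.
average = mean(raw_readings)
warmest_day = raw_readings.index(max(raw_readings)) + 1  # days counted from 1

print(f"Average temperature: {average:.1f} C")
print(f"Warmest day: day {warmest_day}")
```

The list on its own says very little; the two derived values are what a reader would actually act on.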
Data Analysis
“Data analysis allows you to forecast possible responses of systems and their evolution in time.” Its aim is not the mathematical models themselves but the quality of their predictive power.
The search for data, their extraction, and their preparation are also part of the data analysis process because they critically influence the success of the results.
All stages of data analysis employ different data visualization techniques. It’s all about the charts!
Knowledge Domains of the Data Analyst
Fabio also points out that data analysis is a multi-disciplinary field and is “well suited to many professional activities.” He adds, “a good data analyst must be able to move and act in many different disciplinary areas.”
Not only is it necessary to know other disciplines, it is also imperative that a data analyst know “how to search not only for data, but also for information on how to treat that data.”
Computer Science
Knowledge of information technology is necessary to use the various tools, such as applications and programming languages, that are needed to perform data analysis and visualization.
Mathematics and Statistics
Data analysis requires a lot of complex math. Statistics provides the concepts that form the basis of data analysis. Bayesian methods, regression, and clustering are some of the most commonly used techniques.
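As a small illustration of one of these techniques, the sketch below fits a straight line to a handful of points with ordinary least squares and uses it to make a prediction. This is my own example rather than the book's; the helper name `fit_line` and the hours and scores data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the (unnormalized) covariance of x and y over the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Use the fitted model to predict the score for 6 hours of study.
predicted = slope * 6 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"Predicted score for 6 hours: {predicted:.1f}")
```

In practice a library such as NumPy or scikit-learn would do the fitting, but writing it out shows how little math a simple regression needs.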
Machine Learning and Artificial Intelligence
Machine learning analyzes data in order to recognize patterns, clusters, or trends and then extracts useful information in an automated way.
Professional Fields of Application
A better understanding of where the data come from greatly improves their interpretation. It is good practice to find consultants to whom you can pose the right questions about your data.
Types of Data
Data is divided into two distinct categories:
- Categorical (nominal and ordinal)
- Numerical (discrete and continuous)
Categorical data are observations that can be divided into groups or categories. Nominal variables have no intrinsic order while ordinal variables have a predetermined order.
Numerical data are measured observations. Discrete variables can be counted while continuous variables can assume any value within a defined range.
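The four kinds of variables can be sketched with plain Python values. The examples below are my own and all of the sample values are invented; the point is only to show which operations make sense for each kind.

```python
from collections import Counter

# Nominal: categories with no intrinsic order, so we can only count them.
blood_types = ["A", "O", "B", "O", "AB"]
blood_type_counts = Counter(blood_types)

# Ordinal: categories with a predetermined order, which we state explicitly.
size_order = ["small", "medium", "large"]
shirt_sizes = ["large", "small", "medium", "small"]
sorted_sizes = sorted(shirt_sizes, key=size_order.index)

# Discrete: values that can be counted, represented as whole numbers.
children_per_family = [0, 2, 1, 3]
total_children = sum(children_per_family)

# Continuous: values that can fall anywhere within a range, here heights in meters.
heights_m = [1.62, 1.75, 1.80]
tallest = max(heights_m)

print(blood_type_counts.most_common(1))
print(sorted_sizes)
print(total_children, tallest)
```

Note that sorting the shirt sizes only works because an order was supplied; sorting the blood types alphabetically would be possible but meaningless.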
Next, in Part II, we will explore the process of data analysis in detail.