Introduction to Data Analysis – Part I

The following is my attempt to summarize the first chapter of the book, Python Data Analytics by Fabio Nelli.

– E.C. De Dios

According to Merriam-Webster, data is “factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.” I usually just think of it as anything that can be recorded or measured.

In the book, Fabio makes the distinction that “data actually are not information” and that “information is actually the result of processing.” He then proclaims that data analysis is the “process of extracting information from raw data.”
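To make the distinction concrete, here is a minimal sketch of my own (not taken from the book). The temperature readings are made-up values; the point is only that the raw list is data, while the processed summary is information.

```python
from statistics import mean

# Hypothetical raw data: hourly temperature readings in degrees Celsius.
raw_readings = [21.5, 22.0, 23.5, 25.0, 24.0]

# Processing the raw numbers yields information we can reason about.
information = {
    "count": len(raw_readings),
    "mean": mean(raw_readings),
    "min": min(raw_readings),
    "max": max(raw_readings),
}

print(information)
```

The list on its own says very little; the summary answers questions such as "how warm was it on average?"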

Data Analysis

“Data analysis allows you to forecast possible responses of systems and their evolution in time.” Its aim is not the mathematical models themselves but the quality of their predictive power.
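As a rough illustration of forecasting, the sketch below fits a straight line to a few observations and then checks the prediction against a value that was held back. The sales figures and the `fit_line` helper are hypothetical, and a least-squares line is just one simple model among many; the book does not prescribe it here.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: sales recorded on days 1 to 4.
days = [1, 2, 3, 4]
sales = [10.0, 12.5, 13.5, 16.0]

# Day 5 is held back so the forecast can be compared with reality.
actual_day_5 = 18.0

slope, intercept = fit_line(days, sales)
predicted_day_5 = slope * 5 + intercept
error = abs(actual_day_5 - predicted_day_5)

print(f"predicted: {predicted_day_5:.2f}, actual: {actual_day_5:.2f}, error: {error:.2f}")
```

What matters is not the line itself but how small the error on unseen data turns out to be.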

The search for data, its extraction, and its preparation are also part of the data analysis process, because these steps strongly influence the success of the results.

All stages of data analysis employ different data visualization techniques. It’s all about the charts!

Knowledge Domains of the Data Analyst

Fabio also points out that data analysis is a multi-disciplinary field and is “well suited to many professional activities.” He adds, “a good data analyst must be able to move and act in many different disciplinary areas.”

Not only is it necessary to know other disciplines, it is also imperative that a data analyst know “how to search not only for data, but also for information on how to treat that data.”

Computer Science

Knowledge of information technology is necessary in order to use the various tools, such as applications and programming languages, that are needed to perform data analysis and visualization.

Mathematics and Statistics

Data analysis requires a lot of complex math. Statistics provides the concepts that form the basis of data analysis. Bayesian methods, regression, and clustering are among the most commonly used techniques.

Machine Learning and Artificial Intelligence

Machine learning analyzes data in order to recognize patterns, clusters, or trends and then extracts useful information in an automated way.
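To give a feel for what "recognizing clusters in an automated way" can look like, here is a bare-bones sketch of k-means with two clusters on one-dimensional data. The `two_means` function and the measurements are my own illustration, not code from the book, and real projects would normally reach for a library instead.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop (k = 2)."""
    low, high = min(values), max(values)  # initial cluster centers
    cluster_low, cluster_high = list(values), []
    for _ in range(iterations):
        # Assign every value to the nearest center.
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not cluster_high:
            break  # all values sit in one group; nothing left to split
        # Move each center to the mean of its cluster.
        low = sum(cluster_low) / len(cluster_low)
        high = sum(cluster_high) / len(cluster_high)
    return cluster_low, cluster_high


# Hypothetical measurements that visibly fall into two groups.
measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
small_group, large_group = two_means(measurements)

print(small_group)
print(large_group)
```

Nobody told the code where the boundary between the groups lies; it found the split from the data alone.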

Professional Fields of Application

A better understanding of where the data come from greatly improves their interpretation. It is good practice to find consultants to whom you can pose the right questions about your data.

Types of Data

Data is divided into two distinct categories:

  • Categorical (nominal and ordinal)
  • Numerical (discrete and continuous)

Categorical data are observations that can be divided into groups or categories. Nominal variables have no intrinsic order, while ordinal variables have a predetermined order.

Numerical data are measured observations. Discrete values can be counted, while continuous values can assume any value within a defined range.
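The four kinds of data can be sketched in plain Python. The survey values below are invented for illustration; what matters is which operations make sense for each kind.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records, one list per variable.
colors = ["red", "blue", "red", "green"]        # nominal: no intrinsic order
sizes = ["medium", "small", "large", "small"]   # ordinal: predetermined order
children = [0, 2, 1, 3]                          # discrete: counted
heights_cm = [162.5, 175.0, 168.2, 180.3]        # continuous: any value in a range

# Nominal data can be counted per category, but sorting it is meaningless.
color_counts = Counter(colors)

# Ordinal data can be sorted once the order of the categories is stated.
size_order = ["small", "medium", "large"]
sorted_sizes = sorted(sizes, key=size_order.index)

# Numerical data supports arithmetic such as sums and averages.
total_children = sum(children)
average_height = mean(heights_cm)

print(color_counts)
print(sorted_sizes)
print(total_children, average_height)
```

Note that averaging the colors would make no sense, while averaging the heights is perfectly reasonable; the type of data decides which analysis is valid.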

Next in part II, we will explore the process of data analysis in detail.

Published by

Ednalyn C. De Dios

I’ve always been enamored with code and I love data science because of its inherent power to solve real problems. Having grown up in the Philippines, served in the United States Navy, and worked in the nonprofit sector, I am driven to make the world a better place. I have started and participated in numerous campaigns that aim to reduce domestic violence and child abuse in the community.
