"Data! Data! Data!... I can’t make bricks without clay!” – Sherlock Holmes
Data is everywhere, and most business problems cannot be solved well without it. Data is a collection of facts, which can take the form of numbers, text, images, video, or audio. Analysis means identifying and defining a problem and then using data to solve it. Data analysis, then, is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decisions. The person who does this work is called a data analyst. Data analysts support data-driven decision-making, which means using facts to guide business strategy.
To become a data analyst, you need to be a versatile thinker, able to approach a problem in different ways. Versatile thinking combines analytical, critical, and creative thinking: critical thinking is asking the right questions, analytical thinking is analyzing the data, and creative thinking is finding solutions. As a data analyst, you also need to find the root cause of a problem. One way to do this is the 5 Whys technique, in which you keep asking "why" (typically five times) until you reach the root cause.
Example:
My company’s strawberry jam sales are low. Why? Because the price of strawberry jam is high. Why? Because the price of strawberries used to make the jam is high. Why? Because strawberries can only be grown in specific conditions.
Asking "why" repeatedly like this leads us to the root cause.
Another important concept is gap analysis. It means examining where you are now and where you want to be, then working out how to close the gap between the two.
Data analysis is closely related to data science, but the two are not the same. Data science uses raw data to build models, often with machine learning (ML), in order to make sense of unknown data, whereas data analysis examines existing data. Data can be analyzed with different techniques. One of them is data visualization, which represents data graphically so that conclusions and relationships are easier to see.
Example:
Data visualization can turn a complex dataset into an easily understandable chart or graph. For instance, a sales dashboard in Tableau can show monthly sales trends and highlight which products are performing well.
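As a rough illustration of the idea, the sketch below turns a small table of monthly sales into a text bar chart using only the Python standard library. The sales figures and the bar_chart helper are made up for this example. A real dashboard would be built in a tool such as Tableau, not in code like this.

```python
# Hypothetical monthly sales figures (illustrative only).
monthly_sales = {"Jan": 120, "Feb": 95, "Mar": 135, "Apr": 160}


def bar_chart(data, width=40):
    """Return one text line per entry, with bars scaled to the largest value."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        # The largest value gets a bar of `width` characters; others scale down.
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:<4}{bar} {value}")
    return lines


chart_lines = bar_chart(monthly_sales)
for line in chart_lines:
    print(line)
```

Even in this crude form, the longest and shortest bars show the strongest and weakest months at a glance, which is harder to see in a column of raw numbers.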
A data analyst should have the following analytical skills:
Curiosity: Wanting to learn new things.
Understanding the context: Understanding the conditions in which something is happening and the root cause.
Having a technical mindset: The ability to break things down into smaller steps or pieces and work with them in an orderly and logical way.
Data design: Organizing the information.
Data strategy: Managing the people, processes, and tools used for analysis.
A data ecosystem is made up of elements that interact with one another to produce, manage, store, organize, analyze, and share data. Within a project, data also passes through a series of phases known as the data life cycle:
Plan: The first phase, in which initial decisions are made, such as what data is required, who will collect and manage it, and who will lead the project.
Capture: The second phase, in which the data is collected. The required data can come from outside sources or from company databases.
Manage: Storing the data and using tools to keep it safe.
Analyze: Using data to solve problems and make business decisions.
Archive: Storing data in a place where it is available but may not be used regularly.
Destroy: Using data erasure software to delete data permanently from storage.
The data analysis process has its own phases:
Ask: Define the problem to be solved and understand stakeholder expectations.
Prepare: Collect and store data used for analysis.
Process: Identify and eliminate any errors that may get in the way of analysis.
Analyze: Transform data using tools to make predictions and drive informed decision-making.
Share: Share the insights found with the stakeholders.
Act: Solve the problem by taking action.
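The Process and Analyze phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the raw_sales records, the cleaning rules, and the process and analyze function names are all hypothetical.

```python
from statistics import mean

# Hypothetical raw records, as they might look after the Prepare phase.
raw_sales = [
    {"month": "Jan", "units": "120"},
    {"month": "Feb", "units": ""},      # missing value
    {"month": "Mar", "units": "135"},
    {"month": "Mar", "units": "135"},   # exact duplicate
    {"month": "Apr", "units": "-5"},    # impossible value
    {"month": "May", "units": "150"},
]


def process(records):
    """Process phase: drop duplicates, unparseable values, and negative counts."""
    seen = set()
    clean = []
    for record in records:
        key = (record["month"], record["units"])
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        try:
            units = int(record["units"])
        except ValueError:
            continue  # skip missing or non-numeric values
        if units < 0:
            continue  # skip values that cannot be real sales counts
        clean.append({"month": record["month"], "units": units})
    return clean


def analyze(records):
    """Analyze phase: summarize the cleaned data."""
    units = [r["units"] for r in records]
    return {"months": len(units), "total": sum(units), "average": mean(units)}


clean_sales = process(raw_sales)
summary = analyze(clean_sales)
print(summary)
```

Here three of the six records survive cleaning, giving a total of 405 units and an average of 135 per month. Which records count as errors is a judgment call that depends on the problem defined in the Ask phase.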
For data analysis, we use tools such as spreadsheets, databases, and visualization tools. For databases, SQL is used for querying or requesting information. Spreadsheets such as Excel and Google Sheets are used to organize data. Visualization tools such as Tableau and Looker help present data graphically.
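To give a feel for what querying a database with SQL looks like, here is a small sketch that uses Python's built-in sqlite3 module and an in-memory database. The sales table, its columns, and the numbers in it are invented for the example.

```python
import sqlite3

# An in-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("strawberry jam", "Jan", 40),
        ("strawberry jam", "Feb", 35),
        ("apricot jam", "Jan", 60),
        ("apricot jam", "Feb", 75),
    ],
)

# The query: total units per product, best seller first.
rows = conn.execute(
    "SELECT product, SUM(units) AS total_units "
    "FROM sales GROUP BY product ORDER BY total_units DESC"
).fetchall()
conn.close()

for product, total_units in rows:
    print(product, total_units)
```

The SELECT statement is the part that requests information. The same query would work, with little or no change, against most other SQL databases.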
The quality of the data determines the quality of the analysis and the fairness of the decisions based on it, so it is extremely important to keep bias out of the analysis. Several practices help: consider all of the available data so that no data about specific incidents is missing, identify the surrounding factors that can affect the data, include self-reported data, use oversampling effectively, and think about fairness from beginning to end.
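Oversampling is the most mechanical of these practices, so it lends itself to a short sketch. The version below is simple random oversampling: records from smaller groups are resampled with replacement until every group is as large as the largest one. The oversample function, the region field, and the survey records are hypothetical, and other oversampling methods exist.

```python
import random
from collections import Counter


def oversample(records, group_key, seed=0):
    """Randomly duplicate records from smaller groups until all groups match the largest."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw extra copies with replacement to reach the target size.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical survey responses where one region is underrepresented.
responses = [
    {"region": "urban", "score": 4},
    {"region": "urban", "score": 5},
    {"region": "urban", "score": 3},
    {"region": "urban", "score": 4},
    {"region": "rural", "score": 2},
]

balanced = oversample(responses, "region")
counts = Counter(r["region"] for r in balanced)
print(counts)
```

Duplicating records balances the group sizes but adds no new information, which is one reason the text says to use oversampling effectively and not by default.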