I got an Instagram DM the other day that really got me thinking. This person explained that they were a data analyst by trade and had years of experience with data concepts. But they also felt that their technical skills were slightly lacking, and they wanted to transition into a more technical role.
As I thought about it more, I realized this person was in the perfect position to make their desired transition. Why? They had already mastered the data concepts and data mindset that are crucial to being successful in the field of data.
I (and so many others) worry about mastering every technical tool or product that is out there. We mistakenly worry about only having experience with Microsoft products (SQL Server, Excel, Power BI), and feel that we need to broaden our horizons to be better data analysts. We constantly question and debate online about whether Python or R is better.
Data Skills and Concepts For The Win
Speaking with my new Instagram friend helped me realize that these worries and debates are quite silly. Tools and programming languages are constantly evolving and changing, coming and going. But you know what is here to stay? The core concepts. Every tool or language that is ever built will always fall back on these core concepts.
If you understand how to take a data set, manipulate it, and present it in a way that provides genuine insight (or at least invites more questions that you didn’t have before… because that happens!!), you are on the right path to succeed as some sort of data professional.
This base understanding of data is so powerful. You can take this understanding and combine it with any technical tool of your choice. Then, you can group and filter data for business reporting and KPI monitoring, conduct statistical tests to answer questions about data, predict future data, or even build AI models that use data to help guide business action. And you can do all these things with huge data sets containing millions and millions of rows!
OK I know I’m selling you and selling you on this idea, so let me cut to the chase. If you understand data concepts and how to apply them, you can easily implement these concepts with any technical tool or product of your choice.
But don’t worry, I’m not just here to sell you on this and then head out. I’m going to talk about 3 basic data skills that I use daily as a data analyst, from a general perspective. NO TECHNICAL TERMS OR CODE INVOLVED. If you begin to master these (and other) data concepts, it is EASY PEASY LEMON SQUEEZY to take them and apply them with any tool. I even have a serious life hack at the end of the article that will help you further flex your new data knowledge in any tool you’ve been wanting to master. Stick with me, I got you!
Data Concept #1: Filtering Data
The first data concept that is crucial in the data world is filtering data. Honestly, filtering data is a super simple concept and one that we as human beings do on a daily basis. Take this example. If you are going to get McDonald’s, you should probably ask your 3 roomies if they want some (because you don’t wanna be that roommate). But, before you go ask your roomies if they want chicken nugs, you remember that 2 out of your 3 roomies don’t even like McDonald’s. So, you only end up asking one. Basically, you just conducted your own data operation. You “filtered out” your two roommates from your “data set” based on some “attribute”, which is whether or not they like McDonald’s.
Filtering data as a data analyst or data scientist works exactly the same way. If you are conducting an analysis on female customers, you’d use whatever tool you have at your disposal to filter out the non-female customers. If you are building a model that recommends skincare for adults, you would filter out any data for non-adults.
Long story short, filtering data is stripping away all of the undesired data until you are only left with the data you need for your analysis.
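If you're curious what this can look like in an actual tool, here is a minimal sketch in plain Python. Feel free to skip it, because the concept is the important part. The customer records, the `filter_rows` helper, and the age cutoff of 18 for "adult" are all made up for illustration.

```python
# Hypothetical customer records; the names and fields are invented for this sketch.
customers = [
    {"name": "Ana", "gender": "female", "age": 34},
    {"name": "Ben", "gender": "male", "age": 41},
    {"name": "Cleo", "gender": "female", "age": 16},
    {"name": "Dev", "gender": "male", "age": 15},
]


def filter_rows(rows, keep):
    """Return only the rows for which keep(row) is True."""
    return [row for row in rows if keep(row)]


# Keep only the female customers (everyone else is "filtered out").
female_customers = filter_rows(customers, lambda row: row["gender"] == "female")

# Keep only adults; treating 18 as the cutoff is an assumption.
adults = filter_rows(customers, lambda row: row["age"] >= 18)

print([row["name"] for row in female_customers])  # ['Ana', 'Cleo']
print([row["name"] for row in adults])            # ['Ana', 'Ben']
```

The rule for what to keep is the only thing that changes between the two filters. That is the whole idea, no matter which tool you end up using.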
Data Concept #2: Data Type Conversion
Another commonly used data skill is data type conversion. Data types are certain categories that data can fall into when it is stored in a spreadsheet, software, or database. Some common examples of data types are:
- Strings (“Hello, this is a string.”)
- Integers (400)
- Decimals (400.17)
- Booleans (TRUE)
When we are working with a data set, we should ensure that each data attribute is stored as the correct data type.
We would not want to store the integer 123 as a string. If we store 123 as a string, the spreadsheet, software, or database may not be able to perform necessary operations on it. The computer would get confused. If we tell the computer that we have a string (“123”), but later we want to add that “123” to something, the computer is going to say, “HOLD UP A SECOND. You taught me that ‘123’ was a STRING, which is basically a word. Ya can’t add words, crazy person! You can only add numbers!!!!”
Sorry the hypothetical computer got so aggressive there, but you get the point. To ensure that we can perform proper operations on our data down the road, we must make sure that it is represented as the right type.
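For the curious, here is roughly how that conversation plays out in Python. This is just a small sketch and the variable names are invented. Other tools have their own ways of converting types, but the idea is the same.

```python
# A value that arrived as text, for example from a file (hypothetical example).
quantity_as_text = "123"

# Python refuses to add a string and a number, much like the hypothetical computer.
try:
    quantity_as_text + 1
    mixing_types_failed = False
except TypeError:
    mixing_types_failed = True

# Convert to the right type first, and then the arithmetic works.
quantity = int(quantity_as_text)  # string -> integer
total = quantity + 1

price = float("400.17")  # string -> decimal

print(mixing_types_failed)  # True
print(total)                # 124
```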
Data Concept #3: Aggregating Data
The final data concept that I want to touch on *for now* is aggregating data. Aggregating data is so so so SO powerful. Aggregating data can take you from a big giant text file of rows and columns of data, and turn it into a summary value or a summary table that is much more meaningful and pleasing to the eye.
Notice how I kept saying the word summary up there? It’s probably the best way to explain an aggregation. Aggregations take multiple rows of data and summarize them into a smaller number of rows.
If you have data with numbers that can be added (such as quantities or sales), one of the simplest ways to aggregate that data is to sum it up. In the example below, I took a data set that contained the number of coffees I drank each day. I applied an aggregation to it by summing it, which created a summary view of my data on the right. This summary shows that I drank a total of 4 coffees (in this data set at least).
There are many other aggregate operations that are pretty intuitive, even for those that are new to the data world. Each of these operations answers some question that informs us more about our data set. Some examples of other simple aggregate operations are:
- Count (how many records are there?)
- Maximum (what’s the biggest observation?)
- Minimum (what’s the smallest observation?)
- Average (what do I tend to observe?)
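Here is a rough Python sketch of these aggregations, in case you want to see them in action. The daily coffee rows are made up for illustration (I only picked numbers that add up to the total of 4 coffees mentioned above), so treat them as an assumption and not my real data.

```python
from statistics import mean

# Hypothetical daily coffee counts; invented rows that happen to total 4.
coffees_per_day = [
    {"day": "Monday", "coffees": 1},
    {"day": "Tuesday", "coffees": 2},
    {"day": "Wednesday", "coffees": 1},
]

# Pull out the one column we want to summarize.
values = [row["coffees"] for row in coffees_per_day]

# Many rows go in, one small summary comes out.
summary = {
    "sum": sum(values),       # how many coffees in total?
    "count": len(values),     # how many records are there?
    "max": max(values),       # what's the biggest observation?
    "min": min(values),       # what's the smallest observation?
    "average": mean(values),  # what do I tend to observe?
}

print(summary)
```

Three rows of data turn into a single summary, and each entry answers one of the questions from the list.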
OK coooOooOol.. so what’s next?
I know I promised you a life hack earlier, so don’t worry – I didn’t forget. Since you have a firmer grasp on some of the most crucial steps in a data professional’s workflow, you can take them and apply them with any technical tool of your choice – even if you are a newbie. How? With our best friend, our ultimate savior, GOOGLE!
Whenever I want to practice a skill with some tool, and I need a refresher on how to execute it properly, I will Google in this format:
[insert data skill] in [insert technical tool]
I swear to you, any time I Google in this format, I always end up finding great documentation, blog posts, or other resources (such as Stack Overflow) that direct my thoughts toward the solution.
So, did you find aggregating data interesting? And do you want to improve your SQL skills? Then I would recommend Googling and working on:
aggregating data in SQL
Are you basically a pro at filtering data in Python, but now you would like to try it out in R? Try my life hack and Google:
filtering data in R
Take it from the girl who overwhelmed herself for months before pursuing her data career dreams. Learn the concepts first. Worry about the tech to get it done later. Technology is always evolving, but the foundations aren’t.