Thinking about becoming a data analyst in 2024?
Career in Data Science
My road to a data analyst career was winding and full of U-turns. I consider them positive stops, though, not breakdowns. I graduated with a degree in psychology in 2015, specializing in forensic psychology, and started my Ph.D. in psychology the same year. As a Ph.D. candidate, I participated in many research projects and conducted my own research. It quickly turned out that research requires data analysis and that there are not many people around to do it for you. This is how I started learning data analysis. At first, I used user-friendly software like SPSS and watched hours of YouTube tutorials on how to compute descriptive statistics, compare means, visualize data, and so on. For a long time, I wondered whether I would have been better off studying computer science, or something more closely related to data analysis, instead of spending years on psychology and learning things that I might never need. To my surprise, the answer is NO.
Domain knowledge in data analysis
Before I really knew what data science was, I was pretty sure it was all about math, statistics, technical skills, programming, and so forth. It took me some time to understand that technical and programming skills are just one side of the coin, and that the other side consists of skills that some people might not associate with data analysis. After a few years of experience, I now know that understanding data, being able to describe it, being creative, and thinking outside the box are more important than typing commands and writing code. Understanding how the data were gathered, working out which questions can be addressed with the data you have, interpreting the outcome, and explaining it to someone who is not versed in data analysis make up 80% of my work, which many people equate with doing math on a slightly more complex calculator. Domain knowledge and general experience in a scientific or business field give you a unique perspective and the ability to understand data, not only to manipulate or model it.
The most necessary skills
Many people have asked me what the best starting point is for someone who wants to work as a data analyst. I would say the best way to start is to learn basic math. You do not need to go into too much detail, because you are not trying to win a Fields Medal, but if you understand probability, the normal distribution, basic regression, and the central limit theorem, you will very likely understand most of the more complicated statistical issues.
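To make the central limit theorem a little less abstract, here is a minimal sketch in Python using only the standard library. The function name `sample_means`, the choice of an exponential distribution, and the sample sizes are my own assumptions for illustration. The idea is to draw many samples from a clearly skewed distribution and check that their means cluster around the true mean, with a spread close to what the theorem predicts.

```python
import random
import statistics

def sample_means(n_samples, sample_size, seed=0):
    """Draw n_samples samples from a skewed (exponential) distribution
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

sample_size = 50
means = sample_means(n_samples=2000, sample_size=sample_size)

center = statistics.fmean(means)   # should be close to 1.0, the true mean
spread = statistics.stdev(means)   # should be close to 1 / sqrt(sample_size)

# For a roughly normal distribution, about 68% of values
# fall within one standard deviation of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"spread of sample means: {spread:.3f} (theory: {sample_size ** -0.5:.3f})")
print(f"share within one SD: {within_one_sd:.2f}")
```

Even though single exponential values are far from normally distributed, the sample means behave almost like a normal distribution. This is the reason so many standard statistical methods work on real, messy data.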
Contrary to what many people think, working as a data analyst involves constant communication with people and teamwork. So, if you are an introvert and hope that data analysis is perfect for you, you might be seriously disappointed. No matter what kind of analysis you do, team calls, meetings with stakeholders, and discussions with other teams will be your daily routine. Consequently, you need to be ready to communicate your ideas and thoughts clearly to people who come from different backgrounds, have different experiences, and, most importantly, may not understand statistics and data analysis. I am a socially awkward introvert, so when I started my career, I kept my interactions with people to a minimum and hoped that my short messages and one-word replies would be clear to everyone else. I quickly realized that no one could read my mind and, moreover, no one had time to work out what I wanted to say. What they needed was a clear and concise message or a short and direct question.
Further, sooner or later (and I bet sooner), you will need to prepare a presentation or talk to someone who is not interested in your sophisticated models or the most complex statistical methods. Instead, they will want you to answer a concrete question, for example: is this drug effective enough, does this ad increase sales, or how important is it to organize a pride party for employees? Because of that, you need to translate the language of data into the language of normal humans (I believe we all agree that data analysts are a little cuckoo). And trust me, sometimes I have to explain things that I was sure everyone understood, so be patient and try to describe the same thing in a hundred different ways. Eventually, you will find the best way to explain what you do in an easy and comprehensible way.
Coding and programming skills
Using programming languages is part and parcel of the data analyst role. However, this does not mean that you have to be an advanced user of Python, R, Java, or SAS from the very beginning. The ability to build complex queries comes with time. The most important thing is to start.
If you are taking your first steps in data analysis, I recommend starting with the easiest and most basic software, such as MS Excel. Try to manipulate simple data: subtract, add, divide, aggregate, and build tables and charts. This might sound too simple for someone who aspires to be a data analyst, but trust me, the simple things are crucial for your further development. Moreover, Excel is still widely used in the data science world, so neglecting it might backfire someday.
Another useful tool is SQL (Structured Query Language). SQL is quite simple and user-friendly, and it gives you a nice overview of how you can retrieve information from a database. In addition, SQL is very intuitive, and for someone who speaks basic English, it is easy to understand keywords such as HAVING, LIKE, INSERT, DELETE, SELECT, and so on. Since 80% of a data analyst's job is data manipulation, once you get the hang of SQL you will be able to transfer this ability to almost any data manipulation tool (pandas in Python, dplyr and tidyr in R, etc.).
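Here is a small sketch of how readable such queries are. It uses Python's built-in `sqlite3` module so that it runs without installing a database. The `sales` table and its rows are invented purely for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# INSERT: add a few made-up rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Notebook", 120.0),
        ("North", "Pen", 30.0),
        ("South", "Notebook", 200.0),
        ("South", "Pencil", 15.0),
        ("West", "Pen", 45.0),
    ],
)

# SELECT with LIKE and HAVING: total sales per region for products
# whose name starts with "Pe", keeping only regions above 20.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE product LIKE 'Pe%'
    GROUP BY region
    HAVING SUM(amount) > 20
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, total in rows:
    print(region, total)

# DELETE: remove small transactions, then count what is left.
conn.execute("DELETE FROM sales WHERE amount < 20")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print("rows remaining:", remaining)
```

The same filter, group, and aggregate pattern is what you will later write with pandas or dplyr, only with a different syntax.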
When you feel proficient enough in SQL, you can take your first steps in more advanced programming languages like Python or R. Honestly, both are similar in structure. Python is more common in business data analysis, while R is widely used in healthcare. You can start with whichever seems friendlier to you, and soon you will notice that it takes only a few days to switch to the other one. Books (or actually workbooks!) I can recommend are the O’Reilly series Head First Python and Head First R. They are written in a very comprehensible way and help you build the fundamentals for any language you need. Once you have the basics of R or Python, you can start building your own projects with real-life datasets. If you do not have your own data, you can use datasets that can be downloaded for free from kaggle.com or Google Dataset Search. Start with a small dataset that includes only a few variables, and step by step you will work your way up to big data.
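A first project can be as small as this sketch: read a tiny CSV file and compare the means of two groups, which is the kind of thing I first did in SPSS. The data below are made up, and the column names `group` and `score` and the function `group_means` are assumptions for illustration. With a real file from kaggle.com, you would open the downloaded CSV instead of the inline text.

```python
import csv
import io
import statistics
from collections import defaultdict

# A made-up dataset with only two variables, kept inline so the
# example runs without downloading anything.
RAW = """group,score
control,12
control,15
control,11
treatment,18
treatment,21
treatment,17
"""

def group_means(csv_text):
    """Return the mean score for each group found in the CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["group"]].append(float(row["score"]))
    return {name: statistics.fmean(values) for name, values in groups.items()}

means = group_means(RAW)
for name, value in means.items():
    print(f"{name}: {value:.2f}")
```

Once this works, try adding one more variable or one more statistic at a time. Growing a small project in small steps is a far gentler way to learn than starting with a huge dataset.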
Finally, I suggest that you be very patient. Behind each success in data analysis there are hundreds of mistakes, so do not treat them as something bad. Mistakes are necessary for you to learn and to figure out what needs to be done to make things work. And most importantly, share your work. Even if you think it is nothing special or nothing to be proud of, it is something you put your time and effort into. Some people may use it as inspiration, and others might give you hints to lift your project up. So, welcome to the data science family, and good luck!