

A “hypothesis” is an educated guess, and we make such guesses every day. In our everyday lives, we form and weigh multiple hypotheses. Some common examples are “It has been raining for the past several days, so it might rain today as well. I should carry an umbrella to work” or “Our class teacher told us that this topic is crucial from an exam perspective, so she might give us a surprise test next week.” Although such statements are generic, they influence the decisions that people make every day. Needless to say, a hypothesis backs almost every decision taken in business.
We also come across specific hypotheses in data science projects, which we validate using statistical techniques on the available data. The hypothesis is set up by the clients or the stakeholders' team before it reaches the data scientists. Once it reaches the data science team, they perform statistical analysis to validate the statements.
Put simply, there are two phases in this process. The first phase is "Hypothesis generation," which is owned by the clients or business stakeholders and is usually carried out by analysts or subject matter experts in a specific domain.
The second phase is "Hypothesis testing," carried out by the team of data scientists. Data scientists spend a substantial amount of time understanding the data before moving on to feature engineering and model building.
One can use exploratory data analysis (EDA) techniques to check whether the business's hypothesis makes sense or whether the data has substantial gaps. This article will focus on the first phase of the process, i.e., "Hypothesis generation."
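
Although this article concentrates on hypothesis generation, the EDA check mentioned above can be illustrated briefly. The following is a minimal sketch, not a prescribed method: the trip records, the column names (`distance_km`, `rain`, `duration_min`), and both helper functions are hypothetical and exist only for illustration. It measures how much of each column is missing and then compares the average trip duration across the two values of a candidate factor.

```python
from statistics import mean

# Hypothetical trip records (made-up values); None marks a missing value.
trips = [
    {"distance_km": 3.2, "rain": True, "duration_min": 18.0},
    {"distance_km": 5.1, "rain": False, "duration_min": 16.5},
    {"distance_km": None, "rain": True, "duration_min": 25.0},
    {"distance_km": 2.4, "rain": False, "duration_min": 9.0},
    {"distance_km": 7.8, "rain": None, "duration_min": 31.0},
    {"distance_km": 4.0, "rain": True, "duration_min": None},
]


def missing_share(records, column):
    """Return the fraction of records in which `column` is missing."""
    return sum(r[column] is None for r in records) / len(records)


def mean_by_group(records, group_col, value_col):
    """Average `value_col` per value of `group_col`, skipping incomplete records."""
    groups = {}
    for r in records:
        if r[group_col] is None or r[value_col] is None:
            continue  # a gap in either column makes the record unusable here
        groups.setdefault(r[group_col], []).append(r[value_col])
    return {key: mean(values) for key, values in groups.items()}


# Step 1: does the data have substantial gaps?
gaps = {col: missing_share(trips, col) for col in trips[0]}

# Step 2: does the hypothesis "rain makes trips longer" look plausible at all?
duration_by_rain = mean_by_group(trips, "rain", "duration_min")

print(gaps)
print(duration_by_rain)
```

A difference between the group averages only suggests that the hypothesis deserves a formal test; it does not validate it. On a sample this small, the comparison says very little on its own.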

Setting up a hypothesis in data science projects has several benefits, especially when you have many variables. A few of them are:

Let's take an example to understand how businesses should ideally frame a hypothesis. A company named "heels on wheels" is a taxi service operating in Mexico. The company is trying to understand the time a taxi takes for each trip. The organization's key members met last Thursday for a quick brainstorming session. Linda took notes during the meeting, which summarize all the hypotheses discussed in the session. A few of the leads are below:

This article discussed the importance of setting up the right hypothesis to reach a solution more effectively. We also discussed the difference between hypothesis generation and hypothesis testing. With the help of a case study, we saw how identifying the right data set is of utmost importance. Finally, we tied the topic back to our real-world findings.
*****
About the author: I am currently working as a data scientist with around 2-3 years of experience in analytics. Alongside my job, I own a blog and also write for business websites. I like to read books, listen to some good music, explore places to travel, and, most importantly, "dream" in my free time.