Is BigQuery Omni the next revolution in Data Warehousing?

IDC predicts that the global datasphere will grow to 175 zettabytes by 2025 (one zettabyte is 10 to the power of 21 bytes). Even my driving license has an RFID chip, and so generates data whenever it is scanned.

 

According to a cloud adoption survey by Gartner, 81% of companies that use public clouds work with more than one cloud provider. In simple terms, multi-cloud management is becoming more important than ever. However, data spread across several clouds still needs to be analyzed from one centralized data warehouse, because it’s impractical to run multiple data warehouses inside a single company. To address this, last September at Google Cloud Next 2020, Emily Rapp, product manager for Google BigQuery, announced “BigQuery Omni”, a data warehousing solution aimed especially at companies that use multiple cloud vendors.

 

BigQuery is not a new term for us. Google introduced BigQuery in late 2011 to handle massive amounts of data, such as log data from thousands of retail systems, or IoT data from millions of devices across the globe. It’s a fully managed, serverless data warehouse that lets teams focus on analytics instead of managing infrastructure.

 

Breaking the Silos

 

BigQuery is designed to address the data silo problems that arise when a company has individual teams, each with its own independent data mart. By integrating BigQuery with the Google Cloud Platform, a company can bring that scattered data together and query it in one place. But with growing demand to use multiple cloud vendors inside a single company, BigQuery Omni came into the picture. It was made possible by ‘Anthos’, another Google technology that lets users run applications not just on Google Cloud, but also on other cloud vendors, such as Amazon Web Services (AWS) and Microsoft Azure.

 


 Source: Google Cloud blog

 

As for the data silo question, the main challenge was that there was no way to run compute against data held on another cloud platform. With BigQuery Omni, the BigQuery query engine (known as Dremel) runs on Anthos clusters in AWS or Azure, next to the data. The connection is secure because the control plane and the metadata remain on Google Cloud, and only the query results pass through the BigQuery routers. Users can choose either to bring the results back over that same connection, or to keep everything within AWS.
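The split between control plane and data plane described above can be sketched as a toy model. This is not BigQuery's API: the class names `ControlPlane` and `DataPlane`, the `orders` table, and its rows are all hypothetical, and the "query" is just a Python predicate. It only illustrates the idea that metadata and routing live in one place while the scan runs where the data is stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class DataPlane:
    """Query engine running next to the data (in the article: Dremel on Anthos)."""
    cloud: str
    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def execute(self, table: str, predicate: Callable[[Row], bool]) -> List[Row]:
        # The scan runs inside this cloud; only the matching rows are returned.
        return [row for row in self.tables[table] if predicate(row)]


@dataclass
class ControlPlane:
    """Stays on Google Cloud: keeps metadata (which table lives where) and routes queries."""
    catalog: Dict[str, DataPlane] = field(default_factory=dict)

    def register(self, table: str, plane: DataPlane) -> None:
        self.catalog[table] = plane

    def query(self, table: str, predicate: Callable[[Row], bool]) -> List[Row]:
        plane = self.catalog[table]             # metadata lookup, no data movement
        return plane.execute(table, predicate)  # compute happens where the data is


# Hypothetical data that lives in AWS and is never copied to the control plane.
aws = DataPlane(cloud="aws", tables={
    "orders": [
        {"order_id": 1, "total": 40},
        {"order_id": 2, "total": 250},
        {"order_id": 3, "total": 120},
    ]
})
control = ControlPlane()
control.register("orders", aws)

# Only the filtered result set comes back to the caller.
large_orders = control.query("orders", lambda row: row["total"] > 100)
print(large_orders)
```

In this sketch the control plane never holds the table itself, only a pointer to the cloud that does, which mirrors the article's point that only query results travel back.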

 

The competitive landscape

 

So far, BigQuery has two major competitors: Amazon Redshift and Snowflake. As of 18th May 2020, BigQuery’s market share is smaller than Amazon Redshift’s, but its growth rate is quite impressive.

 

But with the release of BigQuery Omni, serverless data warehousing is going to shake up the current market. Currently, BigQuery Omni has no direct competition: neither Redshift nor Snowflake offers this kind of multi-cloud vendor integration so far. So this really is going to be a wake-up call for BigQuery’s competitors.

 

Sources for their individual adoption: Redshift, BigQuery, Snowflake

 

Practical application

 

So, what does BigQuery Omni look like in action? Here’s one example: have you ever bought something and then kept seeing the ad for it again and again? You have already bought it, so there’s no point in spending ad budget on an existing customer. With BigQuery Omni we can solve this, because we can tie the commerce data to the ad platform safely and securely, ensuring that once a purchase has been made, the ad no longer appears.
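As a rough illustration of the ad example, here is a minimal sketch of the suppression step. It assumes the list of purchases has already been returned as a query result from wherever the commerce data lives. The function name `remaining_targets` and the sample customers and products are hypothetical, and plain Python stands in for what would normally be a SQL query.

```python
from typing import Iterable, List, Set, Tuple

Target = Tuple[str, str]  # (customer_id, product_id)


def remaining_targets(ad_targets: Iterable[Target],
                      purchases: Iterable[Target]) -> List[Target]:
    """Drop every (customer, product) pair that already ended in a purchase."""
    bought: Set[Target] = set(purchases)
    return [target for target in ad_targets if target not in bought]


# Hypothetical ad audience and purchase records.
ad_targets = [("alice", "shoes"), ("bob", "shoes"), ("alice", "hat")]
purchases = [("alice", "shoes")]

still_to_show = remaining_targets(ad_targets, purchases)
print(still_to_show)
```

Matching on the customer and product pair, rather than on the customer alone, means someone who bought the shoes can still be shown the hat.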

 

Pros and cons

 

As with everything, there are advantages and disadvantages to using BigQuery/BigQuery Omni.

On the plus side you get:

  • Low-level access for BigQuery Omni users;
  • Simplicity – because of its truly serverless architecture, there is very little that you have to do to manage your BigQuery setup. (You basically just run your queries and pay according to what you scan);
  • Scalability – you can scale up to 100 TB queries very easily without having to scale any infrastructure yourself;
  • Breaking down of silos to gain insights into data;
  • A consistent data experience across the clouds – wherever your data sets are, you should be able to use standard SQL to write your queries in the BigQuery interface; and
  • Portability, powered by Anthos.
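The "pay according to what you scan" point in the list above comes down to simple arithmetic. The sketch below assumes on-demand billing per tebibyte scanned. The rate is a placeholder parameter, not a quoted price, and BigQuery Omni's own pricing may work differently, so check the current pricing page before relying on any number.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_scan_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Estimated charge for a query that is billed by bytes scanned."""
    if bytes_scanned < 0 or usd_per_tib < 0:
        raise ValueError("bytes_scanned and usd_per_tib must be non-negative")
    return bytes_scanned / TIB * usd_per_tib


ASSUMED_USD_PER_TIB = 5.0  # placeholder rate for illustration only
cost = estimate_scan_cost(3 * TIB, ASSUMED_USD_PER_TIB)
print(f"Estimated cost: ${cost:.2f}")
```

The byte count could come from a dry run: in the google-cloud-bigquery Python client, a query job configured with `QueryJobConfig(dry_run=True)` reports `total_bytes_processed` without running the query.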

On the minus side, there’s:

  • A relatively high price – to use BigQuery Omni, you have to use Anthos at the same time, so you pay for both services; and
  • A need for SQL skills – Google BigQuery requires knowledge of SQL to leverage its data analysis capabilities.

Summary

Companies using public clouds from more than one cloud provider need a centralized data warehouse to hold their data. BigQuery Omni is making a splash in the market by providing secure, serverless data warehousing, along with a host of other benefits.

 

Happy coding!