Set up an ETL Data Pipeline and Workflow Using Python & Google Cloud Platform (COVID-19 Dashboard)

8 September, 2020

Landing page of the COVID-19 Dashboard
Data Pipeline from Extract to Transform to Load
Having learned and used Python for about a year, I am no expert when it comes to data pipelines or cloud platforms in general. This guide documents my personal journey of learning new techniques, along with some things to keep in mind when developing a data solution.
Big Data
Google BigQuery

Cloud Storage
Google Cloud Storage

Languages
Python

Cloud Hosting
Google Cloud Platform

Query Languages
JSON API

Lessons Learned:

  • Survey what kinds of data are available, but don’t get hung up on trying to incorporate all of it
  • Process optimization comes with experience, so don’t sweat it if you later find that what used to take half an hour now takes 5 minutes
  • Data visualization should be user-friendly: revise the back-end data and tables based on user feedback, and make the interface self-explanatory
  • Large amounts of data can increase loading time (as on page 2 of the report), so optimization is needed
  • Table structures and schemas are important for blending data and need to be designed before they are incorporated into the workflow (I deleted and recreated a lot of tables in the process)
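
To make the point about designing the schema up front more concrete, here is a minimal sketch in plain Python. It declares the columns and types once, then forces every incoming row to match before it is loaded anywhere. The column names, the type labels, and the `conform` helper are all hypothetical and are not the tables used in the dashboard. The sample row is made up.

```python
from datetime import date, datetime

# Hypothetical schema for a daily state-level table. The column names and
# types are illustrative assumptions, not the dashboard's actual tables.
SCHEMA = [
    ("report_date", "DATE"),
    ("state", "STRING"),
    ("confirmed", "INTEGER"),
    ("deaths", "INTEGER"),
]

# How to cast a raw string value into each declared type.
CASTERS = {
    "DATE": lambda value: datetime.strptime(value, "%Y-%m-%d").date(),
    "STRING": str,
    "INTEGER": int,
}


def conform(row):
    """Return a dict with exactly the schema's columns, cast to the declared types."""
    missing = [name for name, _ in SCHEMA if name not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Columns that are not in the schema are dropped on purpose.
    return {name: CASTERS[kind](row[name]) for name, kind in SCHEMA}


# Made-up sample row with an extra column that the schema does not know about.
raw = {
    "report_date": "2020-09-08",
    "state": "CA",
    "confirmed": "100",
    "deaths": "2",
    "source": "csv",
}
clean = conform(raw)
```

Keeping the schema in one place means that two sources blended later share the same column names and types, which is the kind of mismatch that otherwise leads to deleting and recreating tables.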

Next Steps:

  • With an understanding of the ETL pipeline in place, optimize and keep optimizing
  • Incorporate state-specific data such as government measures and business reopenings
  • Look into models other than ARIMA, evaluating their strengths and weaknesses
  • Build an ML model that incorporates all data relevant to the research
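
One way to weigh the strengths and weaknesses of different forecasting models is to score them all on the same held-out tail of the series. The sketch below is only an assumption about how that comparison could be set up: it uses two simple baselines (a naive forecast and a drift forecast) rather than ARIMA itself, and the function names and the series values are made up. Any other model, ARIMA included, could be wrapped in a function with the same `(history, horizon)` signature and added to the comparison.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def drift_forecast(history, horizon):
    """Extend the straight line that joins the first and last observations."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (h + 1) for h in range(horizon)]


def compare(series, horizon, models):
    """Hold out the last `horizon` points and score each model on them."""
    train, test = series[:-horizon], series[-horizon:]
    return {
        name: mean_absolute_error(test, forecast(train, horizon))
        for name, forecast in models.items()
    }


# Made-up daily counts, used only to exercise the comparison.
series = [10, 12, 14, 16, 18, 20, 22, 24]
scores = compare(
    series,
    horizon=2,
    models={"naive": naive_forecast, "drift": drift_forecast},
)
```

A lower score means the model tracked the held-out points more closely. A single split is a rough guide, so repeating the comparison over several cut-off dates would give a fairer picture.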

machine learning

etl

cloud computing

google cloud platform

bigquery

Ryder Nguyen

San Francisco, CA, USA

Experienced Analyst | MSBA, BS in Actuarial Science | Big Data, Machine Learning, Cloud Computing | Passionate about the power of data and the power of baked goods
