Welcome to
My Portfolio

A Showcase of My Projects and Abilities

Heimdall
The Gatekeeper

This API gateway is built to handle data manipulation without clients accessing the database directly, which could otherwise harm database performance. Manipulation here means translating key-value JSON payloads into column-value pairs that can be applied to the database.
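A minimal sketch of that JSON-to-column translation step (the table and field names are placeholders, not Heimdall's actual schema):

```python
def json_to_insert(table, payload):
    """Translate a key-value JSON payload into a parameterized
    INSERT statement plus its value list, so the gateway talks
    to the database instead of the client."""
    columns = list(payload.keys())
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, list(payload.values())

sql, values = json_to_insert("users", {"name": "alice", "age": 30})
# sql    -> "INSERT INTO users (name, age) VALUES (%s, %s)"
# values -> ["alice", 30]
```

Using placeholders rather than interpolating values keeps the generated statement safe to execute through a standard database driver.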

ETL Cloud
Batch Processing

Tech companies used to need their own on-premise servers to run ETL on their data, and many of them faced issues with scalability, data loss, hardware failure, and so on. The arrival of cloud services from the major tech companies changed this: shared computing resources in the cloud solve most of the issues found with on-premise servers. This repo sets up a Google Cloud Composer environment and solves several batch-processing cases by creating DAGs that run ETL jobs in the cloud. The data processing consists of ETLs with data moving to and from GCS and BigQuery.
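A sketch of what one such DAG might look like; the DAG id, bucket, and table names below are placeholders, not the repo's actual configuration:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

# Placeholder names throughout -- adjust to the real bucket/dataset.
with DAG(
    dag_id="gcs_to_bq_batch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events",
        bucket="my-bucket",
        source_objects=["events/{{ ds }}.csv"],  # one file per run date
        destination_project_dataset_table="my_dataset.events",
        source_format="CSV",
        write_disposition="WRITE_APPEND",
    )
```

Composer picks up this file from the environment's DAGs bucket and schedules it daily; each run loads that day's GCS file into BigQuery.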

Sparkjob on
Google Cloud Dataproc

The problem in this project is: how can we process a huge amount of data automatically without rewriting a script every time? Given a large amount of data on a local computer, we transform the data and store it in BigQuery as the data warehouse.
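One way to avoid rewriting the job for every dataset is to parameterize how it is submitted to Dataproc; a minimal sketch, where the cluster, region, and GCS paths are hypothetical:

```python
def build_submit_command(cluster, region, job_file, job_args):
    """Build a `gcloud dataproc jobs submit pyspark` command list,
    so the same Spark job can be launched for any input without
    editing the script itself."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark",
        job_file,
        f"--cluster={cluster}",
        f"--region={region}",
        "--",           # everything after this goes to the job
        *job_args,
    ]

cmd = build_submit_command(
    "etl-cluster", "us-central1", "gs://my-bucket/jobs/transform.py",
    ["--input=gs://my-bucket/raw/2024-01-01", "--bq-table=warehouse.sales"],
)
# subprocess.run(cmd, check=True) would then submit the job
```

Looping this over dates or input paths turns a one-off script into a repeatable pipeline step.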

HR Analytics
Classification

Currently, processing and analysing employee data to filter who is eligible for promotion is largely manual, which delays the transition into new roles after promotion. Being slow and manual, the process may also be less accurate. This project develops a machine learning model that helps the HR team filter employees' eligibility for promotion.
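To make the task concrete, here is a toy rule-based baseline of the kind a trained classifier would replace; the feature names, weights, and threshold are all made up for illustration, not the project's actual model:

```python
def promotion_score(employee):
    """Toy eligibility score from a few hypothetical features.
    A trained classifier learns such weights from historical
    promotion data instead of hard-coding them."""
    score = 0.0
    score += 0.4 * min(employee["avg_training_score"] / 100, 1.0)
    score += 0.3 * min(employee["previous_year_rating"] / 5, 1.0)
    score += 0.2 * (1.0 if employee["kpis_met"] else 0.0)
    score += 0.1 * (1.0 if employee["awards_won"] else 0.0)
    return score

def is_eligible(employee, threshold=0.7):
    """Flag candidates the HR team should review for promotion."""
    return promotion_score(employee) >= threshold

candidate = {"avg_training_score": 90, "previous_year_rating": 5,
             "kpis_met": True, "awards_won": False}
# promotion_score(candidate) -> 0.86, so is_eligible(candidate) is True
```

The ML model plays the same role as `is_eligible`, but with learned decision boundaries and measurable accuracy instead of hand-picked weights.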

LinkedIn Job
Scraper

Data can come from everywhere. In this project, I created a script that automatically scrapes information about jobs posted on LinkedIn. The automation is built with Selenium and Python. I also did a simple exploratory data analysis to find insights in the scraped data.
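Once Selenium has loaded a results page, the scraping step boils down to pulling fields out of each job card. A minimal sketch of that extraction with the stdlib parser, using simplified, made-up card markup (LinkedIn's real class names differ and change often):

```python
from html.parser import HTMLParser

class JobCardParser(HTMLParser):
    """Collect text from elements whose class names mark the job
    title and company (the class names here are illustrative)."""
    def __init__(self):
        super().__init__()
        self.jobs = []
        self._field = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if "job-title" in cls:
            self._field = "title"
        elif "company-name" in cls:
            self._field = "company"

    def handle_data(self, data):
        if self._field:
            self.jobs.append((self._field, data.strip()))
            self._field = None

parser = JobCardParser()
parser.feed('<div class="job-title">Data Engineer</div>'
            '<div class="company-name">Acme Corp</div>')
# parser.jobs -> [("title", "Data Engineer"), ("company", "Acme Corp")]
```

In the real script, Selenium supplies the rendered page source (e.g. `driver.page_source`) and the extracted tuples are collected into a table for the EDA step.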

Customer Segmentation
with RFM

The data I used for this project provides customers and their transaction dates over a few years. From this, I segmented customers using RFM analysis. Throughout the project, I used two common clustering algorithms: Agglomerative and K-Means. In the end, the customers could be segmented into three types: Gold, Silver, and Bronze.
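The RFM step itself (recency, frequency, and monetary value per customer) can be sketched with the stdlib; the clustering algorithms then run on these three columns. The sample transactions below are made up:

```python
from datetime import date

def rfm(transactions, today):
    """transactions: list of (customer_id, date, amount) tuples.
    Returns {customer_id: (recency_days, frequency, monetary)}."""
    out = {}
    for cust, d, amount in transactions:
        last, freq, total = out.get(cust, (None, 0, 0.0))
        last = d if last is None or d > last else last
        out[cust] = (last, freq + 1, total + amount)
    return {c: ((today - last).days, f, m)
            for c, (last, f, m) in out.items()}

tx = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 1), 80.0),
    ("B", date(2023, 11, 20), 15.0),
]
scores = rfm(tx, today=date(2024, 3, 31))
# scores["A"] -> (30, 2, 200.0): bought recently, twice, for 200 total
```

K-Means or Agglomerative clustering over these (recency, frequency, monetary) vectors is what yields the Gold/Silver/Bronze segments.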

ETL Pipeline
with Luigi

In tech companies, data comes in many forms: JSON, txt, various databases, Google Spreadsheets, and more. In the end, though, the data is stored in a single location as the single source of truth. In this project I applied a suitable transformation to each type of data and stored the results in a local data warehouse.
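The core of each Luigi task is a per-source transform into one shared schema; a minimal sketch of that step, where the target schema and field names are illustrative rather than the project's actual ones:

```python
import csv
import io
import json

# Shared warehouse schema every source is normalized into (illustrative).
TARGET_FIELDS = ("user_id", "amount")

def from_json(line):
    """Normalize one JSON record into the target schema."""
    rec = json.loads(line)
    return {"user_id": str(rec["id"]), "amount": float(rec["total"])}

def from_csv(text):
    """Normalize CSV rows (header: id,total) into the target schema."""
    return [{"user_id": row["id"], "amount": float(row["total"])}
            for row in csv.DictReader(io.StringIO(text))]

rows = [from_json('{"id": 7, "total": "19.9"}')] + from_csv("id,total\n8,5.0\n")
# rows -> [{"user_id": "7", "amount": 19.9}, {"user_id": "8", "amount": 5.0}]
```

In the pipeline, each transform lives inside a Luigi `Task` whose `requires()`/`output()` wiring guarantees every source is extracted before the unified rows are loaded into the warehouse.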

Fetching Data from
Large JSON File

When dealing with large JSON files, the JSON may be flat or nested. This project was developed to extract and clean data from huge JSON files. The function built in this project lets us fetch data from a JSON file, selecting only the specific fields we need.
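A minimal sketch of such a field-selecting fetch for nested JSON; the dotted-path syntax and sample document are my own illustration, not the project's actual API:

```python
import json

def fetch_fields(doc, paths):
    """Pull only the requested dotted-path fields out of a
    (possibly nested) JSON document; missing paths yield None."""
    def get(obj, path):
        for key in path.split("."):
            if not isinstance(obj, dict) or key not in obj:
                return None
            obj = obj[key]
        return obj
    return {path: get(doc, path) for path in paths}

doc = json.loads(
    '{"user": {"name": "alice", "address": {"city": "Oslo"}}, "active": true}'
)
fields = fetch_fields(doc, ["user.name", "user.address.city", "user.phone"])
# -> {"user.name": "alice", "user.address.city": "Oslo", "user.phone": None}
```

Selecting fields this way keeps only the needed values in memory, which matters when the source file is huge.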