How I prepped for GCP Professional Data Engineer Certification from Zero

Anna (Myeongjin) Choi
Aug 2, 2022

Okay. “From zero” might be a bit exaggerated. Here’s where I actually started from:

  • ~5 years of software engineering background
  • Some experience with Azure and AWS products, and with Infrastructure as Code
  • No prior experience with GCP or data engineering.

When I transitioned into a data engineering role on a GCP stack in Nov 2021, I set myself a goal of taking a GCP certification exam within a year.

I spent:

  • ~3 months going over the GCP fundamentals and getting hands-on experience at work
  • another 2–3 months going through practice exams and digging deeper into the concepts

Overall, that was probably 3–4 hours a week of my spare time (outside work).

Materials

I had a fair amount of hands-on experience with some GCP products at work, such as BigQuery, Cloud Build, Cloud Composer and IAM. But there were products and concepts I hadn’t touched, so I needed to learn those from scratch as part of the exam preparation.

Coursera (free with my corporate account)

This consists of 6 courses covering everything from GCP fundamentals to exam prep tips. It was available for free through my corporate Coursera account, and it provided a good starting point.

Practice exams ($39.99 USD for unlimited access)

There seem to be a couple of different options for practice questions (Whizlabs is another provider), but below is what I used:

Beware — a lot of the given answers are WRONG! 😛

I had to google the questions to cross-check the answers. The ExamTopics website had a lot of the same questions, each with a community vote on the right answer and comments explaining why.

Although it was quite annoying to have to dig out the right answers myself for paid material, I found it pretty helpful for my prep for these reasons:

  • The simulator looks basically like the actual exam. It has a timer and a “Mark” function for flagging questions you’re unsure of so you can revisit them later. Just getting familiar with the actual exam experience helps, I think.
  • The Coursera materials were a high-level introduction rather than a deep dive into the topics. The practice questions helped me realise what I needed to read more about, and I did my homework every time a concept in a question wasn’t clear to me or there was a keyword in the question or its answer choices that I didn’t recognise.
  • Some of the questions actually appeared on my exam! About 5% were exactly the same, including the wording, and for about half of the questions I at least recognised a similar pattern.

I paid $39.99 USD for the unlimited access option, and I don’t regret paying for this instead of just 3 months of access. It really helped me get back on track after a couple of months of not looking at the exam stuff at all.

Machine learning crash course (free)

I went through this machine learning course after some practice questions made me realise I needed to understand the concepts and process of machine learning.

This course is succinct and doesn’t go too deep, with some labs and cool UIs to experiment with. I enjoyed the course and found it helpful.

Some good reads I found online

Exam topics summary

This one provides a really good summary of the topics, with links to more detailed pages for each topic. I highly recommend reading it, but make sure you double-check the Google documentation for up-to-date information.

Apache Beam Programming Guide

I hadn’t worked with Dataflow or Apache Beam before, so this guide helped me understand the important concepts: transforms, side inputs/outputs, windowing, triggering, etc.
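To give a feel for what “windowing” means, here is a minimal sketch in plain Python. It does not use the Apache Beam API; the function names and the sample events are made up for illustration. It only shows the idea behind fixed windows: each timestamped element is assigned to a fixed-size time bucket, and the aggregation happens per bucket.

```python
from collections import defaultdict


def assign_fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains the timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def sum_per_window(events, size):
    """Group (timestamp, value) pairs into fixed windows and sum each window."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[assign_fixed_window(timestamp, size)] += value
    return dict(sorted(totals.items()))


# Hypothetical events: (timestamp in seconds, value)
events = [(1, 10), (4, 5), (61, 7), (119, 1), (120, 2)]
windowed = sum_per_window(events, size=60)
print(windowed)
```

In Beam itself, the runner does the window assignment for you and triggers decide when each window’s result is emitted. The programming guide covers those details properly.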

Useful official Google documentation on best practices and troubleshooting

For this one, go through the “Java patterns” in the left menu bar. A couple of exam questions asked about these patterns.

Medium blog posts

There should be a lot of posts from people if you google “medium gcp professional data engineer certification”. Look for the most recent ones to see what the scope is like! Keep in mind that the questions and answers might differ from those on the same exam 2 years ago.

I read someone’s blog post about his recent experience taking the exam in 2022. It included notes on questions that he was surprised to see on the exam, and on questions that he wasn’t so sure about. This made me do some last-minute research into those topics, which eventually turned up in my exam as well.

Remote vs. onsite proctored exam

I considered trying the remote proctored exam, but I simply didn’t want to deal with technical difficulties or with cleaning up my room for the exam environment. Also, I heard the software doesn’t work on Mac and that Mac users had to use a VM.

Luckily the test centre was just a 10-minute walk from my office, so I opted to go in for the onsite test.

My Exam Experience

Having gone through a lot of practice questions multiple times (and being a fast reader), it didn’t take long for me to submit the exam with a reasonable level of confidence.

First scan of all the questions took ~30 minutes: carefully reading the questions, choosing the answers for the confident ones, and selecting the best candidate for not-so-sure questions as well as marking them for later review. I had about 20 out of 50 questions marked at this stage.

In my second scan (15–20 mins), I revisited every question to make sure that I had read it correctly and hadn’t made any mistakes in choosing the answer. As for the marked ones, I took time reading the questions and the choices multiple times to narrow down the options.

And then I was left with about 5 questions still marked. I repeated the process of elimination, and chose the best answer possible — no question unanswered. I mean, after eliminating 2 out of 4 options, the chance is 50/50 anyway.. 🤞

All of the above steps took me an hour, and I still had an hour that I could use. But would it help me to get a better score?

Google doesn’t publish an official passing score, but assuming it’s 70–80%, I was fairly confident that I had answered at least 80% correctly. I concluded that forcing myself to spend an additional hour looking at the same thing was more likely to mess with my head than to improve my score substantially. So I held my breath and submitted the exam.
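For a rough sanity check on that kind of estimate, here is a back-of-the-envelope sketch. The 50 questions and the ~5 coin-flip answers come from my exam. The 85% accuracy on the remaining questions is purely an assumed figure for illustration, and the function name is made up.

```python
def expected_score(total, guessed, confident_accuracy, guess_odds=0.5):
    """Expected fraction correct when `guessed` answers are 50/50 guesses."""
    confident = total - guessed
    expected_correct = confident * confident_accuracy + guessed * guess_odds
    return expected_correct / total


# 50 questions, ~5 narrowed down to two options (from the post).
# The 85% accuracy on the other 45 is an assumption, not a measured number.
estimate = expected_score(total=50, guessed=5, confident_accuracy=0.85)
print(f"Expected score: {estimate:.1%}")
```

Under those assumptions the estimate lands a little over 80%, so the handful of 50/50 guesses doesn’t move the result much either way.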

I expected the pass / fail screen when clicking on the submit button (and the popup to confirm), but it instead presented me with survey questions about the exam. I had to click through a few more buttons… to get to the page where it says “result: PASS”. YAY 🙌

You will be ready when you:

  • Are familiar with the best practices for each GCP product
  • Are aware of the quotas and limits for different options
  • Know how encryption, security, networking and access control are done in GCP

Are you looking to sit the Professional Data Engineer Certification exam?

See this link to read my thoughts on whether the certifications are worth the effort,

and if you decide to lock in the dates (or already have booked the test), I wish you the very best!

Written by Anna (Myeongjin) Choi

Data Engineer with a passion for engineering principles and best practices