I’m working on an Apache Beam pipeline that converts images to TFRecords. So far it just converts the images into the format required by the TF Object Detection API.
Now I’m wondering how to incorporate my existing pipeline into a TFX pipeline. I’m aware that TFX uses Beam under the hood, but can I just lift my pipeline straight into preprocessing_fn?
That is correct. Mentors are here to answer student questions. But please also understand:
Firstly, you are asking MLEP Specialisation questions of Deep Learning Specialisation mentors. They won’t be able to help you, as they are not familiar with that content.
Secondly, when you tag entire groups, yes, some mentors will answer you, but others will get unwanted notifications.
Thirdly, our mentors are very dedicated when it comes to helping learners. You don’t have to tag them to get their attention. They all receive a number of queries every day, so they might not be able to answer every query immediately, but they do answer when they find time.
You can visit this page on Coursera to see who the mentors for this course are. You can ask them to help you out, but refrain from direct-messaging them unless they ask you to.
I agree that your post has gone unanswered for six days. We have given the mentors a nudge to answer the pending queries, so hopefully one of the mentors or Chris will get back to you soon.
I’d have loved to help you out, but I’m also unfamiliar with this content.
Hi @smedegaard! Maybe we can resolve this starting from ExampleGen, i.e. the start of your pipeline. What would your input be there? Do you have a CSV that contains the image paths, bounding boxes, etc.? Or do you already have TFRecords that contain the images? Maybe you can upload some code so we can see the problem more clearly.
Luckily, I can reuse a lot of the code from my Beam pipeline.
An alternative might be to feed all the raw images to my Beam pipeline first, and then feed the resulting TFRecords to TFX (see the sketch below). But I guess this would create problems if I wanted to deploy the model on an edge device?
I would be very happy if you could confirm or deny these assumptions.
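For reference, here is a minimal sketch of that alternative: ingesting pre-made TFRecords into TFX via ImportExampleGen. The `tfrecord_dir` path is a hypothetical placeholder, and the exact constructor arguments can vary between TFX versions.

```python
# Minimal sketch: feeding existing TFRecords into a TFX pipeline.
from tfx.components import ImportExampleGen

# ImportExampleGen ingests serialized tf.train.Example records directly,
# instead of parsing raw files the way e.g. CsvExampleGen does. Downstream
# components (StatisticsGen, Transform, Trainer, ...) consume its output
# like that of any other ExampleGen.
example_gen = ImportExampleGen(input_base='tfrecord_dir')  # hypothetical path
```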
Hi Anders! Thank you for the info! Please give me some time to go through it. We are currently busy with some tasks on the backend, but rest assured that I have this thread bookmarked for review. At first glance, though, I think you’re right: the best approach is to create a custom component so the transformations work properly even on edge devices. But let’s see. I will get back to you.
Hi Anders! Sorry it’s taking a while. We still have our hands full, but I think the load will be lighter on Monday and I can look at your problem more closely. I’ve been practicing TFX lately too, and hopefully some of those concepts can be applied here as well. Again, sorry for the delay, and I will catch up next week!
Hi Anders! Just letting you know I didn’t forget about this. I was supposed to look at it earlier, but the power went down for quite a while. I’ll give it a shot tomorrow. Thanks.
I think the main thing your application needs beyond the standard TFX components is a step that loads the raw images from disk (i.e. what your get_image_content() function does). You can pack that together with the image dimensions and labels into tf.train.Example protos, then serialize them to TFRecords before feeding them to ExampleGen. I think that’s what you’re already doing.
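As a minimal sketch of that packing step (the feature names 'image_raw', 'height', 'width', and 'label' are placeholders of mine, not from your code):

```python
import tensorflow as tf

def _bytes_feature(value):
    """Wrap a byte string in a tf.train.Feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    """Wrap an integer in a tf.train.Feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def to_example(image_bytes, height, width, label):
    """Pack raw image bytes, dimensions, and a label into a tf.train.Example."""
    return tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_bytes),
        'height': _int64_feature(height),
        'width': _int64_feature(width),
        'label': _int64_feature(label),
    }))

# Serialize and write; inside a Beam pipeline this write step would be
# beam.io.WriteToTFRecord instead of a plain TFRecordWriter.
with tf.io.TFRecordWriter('images.tfrecord') as writer:
    with open('example.png', 'rb') as f:  # hypothetical image file
        example = to_example(f.read(), height=480, width=640, label=1)
    writer.write(example.SerializeToString())
```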
Maybe you can simplify that initial step and store just the raw image bytes and annotations in those TFRecords. All the reshaping, padding, datatype changes, etc. can be handled in the Transform component (just like the Course 2 W4 ungraded lab with CIFAR10). That way, the transformation graph is exported together with the model, and you will only need the raw input during inference. If you’re enrolled in Course 3, you might pick up some tips on the Transform and Trainer components from the TFX-related labs in W1 and W4.
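For illustration, a preprocessing_fn along those lines might look like the sketch below. It assumes the placeholder feature names from the earlier sketch, PNG-encoded image bytes, and a fixed 224x224 target size; the pattern mirrors the CIFAR10 lab.

```python
import tensorflow as tf

_IMAGE_KEY = 'image_raw'  # placeholder feature names, matching the sketch above
_LABEL_KEY = 'label'

def _decode_and_resize(raw):
    """Decode one PNG byte string and resize it to a fixed shape."""
    image = tf.io.decode_png(raw[0], channels=3)
    return tf.image.resize(image, [224, 224])  # returns float32

def preprocessing_fn(inputs):
    """tf.Transform callback: turn raw image bytes into model-ready tensors."""
    # Each bytes feature arrives with shape [batch, 1], hence raw[0] above.
    images = tf.map_fn(
        _decode_and_resize,
        inputs[_IMAGE_KEY],
        fn_output_signature=tf.float32)
    images = images / 255.0  # scale pixels to [0, 1]
    return {
        'image_xf': images,
        'label_xf': inputs[_LABEL_KEY],
    }
```

Because this runs inside Transform, the same decode/resize graph is exported with the model, which is exactly what lets you send raw bytes at inference time, including on an edge device.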