I have a fairly trivial situation that I’m struggling to create a segment for in Forest.
Users can book appointment on our platform and I have a “users” table and a “bookings” table.
I would like to create a segment for “users without bookings” or “users with >1 booking with status ‘accepted’ and type ‘consultation’”, but it seems like that may be impossible with Forest.
but this returns Unexpected field type 'OneToMany': 'users.bookings'.
It seems like this usecase should be handled, since it’s trivial to write SQL for this, but I can’t figure it out. Is there some other lower-level way to define a segment?
Note that there are potentially thousands of bookings and thousands of users, so finding all users who have a booking and doing a "user_id" in [list of user ids] is not feasible for us.
Today our numbers are small, but we expect them to grow quickly.
I imagine it will be 50:50 for users with and without any appointments booked, so with 1000 users who’ve booked an appt, another 1000 will be in the other category.
Users with at least one accepted booking of type “consultation” will also be large numbers since we only have 4 appointment types.
I can probably make our segment sizes smaller or at a max limit and Forest still be usable for us.
Is there some limit you recommend?
I’m also thinking of alternative approaches. Maybe I can create cohorts on PostHog and have those pulled into and matched up with our database. I’m curious whether you have any other strategies for this type of thing.
You’ll have some limit both in terms of performance and what Postgresql will accept to do. I know that there is a limit for list of values with the IN operator.
If you’re able to limit these segments for instance by limiting them to the “X most recent users” or whatever other criterion makes sense for your business, it’ll help a lot for sure.
I was trying to experiment with SQL views also, on my side. It could work but with some other limitations:
Using them as intermediate collections with 1-1 relationships & construct the segment with this won’t work because, in the end, the agent will do exactly the same (retrieve the whole list of ids and inject it in a filter).
Using them as collections directly will display them as independent collections, without the customizations you could define on your real users collection.
Ideally, having this data written somewhere in your DB would work perfectly in terms of performance and experience.
I could not find any info about such limit. I based my answer on a past experience with a limitation, but now that I’m thinking deeper, I recall that it was not on Postgresql.