Exports only export a partial subset of the collection

Hello @remi_okarito :wave:

I’ve seen that 3 people tried to reproduce your issue without any success.

I suspect there is some custom code somewhere causing the issue. To help us investigate, it would be really great if we could have access to the code and a dump (with fake data) corresponding to your models.

With this in hand, it will be much easier to reproduce the issue, confirm the bug, and fix it.

If you are interested in such a manoeuvre, please ping me here and we will start the transfer.

All the best,

Steve.

Hi Steve. Ok, happy to help. I’m currently exporting a dump with fake data, bear with me.


Here is a copy of a message I sent to Jeff earlier:

Our data model is quite complex, but maybe this will help, so here it is:

Organization has_many Teams
Team has_many Users
=> Organization has_many users through Teams

On the other side,
Roundtrip has_many Trips
Trip has_many UserBookings
UserBooking has_many Users

And we have: Organization has_many roundtrips through Users.

As I understand it, the export query needs to select n > 1000 rows and then group them by Roundtrip ID. But since the query has a LIMIT 1000, it can’t select more than 1000 rows, so it doesn’t return the whole list of Roundtrips.

When I remove the LIMIT 1000 from the export query, it returns 1096 rows, which shows that more records should have been retrieved.
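To illustrate the hypothesis above, here is a minimal sketch using an in-memory SQLite database. It is not Forest’s actual export query: the tables and columns are a hypothetical simplification of the Roundtrip → Trips → UserBookings chain, and LIMIT 10 stands in for LIMIT 1000. Because the join produces one row per booking, a LIMIT applied to the joined rows before grouping by roundtrip ID covers far fewer roundtrips than the limit suggests.

```python
import sqlite3

# Hypothetical, simplified schema (names are assumptions, not the real models):
# each roundtrip has several trips, each trip has several user bookings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roundtrips (id INTEGER PRIMARY KEY);
    CREATE TABLE trips (id INTEGER PRIMARY KEY, roundtrip_id INTEGER);
    CREATE TABLE user_bookings (id INTEGER PRIMARY KEY, trip_id INTEGER, user_id INTEGER);
""")

# 10 roundtrips x 2 trips x 3 bookings = 60 joined rows for only 10 roundtrips.
for rt in range(1, 11):
    conn.execute("INSERT INTO roundtrips (id) VALUES (?)", (rt,))
    for _ in range(2):
        trip_id = conn.execute(
            "INSERT INTO trips (roundtrip_id) VALUES (?)", (rt,)
        ).lastrowid
        for user_id in range(3):
            conn.execute(
                "INSERT INTO user_bookings (trip_id, user_id) VALUES (?, ?)",
                (trip_id, user_id),
            )

JOIN = """
    FROM roundtrips
    JOIN trips ON trips.roundtrip_id = roundtrips.id
    JOIN user_bookings ON user_bookings.trip_id = trips.id
"""

# A count of distinct roundtrips sees every record.
unique_count = conn.execute(f"SELECT COUNT(DISTINCT roundtrips.id) {JOIN}").fetchone()[0]

# The joined query returns one row per booking, not one per roundtrip.
joined_rows = conn.execute(f"SELECT COUNT(*) {JOIN}").fetchone()[0]

# LIMIT applies to joined rows; grouping by roundtrip ID afterwards
# leaves fewer roundtrips than the limit.
limited = conn.execute(
    f"SELECT roundtrips.id {JOIN} ORDER BY roundtrips.id LIMIT 10"
).fetchall()
roundtrips_in_page = {row[0] for row in limited}

print(unique_count, joined_rows, len(roundtrips_in_page))  # prints 10 60 2
```

Here the 10 joined rows only cover roundtrips 1 and 2, even though the limit is 10.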

Does that make sense to you?

Hello Forest team, do you plan on fixing this export issue?

We used to rely on Forest for all our data exports and to build some of our reports, but it is now unusable, as only a fraction of the records is exported.

Hey @remi_okarito :wave:,

Sadly, we are still unable to reproduce this issue. It is odd, though, since you are not the only one having CSV export issues with Rails.
To date, we’ve never had a reproducible setup that would let us find what is failing, so we have been unable to locate the problem.

I also ran a few tests for this thread, which seems to describe a similar issue, but with no luck.

AFAIK:

  • The issue is not located in a wrong/bad/incorrect definition of fields in the database
  • The issue is not reproducible with our mongoose/sequelize integration (the request sent by the frontend looks the same for our Rails and JS integrations)
  • We are not able to reproduce this issue with our live demo (running the Rails agent)
  • The thread I mentioned earlier refers to a WHERE condition added to the SQL query when generating the CSV, which does not seem related to our forest_liana gem.

The only place I can think of where this issue could happen is a conflict between forest_liana and your code: for example, an ActiveRecord method override shared by both, which would make the export behave differently.

If that’s possible, a minimal, reproducible setup would really help.
If not, the SQL query executed for the export (not the count query) might show something interesting, as in the thread mentioned above.

I’m really sorry for the inconvenience caused by this issue, and I look forward to your answer in the hope of finding a fix :pray:

We managed to reproduce the issue with @Sliman_Medini, if I remember correctly.

What happens is:

  • Forest makes a SQL count query to know how many records should be exported.
  • This SQL request returns a number (n) of records
  • Forest makes (n / 1000 + 1) SQL export queries (each with LIMIT = 1000)

But the two queries don’t select exactly the same set of rows (as Arnaud noticed here: Exports only export a partial subset of the collection - #14 by Arnaud_Moncel).

If the count query returns 1200 (unique records), then 2 export queries will be run (1200 / 1000 + 1 = 2, using integer division).

But in some cases, the export queries are slightly different from the count query and need more rows to be retrieved. If the export query needs more than 2000 rows to return every record, the export will be partial, because only the first 2000 rows are retrieved.
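The scenario above can be simulated in a few lines. This is a sketch, not forest_liana’s code: `simulate_export` is a hypothetical helper, `export_rows` stands for the record IDs returned row by row by the export query (IDs repeat when a join duplicates records), and the page count is derived from the count query as described in the list above. Each page mimics LIMIT 1000 OFFSET page * 1000, and duplicates are merged afterwards.

```python
PAGE_SIZE = 1000


def simulate_export(unique_count, export_rows, page_size=PAGE_SIZE):
    """Hypothetical model of the export: return the set of exported record IDs.

    unique_count: result of the COUNT query (n).
    export_rows: ordered record IDs returned by the export query, one per row.
    """
    # Number of export queries: n / 1000 + 1, with integer division.
    pages = unique_count // page_size + 1
    exported = set()
    for page in range(pages):
        # Equivalent of LIMIT page_size OFFSET page * page_size.
        start = page * page_size
        exported.update(export_rows[start:start + page_size])
    return exported


# 1200 unique records, but the first 900 appear twice in the export query
# (for example because of a join), so it returns 2100 rows in total.
ids = list(range(1200))
rows = [i for i in ids[:900] for _ in range(2)] + ids[900:]

exported = simulate_export(len(ids), rows)
print(len(rows), len(exported))  # prints 2100 1100
```

Only 2 pages (2000 rows) are fetched, so the last 100 rows, and the 100 records they hold, never make it into the export.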

The only way to fix this is to understand why the COUNT query and the EXPORT query don’t run exactly the same query.

To reproduce it, you will need an example where the COUNT query returns n records and the EXPORT query returns m records, with m > (n / 1000 + 1) * 1000, i.e. more rows than the export queries can fetch.
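When looking for a reproduction dataset, the condition can be checked directly. Since each of the (n / 1000 + 1) export queries fetches at most 1000 rows, some rows are dropped exactly when m exceeds that capacity. `export_may_be_truncated` is a hypothetical helper; it is a necessary condition only, because whether records actually go missing also depends on whether the dropped rows hold records already seen on earlier pages.

```python
def export_may_be_truncated(n, m, page_size=1000):
    """Hypothetical check: True if an export query returning m rows cannot be
    fully fetched when the COUNT query returned n records."""
    pages = n // page_size + 1      # number of export queries
    return m > pages * page_size    # rows beyond this capacity are never fetched


print(export_may_be_truncated(1200, 2000))  # False: 2 pages cover 2000 rows
print(export_may_be_truncated(1200, 2100))  # True: 100 rows are never fetched
```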

Let me know if this is unclear.

Remi