Agent unreachable. Forest Admin can't reach your server

Alexandre_Jousse · September 18, 2024, 12:39pm

Feature(s) impacted

We often see this behaviour, where we are unable to access the admin dashboard even though the database is up and running and accepting connections.

Observed behavior

We are unable to access the project. It happens at random and impacts our daily use of Forest Admin.

Expected behavior

We access the project without any issue

Failure Logs

Please find all evidence of our issue in this jam (all browser logs)
https://jam.dev/c/4d06f388-e148-4e74-9b50-390e0ca775d3

Context

Project name: Yalink
Team name: Operations
Environment name: production
Agent technology: cloud
Agent (forest package) name & version: …
Database type: postgresql
Recent changes made on your end if any: …

Alban_Bertolini · September 18, 2024, 2:35pm

Hello,
I’m very sorry about these troubles. Many thanks for the failure logs.
We are investigating right now.

I will come back very soon

Alban_Bertolini · September 18, 2024, 2:52pm

We have found 2 errors in our logs, but some of the errors probably come from our side.

The first one:

Cannot read properties of null (reading 'freelance')","stack":"TypeError: Cannot read properties of null (reading 'freelance')
    at /opt/nodejs/customization/index.js:1:6404
    at Array.map (<anonymous>)
    at Object.getValues (/opt/nodejs/customization/index.js:1:6380)
    at /opt/nodejs/index.js:7:69141
    at transformUniqueValues (/opt/nodejs/index.js:7:67844)
    at computeField (/opt/nodejs/index.js:7:69070)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Promise.all (index 20)
    at async computeFromRecords (/opt/nodejs/index.js:7:69656)
    at async r.list (/opt/nodejs/index.js:7:89418)"}}
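
This first trace points at a `getValues` callback in the customization code that reads `.freelance` on a value that is null. A minimal sketch of a null-safe version, assuming hypothetical record and field names (the `MissionRecord` shape and the `name` property are illustrative, the real customization code is not shown in this thread):

```typescript
// Hypothetical record shapes; not taken from the actual project.
interface Freelance {
  name: string;
}

interface MissionRecord {
  freelance: Freelance | null;
}

// Computes one value per input record, tolerating null records and null relations.
// Optional chaining short-circuits to undefined instead of throwing a TypeError.
function getValues(records: Array<MissionRecord | null>): Array<string | null> {
  return records.map((record) => record?.freelance?.name ?? null);
}

const values = getValues([
  { freelance: { name: "Ada" } },
  { freelance: null },
  null,
]);
console.log(values); // [ 'Ada', null, null ]
```

Returning `null` for the missing cases keeps the output array the same length as the input, which a per-record computed field needs.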

The second:


{"level":"Warn","event":"request","status":404,"method":"GET","path":"/deal/count","duration":18,"error":{"message":"operator does not exist: vector ~~* unknown","stack":"Error
    at EV.run (/opt/nodejs/index.js:352:117623)
    at /opt/nodejs/index.js:352:235667
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async LXe.select (/opt/nodejs/index.js:338:76993)
    at async deal.findAll (/opt/nodejs/index.js:338:24944)
    at async PQe.aggregate (/opt/nodejs/index.js:329:257352)
    at async u3e.aggregate (/opt/nodejs/index.js:7:80653)
    at async r.aggregate (/opt/nodejs/index.js:7:60394)
    at async N3e.aggregate (/opt/nodejs/index.js:7:95603)
    at async sve.handleCount (/opt/nodejs/index.js:90:41805)"}}
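
In PostgreSQL, `~~*` is the operator behind `ILIKE`, so this second error says a case-insensitive text match was attempted on a column of type `vector`, for which no such operator exists. A hedged sketch of one way to avoid that, assuming a hypothetical column list and a hand-written search builder (this is not Forest Admin's internal code): only text-like columns take part in the `ILIKE` search.

```typescript
// Hypothetical column description; names and types are illustrative.
interface ColumnInfo {
  name: string;
  type: string; // PostgreSQL type name, e.g. "text", "varchar", "vector"
}

const TEXT_TYPES = ["text", "varchar", "char", "citext"];

// Builds a parameterized WHERE fragment that applies ILIKE to text-like columns only.
function buildSearchClause(columns: ColumnInfo[]): string {
  const searchable = columns.filter((c) => TEXT_TYPES.indexOf(c.type) !== -1);
  if (searchable.length === 0) return "FALSE"; // nothing to search on
  return searchable.map((c) => `"${c.name}" ILIKE $1`).join(" OR ");
}

const clause = buildSearchClause([
  { name: "title", type: "text" },
  { name: "embedding", type: "vector" },
  { name: "company", type: "varchar" },
]);
console.log(clause); // "title" ILIKE $1 OR "company" ILIKE $1
```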

We continue to investigate…

Alban_Bertolini · September 18, 2024, 3:02pm

We have found some requests that time out, because the cloud solution times out after 30s for performance reasons. In your case, for example, when you load the software skills from the table freelance, the count request takes more than 30 seconds (/…/forest/freelance/anId/relationships/software_skills/count).

We have also noticed a number of database connection timeouts. Could you please check the database usage on your side to make sure it’s not overloaded?

Alexandre_Jousse · September 18, 2024, 4:16pm

Having a quick look at my postgres stats today, I can say that it’s really not doing much…

Is it possible to exclude from the topology known by forestadmin some tables that we have not implemented in the admin ?

Alexandre_Jousse · September 18, 2024, 4:20pm

One thing to note is that we have other tools connected to our DB (like metabase) and we super rarely have database timeouts. The availability of metabase is a lot higher than what we can see in forestadmin.

Running the exact same query (as far as I can understand it), I get instant responses in Metabase, but Forest Admin keeps crashing

With random errors every time… Auth failed, CORS issue, etc

Alban_Bertolini · September 19, 2024, 7:05am

Could you check the number of connections on your database when you encounter a problem? Is it possible to check the maximum number of connections your database allows ?
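
Both questions can usually be answered from any SQL client by comparing `pg_stat_activity` with `max_connections`. A sketch, assuming a `runQuery` function supplied by whatever PostgreSQL client is in use (it is injected here, so the helper does not depend on a specific driver):

```typescript
// Minimal shape of a query function; adapt to the PostgreSQL client actually in use.
type Row = Record<string, string | number>;
type RunQuery = (sql: string) => Promise<Row[]>;

interface ConnectionUsage {
  max: number;
  total: number;
  byState: Record<string, number>;
}

// Reads the server's connection limit and the current sessions grouped by state.
async function connectionUsage(runQuery: RunQuery): Promise<ConnectionUsage> {
  const [limit] = await runQuery("SHOW max_connections");
  const rows = await runQuery(
    "SELECT coalesce(state, 'unknown') AS state, count(*) AS n " +
      "FROM pg_stat_activity GROUP BY 1"
  );
  const byState: Record<string, number> = {};
  let total = 0;
  for (const row of rows) {
    const n = Number(row.n); // bigint counts often arrive as strings
    byState[String(row.state)] = n;
    total += n;
  }
  return { max: Number(limit.max_connections), total, byState };
}
```

If a pooler sits between the clients and the database, its own pool size may be the limit that matters rather than `max_connections`, so it is worth checking both.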

Alexandre_Jousse · September 20, 2024, 7:21am

The pool size for internet is 10 ! It is something that we cannot change unfortunately…

How many connections do you usually require ?

Alban_Bertolini · September 20, 2024, 7:50am

It’s probably the issue. Internally, we use AWS Lambda to host your agent, and each lambda opens a new connection. If your agent is used a lot, we can have more than 10 connections open. From monitoring your agent, we are probably using more than 6 connections on your database.
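
The arithmetic behind this can be sketched as follows, assuming each lambda instance holds exactly one connection. The instance counts are illustrative; only the limit of 10 comes from this thread.

```typescript
// Returns how many connections are requested beyond what the pool allows.
function excessConnections(
  concurrentLambdas: number,
  connectionsPerLambda: number,
  poolLimit: number
): number {
  const requested = concurrentLambdas * connectionsPerLambda;
  return Math.max(0, requested - poolLimit);
}

const POOL_LIMIT = 10; // pool size reported in the thread

console.log(excessConnections(6, 1, POOL_LIMIT)); // 0
console.log(excessConnections(14, 1, POOL_LIMIT)); // 4
```

Any excess has to wait for a free slot or is refused, which would be consistent with the connection timeouts seen in the logs.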

"It is something that we cannot change unfortunately…"

Could you tell me more ?

Alexandre_Jousse · September 20, 2024, 8:05am

Do you have a max number of lambdas running in // ?
Does it mean that one frontend connection triggers one lambda execution ?

We are using a backend-as-a-service provider (nhost) which handles the operations of our database, and they are surprised that 10 isn’t enough. They already have a good number of customers and I kind of trust them in that regard

[edit] And the issue is back with almost no admin users using the admin tool

Alban_Bertolini · September 20, 2024, 8:12am

On our side, we can limit the lambda concurrency to avoid opening too many connections. But it can impact your request response time when you are using Forest.

"Do you have a max number of lambdas running in // ?"

Currently, for your project, the answer is no, but we can do it. We can run some tests together to find the limit best suited to your usage.

"Does it mean that one frontend connection triggers one lambda execution ?"

It’s more complex: AWS Lambda decides when to launch a new lambda to process an HTTP request from your frontend. In most cases, it reuses a lambda that is already running.
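
The trade-off (fewer concurrent executions means fewer open connections but more queueing) can be illustrated with a small in-process limiter. This is only an analogy and assumes nothing about how Lambda concurrency is actually configured; `createLimiter` and `demo` are hypothetical helpers.

```typescript
// Runs async tasks with at most `limit` in flight; extra tasks wait in a FIFO queue.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Park until a finishing task hands over its slot.
      await new Promise<void>((resolve) => {
        waiting.push(() => resolve());
      });
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // slot is transferred, so `active` stays unchanged
      else active--;
    }
  };
}

// Starts 10 tasks through a limiter of 3 and reports the highest number in flight.
async function demo(): Promise<number> {
  const run = createLimiter(3);
  let inFlight = 0;
  let peak = 0;
  const task = async (): Promise<void> => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await Promise.resolve(); // yield, standing in for a database round trip
    inFlight--;
  };
  const pending: Array<Promise<void>> = [];
  for (let i = 0; i < 10; i++) pending.push(run(task));
  await Promise.all(pending);
  return peak;
}
```

With a limit of 3, the other 7 tasks wait their turn instead of opening more connections, which is where the extra response time comes from.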

Alexandre_Jousse · September 20, 2024, 8:16am

Hmm ok interesting,

One other thing I could look at is putting a Postgres pooling proxy in the middle to avoid this issue (cf. https://www.pgbouncer.org/)

Alban_Bertolini · September 20, 2024, 8:20am

Yes, you can ! It could be a great solution. We are using it internally

Alexandre_Jousse · September 20, 2024, 9:19am

I am not able to switch the credentials to the pgbouncer instance, it seems to be timing out… do you have any pointers ?

I was able to validate the setup with metabase !

[edit] in fact the nhost team told me they already use pgbouncer as a proxy to the database. So I don’t think the number of connections is the issue

[edit again] I have some stats on their side of pgbouncer and it seems they can see the waiting connections spike

Yellow is waiting connections and orange is active ones

Could you limit the number of lambdas exec in // to something like 2-4 ?

Alban_Bertolini · September 23, 2024, 8:15am

Hello,
Our product team has probably contacted you to discuss the problem.
Do you have any news?
Alban

Alexandre_Jousse · September 23, 2024, 9:38am

Not yet, I think ! So I am waiting for their input… For now I have mitigated the issue with two layers of pgbouncer, so that Forest Admin gets top priority in terms of number of connections. I have also increased the pool size from internet to 20

Alban_Bertolini · September 23, 2024, 10:17am

Can you tell me in a little while whether this modification has solved the problem?

Alexandre_Jousse · September 23, 2024, 2:04pm

I keep having the same errors even with drastic limitations on my other apps. This is so frustrating

I can’t stand seeing this error anymore, it’s too much

I wanna cry haha

Alban_Bertolini · September 23, 2024, 2:19pm

I’m really sorry about this error…
Do you want to try to limit the number of lambdas exec in //?
