Feature(s) impacted
We often see this behaviour where we are unable to access the admin dashboard, even though the database is up and running and accepting connections.
Observed behavior
We are unable to access the project. It happens at random and impacts our daily use of Forest Admin.
Expected behavior
We access the project without any issue
Failure Logs
Please find all evidence of our issue in this jam (all browser logs)
https://jam.dev/c/4d06f388-e148-4e74-9b50-390e0ca775d3
Context
- Project name: Yalink
- Team name: Operations
- Environment name: production
- Agent technology: cloud
- Agent (forest package) name & version: …
- Database type: postgresql
- Recent changes made on your end if any: …
Hello,
I’m very sorry about these troubles. Many thanks for the failure logs.
We are investigating right now.
I will come back to you very soon.
We have found 2 errors in our logs, though some of them probably come from our side.
The first one:
Cannot read properties of null (reading 'freelance')","stack":"TypeError: Cannot read properties of null (reading 'freelance')
at /opt/nodejs/customization/index.js:1:6404
at Array.map (<anonymous>)
at Object.getValues (/opt/nodejs/customization/index.js:1:6380)
at /opt/nodejs/index.js:7:69141
at transformUniqueValues (/opt/nodejs/index.js:7:67844)
at computeField (/opt/nodejs/index.js:7:69070)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Promise.all (index 20)
at async computeFromRecords (/opt/nodejs/index.js:7:69656)
at async r.list (/opt/nodejs/index.js:7:89418)"}}
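For reference, this first stack trace points at a `getValues` callback in the customization code that maps over records and reads `.freelance` on a value that is `null` for at least one record. Below is a minimal sketch of a null-safe version. The record shape, the `mission` relation and the `name` field are hypothetical, since the actual customization code is not shown in this thread.

```typescript
// Hypothetical record shape: the relation that holds `freelance` may be null.
interface DealRecord {
  id: number;
  mission: { freelance: { name: string } | null } | null;
}

// Returns one value per record and never throws on a missing relation.
// Optional chaining short-circuits to undefined, which `??` turns into null.
function getValues(records: DealRecord[]): (string | null)[] {
  return records.map((record) => record.mission?.freelance?.name ?? null);
}

const sample: DealRecord[] = [
  { id: 1, mission: null },
  { id: 2, mission: { freelance: { name: "Ada" } } },
  { id: 3, mission: { freelance: null } },
];

console.log(getValues(sample));
```

Records without the relation simply get `null` for the computed field instead of failing the whole list request.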
The second:
{"level":"Warn","event":"request","status":404,"method":"GET","path":"/deal/count","duration":18,"error":{"message":"operator does not exist: vector ~~* unknown","stack":"Error
at EV.run (/opt/nodejs/index.js:352:117623)
at /opt/nodejs/index.js:352:235667
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async LXe.select (/opt/nodejs/index.js:338:76993)
at async deal.findAll (/opt/nodejs/index.js:338:24944)
at async PQe.aggregate (/opt/nodejs/index.js:329:257352)
at async u3e.aggregate (/opt/nodejs/index.js:7:80653)
at async r.aggregate (/opt/nodejs/index.js:7:60394)
at async N3e.aggregate (/opt/nodejs/index.js:7:95603)
at async sve.handleCount (/opt/nodejs/index.js:90:41805)"}}
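A note on this second error: `~~*` is PostgreSQL's operator name for `ILIKE`, so the message says that an `ILIKE` comparison was attempted against a column of type `vector`, which has no such operator. How the agent builds this query is not visible in the thread. The sketch below only illustrates the general idea of restricting `ILIKE` conditions to text-like columns. The column names and the type list are assumptions, not Forest Admin code.

```typescript
// Hypothetical column description; names and types are illustrative only.
interface Column {
  name: string;
  dbType: string;
}

// Types for which ILIKE is defined in PostgreSQL (citext comes from an extension).
const TEXT_LIKE_TYPES = new Set([
  "text",
  "varchar",
  "character varying",
  "char",
  "character",
  "citext",
]);

// Keep only the columns that can safely take part in an ILIKE search.
function ilikeSearchableColumns(columns: Column[]): string[] {
  return columns
    .filter((column) => TEXT_LIKE_TYPES.has(column.dbType.toLowerCase()))
    .map((column) => column.name);
}

// Build a parameterized WHERE fragment; $1 is the search pattern.
function buildSearchCondition(columns: Column[]): string {
  const names = ilikeSearchableColumns(columns);
  if (names.length === 0) return "FALSE";
  return names.map((name) => `"${name}" ILIKE $1`).join(" OR ");
}

const dealColumns: Column[] = [
  { name: "title", dbType: "text" },
  { name: "embedding", dbType: "vector" },
  { name: "amount", dbType: "numeric" },
];

console.log(buildSearchCondition(dealColumns));
```

With these hypothetical columns, the `vector` column is left out of the condition, so the database never sees `vector ILIKE ...`.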
We continue to investigate…
We have found some requests that timed out, because the cloud solution times out after 30s for performance reasons. In your case, for example, when you are loading the software skills from the table freelance, the count request takes more than 30 seconds (/…/forest/freelance/anId/relationships/software_skills/count).
We have also noticed a number of database connection timeouts. Could you please check the database usage on your side to make sure it’s not overloaded?
Having a quick look at my postgres stats today, I can say that it’s really not doing much…
Is it possible to exclude from the topology known by forestadmin some tables that we have not implemented in the admin ?
One thing to note is that we have other tools connected to our DB (like metabase) and we super rarely have database timeouts. The availability of metabase is a lot higher than what we can see in forestadmin.
When I run the exact same query (as far as I can understand it), I get instant responses in metabase, but forestadmin keeps crashing,
with random errors every time: auth failed, CORS issue, etc.
Could you check the number of connections on your database when you encounter a problem? Is it possible to check the maximum number of connections your database allows ?
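For anyone following along, one possible way to check this is sketched below. The two SQL strings are standard PostgreSQL and can be run from any SQL client. The `connectionHeadroom` helper and its names are only an illustration, and the numbers in the usage line are made up.

```typescript
// Standard PostgreSQL queries; run them with any SQL client (psql, metabase, ...).
// Counts the sessions currently connected to the current database.
const ACTIVE_CONNECTIONS_SQL =
  "SELECT count(*) AS active FROM pg_stat_activity WHERE datname = current_database();";
// Returns the server-wide connection limit.
const MAX_CONNECTIONS_SQL = "SHOW max_connections;";

// Hypothetical helper: turns the two numbers into a quick summary.
function connectionHeadroom(
  active: number,
  max: number,
): { free: number; usedPercent: number } {
  if (max <= 0) throw new RangeError("max must be positive");
  return {
    free: Math.max(0, max - active),
    usedPercent: Math.round((active / max) * 100),
  };
}

console.log(ACTIVE_CONNECTIONS_SQL);
console.log(MAX_CONNECTIONS_SQL);
// Example with made-up numbers: 6 active connections out of 10 allowed.
console.log(connectionHeadroom(6, 10));
```

If a pooler sits in front of the database, its own pool size is the limit that matters for clients, so that number would go in `max` instead.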
The pool size for internet is 10 ! It is something that we cannot change unfortunately…
How many connections do you usually require ?
That’s probably the issue. Internally, we use AWS Lambda to host your agent, and each lambda opens a new connection. If your agent is used a lot, we can have more than 10 connections open. From monitoring your agent, we are probably using more than 6 connections on your database.
It is something that we cannot change unfortunately…
Could you tell me more ?
Do you have a max number of lambdas running in // ?
Does it mean that one frontend connection triggers one lambda execution ?
We are using a backend-as-a-service provider (nhost) which handles the operations of our database, and they are surprised that 10 isn’t enough. They already have a good number of customers and I kind of trust them in that regard
[edit] And the issue is back with almost no admin users using the admin tool
On our side, we can limit the lambda concurrency to avoid opening too many connections. But it can impact your request response time when you are using forest.
Do you have a max number of lambdas running in // ?
Currently for your project the answer is No, but we can do it. We can do some tests together to find the best limit appropriate to your usage.
Does it mean that one frontend connection triggers one lambda execution ?
It’s more complex: the AWS Lambda solution chooses when to launch a new lambda to process your http request from your frontend. In most cases, it reuses an already open lambda.
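To make the trade-off above concrete, here is a small simulation sketch. It assumes, as described in this thread, that each running lambda holds one database connection. The request timings are made up, and the first-come-first-served queueing is a simplification for illustration, not a description of how AWS Lambda actually schedules work.

```typescript
interface AdminRequest {
  arrivalMs: number;  // when the HTTP request reaches the agent
  durationMs: number; // how long it keeps a lambda (and its connection) busy
}

interface SimulationResult {
  peakConnections: number; // highest number of connections held at once
  maxWaitMs: number;       // longest time a request waited for a free slot
}

// Simulates FIFO processing with at most `concurrencyLimit` lambdas running.
function simulate(requests: AdminRequest[], concurrencyLimit: number): SimulationResult {
  if (concurrencyLimit < 1) throw new RangeError("concurrencyLimit must be >= 1");
  const queue = [...requests].sort((a, b) => a.arrivalMs - b.arrivalMs);
  let busyUntil: number[] = []; // finish time of each occupied slot
  let peakConnections = 0;
  let maxWaitMs = 0;

  for (const request of queue) {
    // Release slots whose work is finished when this request arrives.
    busyUntil = busyUntil.filter((finish) => finish > request.arrivalMs);
    let startMs = request.arrivalMs;
    if (busyUntil.length >= concurrencyLimit) {
      // All slots taken: wait for the one that frees up first.
      busyUntil.sort((a, b) => a - b);
      startMs = busyUntil.shift()!;
    }
    busyUntil.push(startMs + request.durationMs);
    peakConnections = Math.max(peakConnections, busyUntil.length);
    maxWaitMs = Math.max(maxWaitMs, startMs - request.arrivalMs);
  }
  return { peakConnections, maxWaitMs };
}

// Made-up burst: 6 requests arriving together, each taking 100 ms.
const burst: AdminRequest[] = Array.from({ length: 6 }, () => ({
  arrivalMs: 0,
  durationMs: 100,
}));

console.log(simulate(burst, Infinity)); // no limit on concurrency
console.log(simulate(burst, 2));        // at most 2 lambdas at a time
```

In this toy burst, removing the limit opens 6 connections with no waiting, while a limit of 2 keeps the peak at 2 connections but makes the last requests wait 200 ms. That is the response time impact mentioned above.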
Hmm ok interesting,
One other thing I could look at is putting a Postgres connection pooling proxy in the middle to avoid this issue (cf. https://www.pgbouncer.org/)
Yes, you can ! It could be a great solution. We are using it internally
I am not able to switch the credentials to the pgbouncer instance, it seems to be timing out… Do you have any pointers?
I was able to validate the setup with metabase !
[edit] in fact the nhost team told me they already use pgbouncer as a proxy to the database. So I don’t think the number of connections is the issue
[edit again] I have some stats on their side of pgbouncer and it seems they can see the waiting connections spike
Yellow is waiting connections and orange is active ones
Could you limit the number of lambdas exec in // to something like 2-4 ?
Hello,
Our product team has probably contacted you to discuss the problem.
Do you have any news?
Alban
Not yet, I think! So I am waiting for their input… For now I have mitigated the issue with two layers of pgbouncer, so that forest admin gets top priority in terms of number of connections. I have also increased the pool size from internet to 20
Can you tell me in a little while whether this modification has solved the problem?
I keep having the same errors even with drastic limitations on my other apps. This is so frustrating
I can’t see this error anymore it’s too much
I wanna cry haha
I’m really sorry about this error…
Do you want to try to limit the number of lambdas exec in //?