🛑 We are observing latency and timeouts on our API services 🛑

Hi all

We’re experiencing an issue with our server :stop_sign:

Please know that our engineering and operations teams are working hard to get everything back up and running, and we will update you right here with the latest information.

We’re sorry for the inconvenience :pray:

The service has been back to normal since 11:10am this morning :white_check_mark:

This morning, our API was very slow and unstable and, as a consequence, the Forest Admin user experience was degraded.

From a technical point of view, we observed very high latencies and web server instance crashes for about 2 hours, from 9:00am to 11:10am.

We’ve taken some initiatives to improve our performance, and so far the results on memory consumption look promising.

Sorry again for the inconvenience!

Hi @louis
I’m afraid it’s happening again. I opened this topic a few minutes ago.

We’re stuck.
:pray:

Hi @Matteo

:orange_circle: It looks like our server instabilities are back again this morning :disappointed: We’re doing our best to keep the service up (with manual restarts), monitor the situation, and find the root cause(s).

Some users may experience difficulties in accessing the platform.

We’ll let you know once the service is back to normal and give you a clear update on the situation.

@Matteo

:green_circle: End of the instabilities. The offending commit has been rolled back and our monitoring system indicates that everything is back to normal.

We’ll write a post-mortem about the recent incident now that we are aware of the cause of yesterday’s and today’s issues.

Sorry for the inconvenience.

Hi all,

Here is a quick report on the situation. :newspaper_roll:

On June 11th, between 9:00am and 10:40am, our API was very slow and unstable.
As a consequence, the Forest Admin user experience was degraded by latency and service errors.

From a technical point of view, we observed very high response times, memory increases, errors, and web server instance crashes.

On Wednesday, June 9th, we released a feature that introduced role information into user session tokens. It appears that this change modified a specific SQL query used by our services during agent session creation. The new query added more joins, which turned out to be very slow for specific projects.

We rolled back the specific commit and the latencies and errors instantly disappeared.

We’re working internally to prevent similar issues in the future and are deeply sorry for the inconvenience :pray:

Thanks.
