Opened 19 months ago

Last modified 19 months ago

#100 new defect

crin.org site down

Reported by: chris
Owned by: chris
Priority: blocker
Milestone: Maintenance
Component: crin2
Version:
Keywords:
Cc: russell
Estimated Number of Hours: 0
Add Hours to Ticket: 0
Billable?: yes
Total Hours: 0.25

Description

Russell, what's up? I'm really sorry, but I have to go out in 5 mins so I won't be around to help. What has gone wrong?

Change History (5)

comment:1 Changed 19 months ago by chris

  • Add Hours to Ticket changed from 0 to 0.25
  • Total Hours set to 0.25

I see it is back up; I guess it was a code update. Next time, if you could warn me in advance via a ticket here, it would be appreciated, and it would save me from going grey so fast!

comment:2 Changed 19 months ago by russell

Hi Chris,

Thanks for your mail.

It wasn't planned and it wasn't a code update. It made me go grey too!

Russell



On 18/02/2017 11:49, CRIN Trac wrote:
> #100: crin.org site down
> -------------------------------------+-----------------------------------
>                   Reporter:  chris    |                Owner:  chris
>                       Type:  defect   |               Status:  new
>                   Priority:  blocker  |            Milestone:  Maintenance
>                  Component:  crin2    |              Version:
>                 Resolution:           |             Keywords:
> Estimated Number of Hours:  0        |  Add Hours to Ticket:  0.25
>                  Billable?:  1        |          Total Hours:  0
> -------------------------------------+-----------------------------------
> Changes (by chris):
>
>   * hours:  0 => 0.25
>   * totalhours:   => 0.25
>
>
> Comment:
>
>   I see it is back up, I guess it was a code update, next time if you could
>   warn me in advance, via a ticket here, it would be appreciated and it
>   would save me from going grey so fast!
>
> --
> Ticket URL: <https://trac.crin.org.archived.website/trac/ticket/100#comment:1>
> CRIN Trac <https://trac.crin.org.archived.website/trac>
> Trac project for CRIN website and servers.

comment:3 Changed 19 months ago by russell

We lost live. I ran drush deploy on live to check the code and restart PHP and memcache, which got Drupal back. I then had to run drush cc all to get the site back.

I'm watching the Munin graphs now; things seem to be stabilising.

comment:4 Changed 19 months ago by russell

Live went down again. I got it back the same way: restarting php-fpm and memcache, then running drush cc all (to rebuild the caches).

I think what's happening is that we're hitting max connections on the DB server.

I don't know why this is happening. My impression is that it's live hitting the DB server too hard, but I can't rule out that it's related to what I'm doing on the dev server, so I'm going to leave off dev for the weekend and see if live stabilises.

comment:5 Changed 19 months ago by russell

I've been monitoring for a while now and things look stable. Looking back at the Munin graphs, my hunch now is that I have been hitting a DB limit from dev, causing the DB to refuse connections from live because there were too many connections. I suspect it's my restarting memcache that cleared it both times.
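
One way to check that hunch would be to tally open connections per client host against the DB server's limit. The following is only a minimal sketch under stated assumptions: the ticket doesn't say which DB is in use, so it assumes a MySQL-style server where the per-connection client hosts could be read from something like the process list and the limit from the max_connections setting. The function name, host names and numbers are all hypothetical.

```python
from collections import Counter


def connection_report(client_hosts, max_connections):
    """Tally open connections per client host against the server limit.

    client_hosts: one entry per open connection (hypothetical input, e.g. the
    host column of a MySQL-style process list with the port stripped).
    max_connections: the server's connection limit (assumed to be known).
    """
    per_host = Counter(client_hosts)
    total = sum(per_host.values())
    return {
        "per_host": dict(per_host),
        "total": total,
        "headroom": max_connections - total,
        "exhausted": total >= max_connections,
    }


# Hypothetical snapshot: dev holds most of the slots, leaving none for live.
snapshot = ["dev"] * 140 + ["live"] * 10
report = connection_report(snapshot, max_connections=150)
print(report["per_host"], "headroom:", report["headroom"])
```

If dev turned out to hold most of the slots in a real snapshot, that would support the theory that dev, not live, was exhausting the limit.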
