Discussion:
Postgresql goes down need to restart (redhat postgresql service script) lock files removal avoid 2 postmasters
mlaks
2003-05-08 16:50:49 UTC
Hi,
Thank you Tom.
I have been looking at the postgresql service startup scripts in Redhat
written by Lamar Owen et al.

I would like to understand what is the role of the following files that are
created during startup of Postgresql on my Redhat Linux box.

1. /var/lib/pgsql/data/postmaster.pid

2. /var/run/postmaster.pid (for redhat 7.3/Postgresql 7.2)
/var/run/postmaster5432.pid (for redhat 9.0/Postgresql 7.3.2)

3. /var/lock/subsys/postgresql

and

4. /tmp/.s.PGSQL.5432.lock (and the associated socket file in that same
directory).

I notice that the file

1. /var/lib/pgsql/data/postmaster.pid contains the pid of the
/usr/bin/postmaster process. Interestingly Lamar does not rm this file on
stop().

2. /var/run/postmaster.pid contains the pid of a postgres stats process

3. the /tmp/.s.PGSQL.5432.lock file has the pid of the /usr/bin/postmaster
process.
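The observations above can be checked with a small helper. This is only a minimal sketch: it assumes the PID is stored on the first line of each file, and show_pid and report are hypothetical helper names, not part of PostgreSQL or the Red Hat script.

```sh
#!/bin/sh
# show_pid FILE: print the PID stored on the first line of FILE.
# Hypothetical helper; assumes the first line holds the PID.
show_pid() {
    [ -r "$1" ] || return 1
    # Read only the first line and strip stray whitespace.
    pid=$(head -n 1 "$1" | tr -d ' \t\r')
    [ -n "$pid" ] && printf '%s\n' "$pid"
}

# report FILE: print FILE, the PID it records, and the command name
# of the process that currently has that PID (if any).
report() {
    if pid=$(show_pid "$1"); then
        comm=$(ps -o comm= -p "$pid" 2>/dev/null || echo 'not running')
        printf '%s: pid %s (%s)\n' "$1" "$pid" "$comm"
    else
        printf '%s: missing or unreadable\n' "$1"
    fi
}
```

For example, running report on /var/lib/pgsql/data/postmaster.pid and on /tmp/.s.PGSQL.5432.lock should show the same PID for both if they belong to the same postmaster.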

Why do I care?

My goal is to use D. J. Bernstein's daemontools to make sure that my PostgreSQL
process comes back up unattended if it goes down for some reason. So I will be
substituting daemontools for the postgresql service script.
Thus I want to know which lock files to remove to make sure all is OK. I also
want to follow Tom Lane's advice and not shoot myself in the foot by
creating two different postmaster processes working on the same database!

Thank you all for your help!!!

Mitchell Laks


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org
Bruno Wolff III
2003-05-08 17:50:21 UTC
On Thu, May 08, 2003 at 12:50:49 -0400,
Post by mlaks
My goal is to use D. J. Bernstein's daemontools to make sure that my PostgreSQL
process comes back up unattended if it goes down for some reason. So I will be
substituting daemontools for the postgresql service script.
Thus I want to know which lock files to remove to make sure all is OK. I also
want to follow Tom Lane's advice and not shoot myself in the foot by
creating two different postmaster processes working on the same database!
This is what I put in my run file:
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

I use multilog for logging as you normally would.


mlaks
2003-05-08 18:10:52 UTC
Thank you for your response Bruno. I agree about the importance of using the
lines

#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

in the run file. However, what else must we put in as well?

My question is to understand the lock files for postgresql so I can deal with
the following:

1.

I notice that Lamar's postgresql service script removes "stale lock files"
before starting postgresql by using the line

rm -f /tmp/.s.PGSQL.* > /dev/null

and perhaps my own experience indicates we also should add a line

rm -f /var/lib/pgsql/data/postmaster.pid

because sometimes when my machine crashes and gets rebooted I must manually
remove that file.

2.

Moreover, I see that after successfully starting postgresql Lamar touches a
file

touch /var/lock/subsys/postgresql

and does this

echo $pid > /var/run/postmaster.pid

so how can we do that?

3.

I can imagine we can accomplish 1. with

#!/bin/sh
rm -f /tmp/.s.PGSQL.* > /dev/null
rm -f /var/lib/pgsql/data/postmaster.pid
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

but how do we do 2. (the touching and echoing after the process starts) if we
have replaced the "run" process with the postmaster process via exec, so
that the daemontools "svc" can control the process?



Mitchell
Bruno Wolff III
2003-05-08 18:40:33 UTC
On Thu, May 08, 2003 at 14:10:52 -0400,
Post by mlaks
Thank you for your response Bruno. I agree about the importance of using the
lines
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
in the run file. However, what else must we put in as well?
My question is to understand the lock files for postgresql so I can deal with
Some of the lock files have to do with the init system. Those can be
ignored. Postgres also keeps a lock file and that is used to prevent
two postmasters from running at the same time. You probably don't want
to have a script remove that lock file, because if there really is
another postmaster running, starting a second one can be a disaster.


mlaks
2003-05-08 18:39:08 UTC
Bruno,
Thanks for your help. I was wondering:

Should we in fact be exec'ing the postmaster as you describe, or perhaps pg_ctl
as Lamar's script does, or perhaps a "new" script that incorporates pg_ctl or
postmaster and a signal-catching mechanism? The reason I ask is that

the way that daemontools stops a service - if you want it to - is via the
command

svc opts postgresql

with opts

-d: Down. If the service is running, send it a TERM signal and then a CONT
signal. After it stops, do not restart it.
-o: Once. If the service is not running, start it. Do not restart it if it
stops.
-p: Pause. Send the service a STOP signal.
-c: Continue. Send the service a CONT signal.
-h: Hangup. Send the service a HUP signal.
-a: Alarm. Send the service an ALRM signal.
-i: Interrupt. Send the service an INT signal.
-t: Terminate. Send the service a TERM signal.
-k: Kill. Send the service a KILL signal.

Now we would not want to kill the postmaster, of course. But should we even be
TERM'ing the postmaster? I don't know. What do the PostgreSQL gurus say?

Moreover, if we agree that we need to embed pg_ctl or postmaster in a script
to handle the above things, it should be doable to handle all of the assorted
other files if they are necessary to handle.

Do you agree?

Also, what would be the problem in checking for the existence of a postmaster
and, if none exists, then killing the lock files?

My main problem is that I have machines that get creamed by power surges, and
then won't restart postgresql on reboot of the system because of the damn lock
files. I really want to deal with them up front!

Moreover, can you tell me more about what init uses the locks for?

what is the role of the files

/var/run/postmaster.pid
/var/lock/subsys/postgresql

that Lamar carefully adds and subtracts?


rm -f /var/run/postmaster.pid
rm -f /var/lock/subsys/postgresql

Thanks
Mitchell
Bruno Wolff III
2003-05-08 19:48:42 UTC
On Thu, May 08, 2003 at 14:39:08 -0400,
Post by mlaks
Now we would not want to kill the postmaster, of course. But should we even be
TERM'ing the postmaster? I don't know. What do the PostgreSQL gurus say?
I regularly use svc -d to shutdown postmaster and svc -u to restart it.
This works just fine.
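For reference, the signals that svc sends line up with the postmaster's documented shutdown modes: TERM requests a smart shutdown, INT a fast shutdown, and HUP a configuration reload. The sketch below just records that mapping; shutdown_mode is a hypothetical helper name, and the one-line descriptions are a paraphrase of the PostgreSQL manual rather than output of any real tool.

```sh
#!/bin/sh
# shutdown_mode OPT: describe what the postmaster does with the signal
# that `svc OPT` sends. Hypothetical helper, for illustration only.
shutdown_mode() {
    case "$1" in
        -d|-t) echo "TERM: smart shutdown (waits for clients to disconnect)" ;;
        -i)    echo "INT: fast shutdown (aborts open transactions, then exits)" ;;
        -h)    echo "HUP: reload configuration files" ;;
        -k)    echo "KILL: unclean stop, avoid" ;;
        *)     echo "not a shutdown signal for the postmaster"; return 1 ;;
    esac
}
```

Note that with -t supervise will start the postmaster again once it exits, whereas -d also tells supervise to leave the service down.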
Post by mlaks
Moreover, if we agree that we need to embed pg_ctl or postmaster in a script
to handle the above things, it should be doable to handle all of the assorted
other files if they are necessary to handle.
You don't have to do that.
Post by mlaks
Also what would be the problem in checking for the existence of a postmaster
and if none exists then killing the lock files.
I would be very leery of putting this in a script. The postmaster already does
this check itself, and trying to be smarter than it might cause you a lot of grief.
Post by mlaks
My main problem is that I have machines that get creamed by power surges, and
then won't restart postgresql on reboot of the system because of the damn lock
files. I really want to deal with them up front!
Most of the time when I have unscheduled shutdowns postgres comes up without
problem. I don't remember if I have had any since I switched to using
supervise though. I have had more issues with someone needing to answer
a question from fsck from the console than postgresql not coming up.
Post by mlaks
Moreover, can you tell me more about what init uses the locks for?
To tell if the service is already running or not.
Post by mlaks
what is the role of the files
/var/run/postmaster.pid
/var/lock/subsys/postgresql
that Lamar carefully adds and subtracts?
I don't know exactly, but I would expect that the pid file is a lock for
the service and that the subsys file is a lock to keep two init scripts
from running at the same time for the same service.


Bruno Wolff III
2003-05-09 00:49:47 UTC
On Thu, May 08, 2003 at 16:30:11 -0400,
Post by mlaks
Bruno, Thanks for your help.
I checked: a grep in /etc/rc.d/init.d agrees with what you said. Those
/var/lock and /var/run files are commonly created by all of the services!
I had 4 out of 5 machines that got creamed this weekend, and all I needed to
go in for was to erase the file /var/lib/pgsql/data/postmaster.pid.
The same thing (with only one machine) happened about a month ago.
I notice that in his script Lamar does this
pid=`pidof -s postmaster`
if [ $pid ]
then
    echo $"Postmaster already running."
else
    # all systems go -- remove any stale lock files
    rm -f /tmp/.s.PGSQL.* > /dev/null
then he starts up.
What I would be doing is simply adding the line
rm -f /var/lib/pgsql/data/postmaster.pid
It looks like he isn't worried about getting rid of the /tmp/.s.PGSQL.* files as
long as he ran pidof first.
(Is /tmp/.s.PGSQL.* also a kind of lock file? I don't know. Do you know
what sets it up?)
Well if there is no process with the pid in postmaster.pid then you are safe.
If there is one then you have to know it isn't a postmaster.
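The check described here can be sketched as a function that only reports what it finds and never deletes anything, leaving the removal to a human. This is a rough sketch under assumptions: pidfile_state is a hypothetical name, the PID is taken from the first line of the file, and matching the command name against "postmaster" or "postgres" is only a heuristic.

```sh
#!/bin/sh
# pidfile_state PIDFILE: print one of
#   absent      there is no pid file
#   unknown     the first line is not a plain number
#   stale       no running process has the recorded pid
#   postmaster  a live process with that pid looks like a postmaster
#   other       a live process with that pid is something else
# Hypothetical helper; it reports only and removes nothing.
pidfile_state() {
    [ -f "$1" ] || { echo absent; return 0; }
    pid=$(head -n 1 "$1" | tr -d ' \t\r')
    case "$pid" in
        ''|*[!0-9]*) echo unknown; return 0 ;;
    esac
    # ps prints nothing if no process has this pid.
    comm=$(ps -o comm= -p "$pid" 2>/dev/null || true)
    if [ -z "$comm" ]; then
        echo stale
        return 0
    fi
    case "$comm" in
        *postmaster*|*postgres*) echo postmaster ;;   # heuristic name match
        *) echo other ;;
    esac
}
```

Only the "stale" and "other" answers correspond to the safe cases described above, and even then it is worth looking at the process list by hand before removing the file.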
Post by mlaks
Also - what do you do about those files
/tmp/.s.PGSQL.* ?
These are place holders for the domain sockets used for local connections.
Post by mlaks
and what do you do about the possibility of supervise starting more than one
of the postmasters?
I do this. It is simpler to set up than making a bunch of different init
scripts. Just make sure each postmaster uses a different port and data
area.
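A run file for each additional postmaster could be generated along these lines. It is a sketch only: make_run_script is a hypothetical helper, the binary path is the one from the run file earlier in this thread, and the data directory and port in the usage note are made-up examples. The -D and -p options are the postmaster's own data directory and port switches.

```sh
#!/bin/sh
# make_run_script DATADIR PORT: print a daemontools run script for one
# postmaster using its own data directory and port.
# Hypothetical helper; adjust the postmaster path for your installation.
make_run_script() {
    cat <<EOF
#!/bin/sh
exec 2>&1
exec setuidgid postgres /usr/local/pgsql/bin/postmaster -D $1 -p $2
EOF
}
```

For example, make_run_script /usr/local/pgsql/data2 5433 prints a script that could be saved as the run file of a second service directory, so that each supervise instance watches exactly one postmaster.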
Post by mlaks
I like the idea of supervise starting me up again even without a reboot! and i
just want to catch this problem and solve it.
Thanks, mitchell
Bruno Wolff III
2003-05-08 19:41:31 UTC
On Thu, May 08, 2003 at 14:57:15 -0400,
1. check for a running postmaster
2 if not delete the /var/lib/pgsql/data/postmaster.pid files
where would we go wrong with duplicate postmistresses?
The postmaster already does that, but there may be cases where it thinks there
is a running postmaster and there really isn't. In that case you would
need to verify this, remove the lock file and start by hand.

Having two postmasters running at the same time for the same data
directory will corrupt your databases.


mlaks
2003-05-08 20:33:08 UTC
Bruno, Thanks for your help.

I checked: a grep in /etc/rc.d/init.d agrees with what you said. Those
/var/lock and /var/run files are commonly created by all of the services!

Here's my problem:

I had 4 out of 5 machines that got creamed this weekend, and all I needed to
go in for was to erase the file /var/lib/pgsql/data/postmaster.pid.
The same thing (with only one machine) happened about a month ago.

I notice that in his script Lamar does this

pid=`pidof -s postmaster`
if [ $pid ]
then
    echo $"Postmaster already running."
else
    # all systems go -- remove any stale lock files
    rm -f /tmp/.s.PGSQL.* > /dev/null
then he starts up pg_ctl.

What I would be doing is simply adding the line

rm -f /var/lib/pgsql/data/postmaster.pid

It looks like he isn't worried about getting rid of the /tmp/.s.PGSQL.* files
as long as he ran pidof first.
(Is /tmp/.s.PGSQL.* also a kind of lock file? I don't know. Do you know
what sets it up?)

Also - what do you do about those files

/tmp/.s.PGSQL.* ?

and what do you do about the possibility of supervise starting more than one
of the postmasters?

I like the idea of supervise starting me up again even without a reboot! and
i just want to catch this problem and solve it.

Thanks, mitchell