pg_dumpall and restore

Discussion:

Rossi, Maria

2018-10-09 17:18:37 UTC

Hi,

I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?

Your help would be much appreciated.

Thanks.
Maria

Keith

2018-10-09 17:58:17 UTC

Permalink

Post by Rossi, Maria
Hi,
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall
then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it
was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused
this and how to prevent?
Your help would be much appreciated.
Thanks.
Maria

Did you use the pg_dumpall binary from 10.5 to create the dump?

Keith

Rossi, Maria

2018-10-09 18:05:51 UTC

Permalink

Yes, I did use the V10.5 pg_dumpall

Maria A Rossi

From: Keith <***@keithf4.com>
Sent: Tuesday, October 9, 2018 1:58 PM
To: Rossi, Maria <***@jackson.com>
Cc: pgsql-***@lists.postgresql.org; pgsql-***@lists.postgresql.org
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL

On Tue, Oct 9, 2018 at 1:19 PM Rossi, Maria <***@jackson.com<mailto:***@jackson.com>> wrote:
Hi,

I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?

Your help would be much appreciated.

Thanks.
Maria

Did you use the pg_dumpall binary from 10.5 to create the dump?

Keith

Laurenz Albe

2018-10-09 18:28:25 UTC

Permalink

Post by Rossi, Maria
I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
Your help would be much appreciated.

I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption,
but I suspect that one of the rows is not in the index you used
to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of
the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe

--
Cybertec | https://www.cybertec-postgresql.com

Tom Lane

2018-10-09 18:32:02 UTC

Permalink

Post by Laurenz Albe

Post by Rossi, Maria
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?

I don't believe that pg_dumpall miraculously duplicated the row.

What seems much more likely is that the dump script was loaded twice.
The script expects to be fed into an empty database ...

regards, tom lane

Rossi, Maria

2018-10-09 20:05:59 UTC

Permalink

The table has only 2 columns, name and value.
Select count(*) from table1 at the old database returned 115, on the new database, it returned 117. This a simple select without any WHERE clause.
Thanks.

Maria

-----Original Message-----
From: Laurenz Albe <***@cybertec.at>
Sent: Tuesday, October 9, 2018 2:28 PM
To: Rossi, Maria <***@jackson.com>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL

I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption, but I suspect that one of the rows is not in the index you used to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe

--
Cy

Stanton Schmidt

2018-10-10 11:51:30 UTC

Permalink

When this happened to me the only explanation I could find was that the original table/index had been corrupted.
The only way to "fix" things was to create a copy of the table and insert select distinct * from orig_table.

Hope this helps.

Stanton

From: "Rossi, Maria" <***@jackson.com>
To: "Laurenz Albe" <***@cybertec.at>, "pgsql-sql" <pgsql-***@lists.postgresql.org>, "pgsql-***@lists.postgresql.org" <pgsql-***@lists.postgresql.org>
Sent: Tuesday, October 9, 2018 3:05:59 PM
Subject: RE: pg_dumpall and restore

The table has only 2 columns, name and value.
Select count(*) from table1 at the old database returned 115, on the new database, it returned 117. This a simple select without any WHERE clause.
Thanks.

Maria

-----Original Message-----
From: Laurenz Albe <***@cybertec.at>
Sent: Tuesday, October 9, 2018 2:28 PM
To: Rossi, Maria <***@jackson.com>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org>
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL

I upgraded our postgres database from V9.3 to V10.5. Used pg_dumpall then restore it to the new instance.
After the restore, we notice that 1 table had duplicate rows, such that it was not able to create the primary key.
I checked the old database, it does not have the dups.
Has anyone encountered having dups rows loaded? Any idea what caused this and how to prevent?
Your help would be much appreciated.

I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption, but I suspect that one of the rows is not in the index you used to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe

--
Cybertec | https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cybertec-2Dpostgresql.com&d=DwIGaQ&c=eLkx2stMcShI0L6xvxICXHnFB9zmDvZmvnhsCd8Gf8M&r=EWbtHQpXIg2XbQduIylzyXrAFdbjWaMyy-p_LkyfjhQ&m=Oyq62lylPJc2GMn32y1LP8A7tG0QlTVCZtZ4jE4zwIc&s=wTd0yL8L_97S1EgPJeIdgkZwFsc_6lWGn7qd3XrBoNY&e=

Rossi, Maria

2018-10-10 12:01:55 UTC

Permalink

Thanks to all the responses. To fix:

1. At the source, pg_dump the 1 table only
2. Drop table at the target
3. Restore at the target

Also saw this: https://www.postgresql-archive.org/Duplicate-rows-during-pg-dump-td5871316.html
But I did not have to drop/recreate index at the source.
Thanks.

Maria A Rossi

From: Stanton Schmidt <***@rgllogistics.com>
Sent: Wednesday, October 10, 2018 7:52 AM
To: Rossi, Maria <***@jackson.com>
Cc: Laurenz Albe <***@cybertec.at>; pgsql-sql <pgsql-***@lists.postgresql.org>; pgsql-***@lists.postgresql.org
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL

When this happened to me the only explanation I could find was that the original table/index had been corrupted.
The only way to "fix" things was to create a copy of the table and insert select distinct * from orig_table.

Hope this helps.

Stanton

________________________________
From: "Rossi, Maria" <***@jackson.com<mailto:***@jackson.com>>
To: "Laurenz Albe" <***@cybertec.at<mailto:***@cybertec.at>>, "pgsql-sql" <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>, "pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>" <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>
Sent: Tuesday, October 9, 2018 3:05:59 PM
Subject: RE: pg_dumpall and restore

The table has only 2 columns, name and value.
Select count(*) from table1 at the old database returned 115, on the new database, it returned 117. This a simple select without any WHERE clause.
Thanks.

Maria

-----Original Message-----
From: Laurenz Albe <***@cybertec.at<mailto:***@cybertec.at>>
Sent: Tuesday, October 9, 2018 2:28 PM
To: Rossi, Maria <***@jackson.com<mailto:***@jackson.com>>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>; 'pgsql-***@lists.postgresql.org' <pgsql-***@lists.postgresql.org<mailto:pgsql-***@lists.postgresql.org>>
Subject: Re: pg_dumpall and restore

EXTERNAL EMAIL

I don't believe that pg_dumpall miraculously duplicated the row.

You probably *do* have a duplicate row, and hence table corruption, but I suspect that one of the rows is not in the index you used to look for the row.

If you query:

SELECT * FROM tab WHERE id = 42;

the query will likely use the index on "id" and find only one of the rows.

You should

SET enable_indexscan = off;
SET enable_indexonlyscan = off;

and then repeat the query, so that a sequential scan is used.

To fix, delete one of the rows and reindex.

You can identify a row by its tuple id:

SELECT ctid, * FROM tab WHERE id = 42;

Yours,
Laurenz Albe