Loading Prices Paid Data into PostgresSQL Server on Fedora

If you want to analyse the housing market in England and Wales you can use the Prices Paid data from https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads

Once you have the big file downloaded you need to upload it to your database. Put the file into the shared folder where you have allowed the postgres user and your current user to have access to the files. Mine is a folder in root called Database_Files. CD into the data directory and check out the file to see if it contains column names:

head pp-complete.txt

A cursory inspection reveals no headers.

Check out https://www.gov.uk/guidance/about-the-price-paid-data#explanations-of-column-headers-in-the-ppd for the headers

You’ll have to create the table before hand.

CREATE DATABASE prices_paid;

CREATE TABLE prices_paid.public.price_paid(
   Transaction_unique_identifier CHAR(250) PRIMARY KEY NOT NULL,
   PRICE INT    NOT NULL,
   Date_of_transfer  timestamp NOT NULL,
   postcode  CHAR(50),
   Property_type CHAR(1),
   old_new CHAR(1),
   duration CHAR(1),
   paon text,
   saon text,
   street text,
   locality text,
   town_city text,
   district text,
   county text,
   ppd_category char(1),
   record_status char(1)
);

I use DBeaver COMMUNITY for a nice SSMS like experience. Log into your database using a client and run the following SQL

COPY prices_paid.public.price_paid 
FROM '/Database_Files/pp-complete.csv'
quote '"'
 CSV
;

On my laptop it took 42 minutes. Which I think is too long. I then indexed the postcode field as that’s the one I would probably use most.

create index idxpostcode on prices_paid.public.price_paid (postcode);