CSV mish mash
I’ve just found a fantastic new set of tools for handling CSV data. CSVkit allows you to chop, change, splice and cut any csv file you like without all the Excel style overhead. All command line driven on Linux, this is a really fast tool for preparing CSV data for upload to a database. Perhaps even cooler, once you have your data nicely prepared, you can even generate the SQL statements to create the database on your database server. Very very sweet and massively time saving too.
As it’s a python too, get these in first..
sudo apt-get install python-dev python-pip python-setuptools build-essential
Then install with pip
sudo pip install --upgrade setuptools
sudo pip install --upgrade csvkit
I just downloaded a huge excel spreadsheet from the LSE with the list of all company names and details on the exchange.
Let’s convert that Excel file into csv
in2csv list-of-all-companies.xls > list-of-all-companies.csv
Nice. The first six or so lines are fluff, so let’s remove them
sed -e '1,6d' < list-of-all-companies.csv > list-of-all-companies-trimmed.csv
OK, great. Now there are a lot more things you could do, such as select the columns you want etc, but I’m just going to create the MySQL table for the CSV file. It can convert to any of these database formats – access, sybase, sqlite, informix, firebird, mysql, oracle, maxdb, postgresql & mssql. Sick.
Right, let’s create the table.
csvsql -i mysql list-of-all-companies-trimmed.csv
produces..
CREATE TABLE `list-of-all-companies-trimmed` ( `List Date` FLOAT NOT NULL, `Company` VARCHAR(35) NOT NULL, `Group` FLOAT, `Sector` VARCHAR(38) NOT NULL, `Sub Sector` VARCHAR(39) NOT NULL, `Country of Incorporation` VARCHAR(2) NOT NULL, `Market` VARCHAR(11) NOT NULL, `Mkt Cap £m` FLOAT, `International Main Market` BOOL NOT NULL, `FCA Listing Category` VARCHAR(8), CHECK (`International Main Market` IN (0, 1)) );
Cut and paste this to your favourite SQL editor and execute on the server and voilà, instant data table. You can also have csvsql upload the table and data direct to the server too. Crazy sweet. A brilliant tool.
Time to read the documentation and tutorial for more info. Worth every minute.