Click Technology

Linux, Windows, Mac it's all good

CSV mish mash

November30

I’ve just found a fantastic new set of tools for handling CSV data.  CSVkit allows you to chop, change, splice and cut any csv file you like without all the Excel style overhead. All command line driven on Linux, this is a really fast tool for preparing CSV data for upload to a database.  Perhaps even cooler, once you have your data nicely prepared, you can even generate the SQL statements to create the database on your database server.  Very very sweet and massively time saving too.

As it’s a python too, get these in first..

sudo apt-get install python-dev python-pip python-setuptools build-essential

Then install with pip

sudo pip install --upgrade setuptools
sudo pip install --upgrade csvkit

I just downloaded a huge excel spreadsheet from the LSE with the list of all company names and details on the exchange.

Let’s convert that Excel file into csv

in2csv list-of-all-companies.xls > list-of-all-companies.csv

Nice. The first six or so lines are fluff, so let’s remove them

sed -e '1,6d' < list-of-all-companies.csv > list-of-all-companies-trimmed.csv

OK, great. Now there are a lot more things you could do, such as select the columns you want etc, but I’m just going to create the MySQL table for the CSV file. It can convert to any of these database formats – access, sybase, sqlite, informix, firebird, mysql, oracle, maxdb, postgresql & mssql. Sick.

Right, let’s create the table.

csvsql -i mysql list-of-all-companies-trimmed.csv

produces..

CREATE TABLE `list-of-all-companies-trimmed` (
	`List Date` FLOAT NOT NULL, 
	`Company` VARCHAR(35) NOT NULL, 
	`Group` FLOAT, 
	`Sector` VARCHAR(38) NOT NULL, 
	`Sub Sector` VARCHAR(39) NOT NULL, 
	`Country of Incorporation` VARCHAR(2) NOT NULL, 
	`Market` VARCHAR(11) NOT NULL, 
	`Mkt Cap £m` FLOAT, 
	`International Main Market` BOOL NOT NULL, 
	`FCA Listing Category` VARCHAR(8), 
	CHECK (`International Main Market` IN (0, 1))
);

Cut and paste this to your favourite SQL editor and execute on the server and voilà, instant data table. You can also have csvsql upload the table and data direct to the server too. Crazy sweet. A brilliant tool.

Time to read the documentation and tutorial for more info. Worth every minute.

posted under Linux Tips | Comments Off on CSV mish mash

This is my website for short blog posts and interesting materials that are noteworthy or have some handy-tip value.