csvkit 2.2.0
************


About
=====

csvkit is a suite of command-line tools for converting to and working
with CSV, the king of tabular file formats.

It is inspired by pdftk, GDAL and the original csvcut tool by Joe
Germuska and Aaron Bycoffe.

Important links:

* Documentation: https://csvkit.rtfd.org/

* Repository:    https://github.com/wireservice/csvkit

* Issues:        https://github.com/wireservice/csvkit/issues

* Schemas:       https://github.com/wireservice/ffs

First time? Start with the Tutorial.

Note:

  To change the field separator, line terminator, etc. of the
  **output**, you must use csvformat.
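
As a rough illustration of the note above, the sketch below rewrites a small CSV file with different output delimiters. The file names and sample rows are made up for this example, and the csvformat calls are skipped if csvkit is not installed:

```shell
# Create a small sample file (hypothetical data, for illustration only).
printf 'id,name\n1,Ada\n2,Grace\n' > sample.csv

if command -v csvformat >/dev/null 2>&1; then
    # -T (--out-tabs) writes tab-delimited output.
    csvformat -T sample.csv > sample.tsv
    # -D (--out-delimiter) sets an arbitrary output delimiter, here a semicolon.
    csvformat -D ';' sample.csv > sample_semicolon.csv
fi
```

The other tools write plain comma-delimited CSV, so csvformat is typically the last step of a pipeline.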

Note:

  csvkit, by default, sniffs CSV formats (it deduces whether commas,
  tabs or spaces delimit fields, for example) based on the first 1024
  bytes, and performs type inference (it converts text to numbers,
  dates, booleans, etc.). These features are useful and work well in
  most cases, but occasional errors occur. If you don't need these
  features, set "--snifflimit 0" ("-y 0") and "--no-inference" ("-I").
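
A minimal sketch of turning both features off, assuming a made-up file of ZIP codes whose leading zeros type inference might otherwise interpret as numbers. The file names and rows are hypothetical, and the csvjson call is skipped if csvkit is not installed:

```shell
# Hypothetical sample: ZIP codes with leading zeros that should stay text.
printf 'zip,city\n02134,Boston\n10001,New York\n' > zips.csv

if command -v csvjson >/dev/null 2>&1; then
    # -y 0 disables format sniffing; -I disables type inference,
    # so every value is passed through as text.
    csvjson -y 0 -I zips.csv > zips.json
fi
```

With inference disabled, values are kept as text, so the ZIP codes keep their leading zeros in the JSON output.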

Note:

  If you need to do more complex data analysis than csvkit can handle,
  use agate. If you need csvkit to be faster or to handle larger
  files, you may be reaching the limits of csvkit. Consider loading
  the data into SQL, or using qsv or xsv.

Note:

  Need to deduplicate or find fuzzy matches in your CSV data? Use
  csvdedupe and csvlink.


Why csvkit?
===========

Because it makes your life easier.

Convert Excel to CSV:

   in2csv data.xls > data.csv

Convert JSON to CSV:

   in2csv data.json > data.csv

Print column names:

   csvcut -n data.csv

Select a subset of columns:

   csvcut -c column_a,column_c data.csv > new.csv

Reorder columns:

   csvcut -c column_c,column_a data.csv > new.csv

Find rows with matching cells:

   csvgrep -c phone_number -r "555-555-\d{4}" data.csv > new.csv

Convert to JSON:

   csvjson data.csv > data.json

Generate summary statistics:

   csvstat data.csv

Query with SQL:

   csvsql --query "select name from data where age > 30" data.csv > new.csv

Import into PostgreSQL:

   csvsql --db postgresql:///database --insert data.csv

Extract data from PostgreSQL:

   sql2csv --db postgresql:///database --query "select * from data" > new.csv

And much more...


Table of contents
=================

* Tutorial

  * 1. Getting started

    * 1.1. About this tutorial

    * 1.2. Installing csvkit

    * 1.3. Getting the data

    * 1.4. in2csv: the Excel killer

    * 1.5. csvlook: data periscope

    * 1.6. csvcut: data scalpel

    * 1.7. Putting it together with pipes

    * 1.8. Summing up

  * 2. Examining the data

    * 2.1. csvstat: statistics without code

    * 2.2. csvgrep: find the data you need

    * 2.3. csvsort: order matters

    * 2.4. Summing up

  * 3. Power tools

    * 3.1. csvjoin: merging related data

    * 3.2. csvstack: combining subsets

    * 3.3. csvsql and sql2csv: ultimate power

    * 3.4. Summing up

  * 4. Going elsewhere with your data

    * 4.1. csvjson: going online

    * 4.2. csvpy: going into code

    * 4.3. csvformat: for legacy systems

    * 4.4. Summing up

* Reference

  * Input

    * in2csv

    * sql2csv

  * Processing

    * csvclean

    * csvcut

    * csvgrep

    * csvjoin

    * csvsort

    * csvstack

  * Output and Analysis

    * csvformat

    * csvjson

    * csvlook

    * csvpy

    * csvsql

    * csvstat

  * Common arguments

    * Arguments common to all tools

* Tips and Troubleshooting

  * Tips

    * Reading compressed CSVs

    * Specifying STDIN as a file

    * Using csvkit in a crontab

  * Troubleshooting

    * Installation

    * CSV formatting and parsing

    * CSV data interpretation

    * Slow performance

    * Database errors

    * Python standard output encoding errors

* Contributing to csvkit

  * Getting started

  * Principles of development

  * How to contribute

  * A note on new tools

  * Streaming versus buffering

  * Legalese

* Release process

* License

* Changelog

  * 2.2.0 - December 15, 2025

  * 2.1.0 - February 26, 2025

  * 2.0.1 - July 12, 2024

  * 2.0.0 - May 1, 2024

  * 1.5.0 - March 28, 2024

  * 1.4.0 - February 13, 2024

  * 1.3.0 - October 18, 2023

  * 1.2.0 - October 4, 2023

  * 1.1.1 - February 22, 2023

  * 1.1.0 - January 3, 2023

  * 1.0.7 - March 6, 2022

  * 1.0.6 - July 13, 2021

  * 1.0.5 - March 2, 2020

  * 1.0.4 - March 16, 2019

  * 1.0.3 - March 11, 2018

  * 1.0.2 - April 28, 2017

  * 1.0.1 - December 29, 2016

  * 1.0.0 - December 27, 2016

  * 0.9.1 - March 31, 2015

  * 0.9.0 - September 8, 2014

  * 0.8.0 - July 27, 2014

  * 0.7.3 - April 27, 2014

  * 0.7.2 - March 24, 2014

  * 0.7.1 - March 24, 2014

  * 0.7.0 - March 24, 2014

  * 0.6.1 - August 20, 2013

  * 0.6.0 - August 20, 2013

  * 0.5.0 - August 21, 2012

  * 0.4.4 - May 1, 2012

  * 0.4.3 - February 20, 2012


Citation
========

When citing csvkit in publications, you may use this BibTeX entry:

   @Manual{csvkit,
     title = "csvkit",
     author = "Christopher Groskopf and contributors",
     year = "2016",
     url = "https://csvkit.readthedocs.org/"
   }


Authors
=======

The following individuals have contributed code to csvkit:

* Christopher Groskopf

* Joe Germuska

* Aaron Bycoffe

* Travis Mehlinger

* Alejandro Companioni

* Benjamin Wilson

* Bryan Silverthorn

* Evan Wheeler

* Matt Bone

* Ryan Pitts

* Hari Dara

* Jeff Larson

* Jim Thaxton

* Miguel Gonzalez

* Anton Ian Sipos

* Gregory Temchenko

* Kevin Schaul

* Marc Abramowitz

* Noah Hoffman

* Jan Schulz

* Derek Wilson

* Chris Rosenthal

* Davide Setti

* Gabi Davar

* Sriram Karra

* James McKinney

* Aaron McMillin

* Matt Dudys

* Joakim Lundborg

* Federico Scrinzi

* Shane StClair

* raistlin7447

* Alex Dergachev

* Jeff Paine

* Jeroen Janssens

* Sébastien Fievet

* Travis Swicegood

* Ryan Murphy

* Diego Rabatone Oliveira

* Matt Pettis

* Tasneem Raja

* Richard Low

* Kristina Durivage

* Espartaco Palma

* pnaimoli

* Michael Mior

* Jennifer Smith

* Antonio Lima

* Dave Stanton

* Pedrow

* Neal McBurnett

* Anthony DeBarros

* Baptiste Mispelon

* James Seppi

* Karrie Kehoe

* Geert Barentsen

* Cathy Deng

* Eric Bréchemier

* Neil Freeman

* Fede Isas

* Patricia Lipp

* Kev++

* edwardros

* Martin Burch

* Pedro Silva

* hydrosIII

* Tim Wisniewski

* Santiago Castro

* Dan Davison

* Éric Araujo

* Sam Stuck

* Edward Betts

* Jake Zimmerman

* Bryan Rankin

* Przemek Wesołek

* Karl Fogel

* sterlingpetersen

* kjedamzik

* John Vandenberg

* Olivier Lacan

* Adrien Delessert

* Ghislain Antony Vaillant

* Forest Gregg

* Aliaksei Urbanski

* Reid Beels

* Rodrigo Lemos

* Victor Noagbodji

* Connor McArthur

* Matěj Cepl

* Nicholas Matteo

* Matt Giguere

* Felix Bünemann

* Andriy Orehov (Андрій Орєхов)

* Dan Nguyen

* 谭九鼎

* Tomáš Hrnčiar

* Christopher Bottoms

* panolens

* Gabe Walker

* Gui13

* Danny Sepler

* Christian Clauss

* Bonifacio de Oliveira

* Ryan Grout

* badbunnyyy

* Werner Robitza

* Mark Mayo

* Kitagawa Yasutaka

* rachekalmir

* Tim Vergenz

* sgpeter1

* Wes Dean

* Álvaro Osvaldo

* lamdevhs


