Contributing to csvkit
**********************

csvkit actively encourages contributions from people of all genders,
races, ethnicities, ages, creeds, nationalities, persuasions,
alignments, sizes, shapes, and journalistic affiliations. You are
welcome here.

We seek contributions from developers and non-developers of all skill
levels. We will typically accept bug fixes and documentation updates
with minimal fuss. If you want to work on a larger feature—great! The
maintainers will be happy to provide feedback and code review on your
implementation.

Before making any changes or additions to csvkit, please be sure to
read the rest of this document, especially the “Principles of
development” section.


Getting Started
===============

Set up your environment for development:

   git clone git://github.com/wireservice/csvkit.git
   cd csvkit
   mkvirtualenv csvkit

   # If running Python 2:
   pip install -r requirements-py2.txt

   # If running Python 3:
   pip install -r requirements-py3.txt

   python setup.py develop
   tox


Principles of development
=========================

csvkit is to tables as Unix text processing commands (cut, grep, cat,
sort) are to text. As such, csvkit adheres to the Unix philosophy.

1. Small is beautiful.

2. Make each program do one thing well.

3. Build a prototype as soon as possible.

4. Choose portability over efficiency.

5. Store data in flat text files.

6. Use software leverage to your advantage.

7. Use shell scripts to increase leverage and portability.

8. Avoid captive user interfaces.

9. Make every program a filter.

As there is no single, standard CSV format, csvkit encourages popular
formatting options:

* Output targets broad compatibility: Quoting is done with double-
  quotes and only when required. Fields are delimited with commas.
  Rows are terminated with Unix line endings (“\n”).

* Output favors consistency over brevity: Numbers always include at
  least one decimal place, even if they are round. Dates and times are
  output in ISO 8601 format. Null values are rendered as empty
  strings.


How to contribute
=================

1. Fork the project on GitHub.

2. Look through the open issues for a task that you can
   realistically expect to complete in a few days. Don’t worry about
   the issue’s priority; instead, choose something you’ll enjoy.
   You’re more likely to finish something if you enjoy hacking on it.

3. Comment on the issue to let people know you’re going to work on
   it so that no one duplicates your effort. It’s good practice to
   provide a general idea of how you plan to resolve the issue so that
   others can make suggestions.

4. Write tests for any changes to the code’s behavior. Follow the
   format of the tests in the "tests/" directory to see how this
   works. You can run all the tests with the command "tox", or just
   your Python version’s with "nosetests" (faster).

5. Write the code. Try to be consistent with the style and
   organization of the existing code. A good contribution won’t be
   refused for stylistic reasons, but large parts of it may be
   rewritten and nobody wants that.

6. As you’re working, periodically merge in changes from the
   upstream master branch to avoid having to resolve large merge
   conflicts. Check that you haven’t broken anything by running the
   tests.

7. Write documentation for any user-facing features.

8. Once it works, is tested, and is documented, submit a pull
   request on GitHub.

9. Wait for it to be merged or for a comment about what needs to be
   changed.

10. Rejoice.


A note on new tools
===================

As a general rule, csvkit is no longer adding new tools. This is the
result of limited maintenance time as well as a desire to keep the
toolkit focused on the most common use cases. Exceptions may be made
to this rule on a case-by-case basis. We, of course, welcome patches
to improve existing tools or add useful features.

If you decide to build your own tool, we encourage you to create and
maintain it as a separate Python package. You will probably want to
use the agate library, which csvkit uses for most of its CSV reading
and writing. Doing so will safe time and make it easier to maintain
common behavior with csvkit’s core tools.


Streaming versus buffering
==========================

csvkit tools operate in one of two fashions: Some, such as csvsort,
buffer their entire input into memory before writing any output. Other
tools—those that can operate on individual records—write write a row
immediately after reading a row. Records are “streamed” through the
tool. Streaming tools produce output faster and require less memory
than buffering tools.

For performance reasons tools should always offer streaming when
possible. If a new feature would undermine streaming functionality it
must be balanced against the utility of having a tool that can
efficiently operate over large datasets.

Currently, the following tools stream:

* csvclean

* csvcut

* csvformat

* csvgrep

* csvstack

* sql2csv

Currently, the following tools buffer:

* csvjoin

* csvjson unless "--stream" and "--no-inference" are set but "--
  skip- lines" is not

* csvlook

* csvpy

* csvsort

* csvsql

* csvstat

* in2csv unless "--format" is "ndjson" and "--no-inference" is set,
  or "--format" is "csv" and "--no-inference" is set but "--skip-
  lines" is not


Legalese
========

To the extent that contributors care, they should keep the following
legal mumbo-jumbo in mind:

The source of csvkit and therefore of any contributions are licensed
under the permissive MIT license. By submitting a patch or pull
request you are agreeing to release your contribution under this
license. You will be acknowledged in the AUTHORS file. As the owner of
your specific contributions you retain the right to privately
relicense your specific contributions (and no others), however, the
released version of the code can never be retracted or relicensed.
