SmarterCSV

SmarterCSV is a high-performance CSV reader and writer for Ruby, focused on the fastest end-to-end CSV ingestion, not just the fastest parsing.

Beyond raw speed, SmarterCSV is designed to provide a significantly more convenient and developer-friendly interface than traditional CSV libraries. Instead of returning raw arrays that require substantial post-processing, SmarterCSV produces Rails-ready hashes for each row, making the data immediately usable with ActiveRecord, Sidekiq pipelines, parallel processing, and JSON-based workflows such as S3.

The library includes intelligent defaults, automatic detection of column and row separators, and flexible header/value transformations. These features eliminate much of the boilerplate typically required when working with CSV data and help keep ingestion code concise and maintainable.

For large files, SmarterCSV supports both chunked processing (arrays of hashes) and streaming via Enumerable APIs, enabling efficient batch jobs and low-memory pipelines. The C acceleration further optimizes the full ingestion path — including parsing, hash construction, and conversions — so performance gains reflect real-world workloads, not just tokenizer benchmarks.
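SmarterCSV's own streaming API is not shown in this section, so the sketch below uses Ruby's standard `csv` library instead to illustrate what a low-memory streaming pipeline built on Enumerable looks like. The helper name `first_matching_rows` and the `:posts` column are hypothetical. Because the pipeline is lazy, rows are turned into hashes and filtered one at a time, and iteration stops once `limit` matches have been found.

```ruby
require "csv"

# Hypothetical helper (stdlib CSV, not SmarterCSV): lazily streams rows from an
# IO, turns each row into a hash with symbol keys and numeric values, keeps rows
# whose :posts value exceeds min_posts, and stops after `limit` matches.
def first_matching_rows(io, min_posts:, limit:)
  CSV.new(io, headers: true, header_converters: :symbol, converters: :numeric)
     .lazy                                   # Enumerable#lazy: no full read into memory
     .map(&:to_h)                            # CSV::Row -> Hash
     .select { |row| row[:posts] > min_posts }
     .first(limit)                           # forces evaluation, returns an Array
end
```

With a `StringIO` or a `File` as input, only the rows needed to satisfy `limit` are processed.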

The interface is intentionally designed to robustly handle messy real-world CSV while keeping application code clean. Developers can easily map headers, skip unwanted rows, quarantine problematic data, and transform values on the fly without building custom post-processing pipelines.
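For contrast, here is roughly the kind of hand-written post-processing step the paragraph above says you can avoid. It is a plain-Ruby sketch, not SmarterCSV's API; `clean_rows` and its `key_map:`, `skip_if:`, and `transforms:` parameters are hypothetical names chosen for illustration. It renames keys, skips unwanted rows, converts values per column, and quarantines rows whose conversion fails instead of aborting the whole import.

```ruby
# Hypothetical post-parse helper (not part of SmarterCSV). Takes an array of
# hashes and returns [accepted_rows, quarantined_rows].
def clean_rows(rows, key_map:, skip_if:, transforms:)
  accepted = []
  quarantined = []
  rows.each do |row|
    mapped = row.transform_keys { |k| key_map.fetch(k, k) }   # rename headers
    next if skip_if.call(mapped)                             # drop unwanted rows
    begin
      converted = mapped.to_h do |k, v|
        [k, transforms.key?(k) ? transforms[k].call(v) : v]  # per-column transform
      end
      accepted << converted
    rescue ArgumentError, TypeError => e
      quarantined << { row: row, error: e.message }         # set bad rows aside
    end
  end
  [accepted, quarantined]
end
```

Every new import tends to grow its own version of this loop; moving it into parser options keeps ingestion code shorter.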

When exporting data, SmarterCSV converts arrays of hashes back into properly formatted CSV, maintaining the same focus on convenience and correctness.
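The general idea of turning an array of hashes back into CSV can be sketched with Ruby's standard `csv` library. This is not SmarterCSV's writer API; `hashes_to_csv` is a hypothetical helper. It assumes the header row should be the union of all keys in first-seen order, with keys missing from a given hash written as empty fields.

```ruby
require "csv"

# Hypothetical helper (stdlib CSV, not SmarterCSV's writer): converts an array
# of hashes into CSV text. Headers are the union of all keys in first-seen
# order; a key missing from a row becomes an empty field.
def hashes_to_csv(rows)
  headers = rows.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers.map(&:to_s)                         # header row
    rows.each { |row| csv << headers.map { |h| row[h] } }  # nil -> empty field
  end
end
```

Stdlib `CSV` takes care of quoting, so a value such as `"y, z"` is written as a single quoted field.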

User Testimonial:

“Best gem for CSV for us yet. […] taking an import process from 7+ hours to about 3 minutes. […] SmarterCSV was a big part and helped clean up our code A LOT.”

Performance

SmarterCSV is designed for real-world CSV processing, returning fully usable hashes with symbol keys and type conversions — not raw arrays that require additional post-processing.

Beware of benchmarks that only measure raw CSV parsing. Such comparisons measure tokenization alone, while real-world usage requires hash construction, key normalization, type conversion, and edge-case handling. Omitting this work understates the actual cost of CSV ingestion.

For a fair comparison, CSV.table is the closest Ruby CSV equivalent to SmarterCSV.
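To see why, compare what raw parsing returns with what a CSV.table-style read returns. `CSV.table` reads a file with the default options `headers: true`, `converters: :numeric`, and `header_converters: :symbol`. The sketch below applies the same options to a small in-memory string via `CSV.parse` and adds `.map(&:to_h)` to get plain hashes, which is the kind of output SmarterCSV returns. The sample data is made up for illustration.

```ruby
require "csv"

data = "name,posts\nAlice,12\nBob,3\n"

# Tokenizer-only output (what raw-parsing benchmarks measure): arrays of strings,
# header row included, no type conversion.
raw = CSV.parse(data)

# CSV.table-style output: same default options as CSV.table, then converted
# to an array of hashes with symbol keys and numeric values.
table_like = CSV.parse(data, headers: true, converters: :numeric,
                             header_converters: :symbol).map(&:to_h)

p raw         # [["name", "posts"], ["Alice", "12"], ["Bob", "3"]]
p table_like  # [{:name=>"Alice", :posts=>12}, {:name=>"Bob", :posts=>3}]
```

Benchmarking only the first call leaves out the hash construction and conversion work that the second call includes.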

Comparison	Speedup range
vs SmarterCSV 1.14.4 (with acceleration)	5.4× to 37.4× faster
vs SmarterCSV 1.14.4 (pure Ruby)	1.4× to 9.5× faster
vs CSV.read (arrays of arrays)	1.6× to 7.2× faster
vs CSV.table (arrays of hashes)	6.0× to 113.0× faster
vs ZSV (arrays of hashes)	1.4× to 6.3× faster

SmarterCSV also wins on 14 of 16 benchmark files head-to-head against ZSV+wrapper (a SIMD-accelerated C parser with a Ruby wrapper that produces equivalent hash output).

Benchmarks: 16 CSV files (43k–80k rows), Ruby 3.4.7, Apple M1. Memory: 39% less allocated, 43% fewer objects.

See SmarterCSV 1.15.2: Faster Than Raw CSV Arrays and PR #319 for more details.

Examples

Simple Example:

SmarterCSV is designed for robustness — real-world CSV data often has inconsistent formatting, extra whitespace, and varied column separators. Its intelligent defaults automatically clean and normalize data, returning high-quality hashes ready for direct use with ActiveRecord, Sidekiq, or any data pipeline — no post-processing required. See Parsing CSV Files in Ruby with SmarterCSV for more background.

$ cat spec/fixtures/sample.csv
   First Name  , Last	 Name , Emoji , Posts
José ,Corüazón, ❤️, 12
Jürgen, Müller ,😐,3
 Michael, May ,😞, 7

$ irb
>> require 'smarter_csv'
=> true
>> data = SmarterCSV.process('spec/fixtures/sample.csv')
=> [{:first_name=>"José", :last_name=>"Corüazón", :emoji=>"❤️", :posts=>12},
    {:first_name=>"Jürgen", :last_name=>"Müller", :emoji=>"😐", :posts=>3},
    {:first_name=>"Michael", :last_name=>"May", :emoji=>"😞", :posts=>7}]

Notice that SmarterCSV automatically does the following (all defaults):

Normalizes headers → downcase_header: true, strings_as_keys: false
Strips whitespace → strip_whitespace: true
Converts numbers → convert_values_to_numeric: true
Removes empty values → remove_empty_values: true
Preserves Unicode and emoji characters
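To make these defaults concrete, here is a rough plain-Ruby approximation built on the standard `csv` library. It is not SmarterCSV's implementation, and `normalize_rows` is a hypothetical name; unlike SmarterCSV it does not detect separators automatically, so `col_sep` must be passed in. Header cleanup follows the sample above: `" Last\t Name "` becomes `:last_name`.

```ruby
require "csv"

# Rough approximation (not SmarterCSV's code) of the defaults listed above:
#   headers: strip, downcase, collapse inner whitespace to "_", symbolize
#   values:  strip whitespace, convert integer/float strings, drop empty values
def normalize_rows(text, col_sep: ",")
  header, *rows = CSV.parse(text, col_sep: col_sep)
  keys = header.map { |h| h.strip.downcase.gsub(/\s+/, "_").to_sym }
  rows.map do |fields|
    keys.zip(fields).each_with_object({}) do |(key, value), hash|
      value = value.to_s.strip                # strip_whitespace (nil -> "")
      next if value.empty?                    # remove_empty_values
      hash[key] =
        if value.match?(/\A[-+]?\d+\z/) then value.to_i          # integer
        elsif value.match?(/\A[-+]?\d*\.\d+\z/) then value.to_f  # float
        else value
        end
    end
  end
end
```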

Batch Processing:

Processing large CSV files in chunks minimizes memory usage and enables powerful workflows:

Database imports — bulk insert records in batches for better performance
Parallel processing — distribute chunks across Sidekiq, Resque, or other background workers
Progress tracking — the optional chunk_index parameter enables progress reporting
Memory efficiency — only one chunk is held in memory at a time, regardless of file size

The block receives a chunk (array of hashes) and an optional chunk_index (0-based sequence number):

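require 'smarter_csv'

filename = 'path/to/your.csv'  # placeholder: path to your CSV file

# Database bulk import
SmarterCSV.process(filename, chunk_size: 100) do |chunk, chunk_index|
  puts "Processing chunk #{chunk_index}..."
  MyModel.insert_all(chunk)  # chunk is an array of hashes (insert_all requires Rails 6+)
end

# Parallel processing with Sidekiq
SmarterCSV.process(filename, chunk_size: 100) do |chunk|
  # each chunk is enqueued as its own job, so workers can process chunks in parallel;
  # Sidekiq serializes job arguments as JSON, so symbol keys arrive as strings
  MyWorker.perform_async(chunk)
end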
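The chunk-plus-index pattern can be sketched without SmarterCSV to show what "one chunk in memory at a time" means. The version below uses the standard `csv` library and `Enumerable#each_slice`; `each_chunk` is a hypothetical helper that only mirrors the `|chunk, chunk_index|` block signature of the examples above and is not SmarterCSV's implementation.

```ruby
require "csv"

# Hypothetical helper (stdlib CSV, not SmarterCSV): reads rows incrementally,
# groups them into arrays of up to chunk_size hashes, and yields each chunk
# with a 0-based chunk index. Only the current slice is held in memory.
def each_chunk(io, chunk_size:)
  csv = CSV.new(io, headers: true, header_converters: :symbol, converters: :numeric)
  csv.each_slice(chunk_size).with_index do |rows, chunk_index|
    yield rows.map(&:to_h), chunk_index   # CSV::Row objects -> plain hashes
  end
end
```

The last chunk may be smaller than `chunk_size`, which is worth handling in progress reporting.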

See Examples, Batch Processing, and Configuration Options for more.

Requirements

Minimum Ruby Version: >= 2.6

C Extension: SmarterCSV includes a native C extension for accelerated CSV parsing. The C extension is automatically compiled on MRI Ruby. For JRuby and TruffleRuby, SmarterCSV falls back to a pure Ruby implementation.
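If you are unsure which code path applies in a given environment, the interpreter can report it. The helper below is hypothetical and is not SmarterCSV's own detection logic; it only checks the documented minimum version and whether `RUBY_ENGINE` is `"ruby"` (MRI), the engine on which the C extension is compiled. JRuby reports `"jruby"` and TruffleRuby reports `"truffleruby"`.

```ruby
# Hypothetical helper (not part of SmarterCSV): summarizes whether the running
# interpreter meets the documented minimum Ruby version (>= 2.6) and whether
# it is MRI, where the README says the C extension is compiled.
def smarter_csv_runtime_info
  {
    engine: RUBY_ENGINE,
    version: RUBY_VERSION,
    meets_minimum_version: Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("2.6"),
    c_extension_expected: RUBY_ENGINE == "ruby"   # false on JRuby / TruffleRuby
  }
end

p smarter_csv_runtime_info
```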

Installation

Add this line to your application's Gemfile:

    gem 'smarter_csv'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install smarter_csv

Documentation

Articles

Parsing CSV Files in Ruby with SmarterCSV
CSV Writing with SmarterCSV
Processing 1.4 Million CSV Records in Ruby, fast
Faster Parsing CSV with Parallel Processing by Jack lin
The original Stack Overflow question that inspired SmarterCSV
The original post for SmarterCSV

ChangeLog

Reporting Bugs / Feature Requests

Please open an Issue on GitHub if you have feedback, new feature requests, or want to report a bug. Thank you!

For reporting issues, please:

include a small sample CSV file
open a pull-request adding a test that demonstrates the issue
mention your versions of SmarterCSV, Ruby, and Rails

A Special Thanks to all 59 Contributors! 🎉🎉🎉

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Added some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

tilo/smarter_csv

Folders and files

Latest commit

History

Repository files navigation

SmarterCSV

Performance

Examples

Simple Example:

Batch Processing:

Requirements

Installation

Documentation

Articles

ChangeLog

Reporting Bugs / Feature Requests

A Special Thanks to all 59 Contributors! 🎉🎉🎉

Contributing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors 23

Uh oh!

Languages

Packages