csv_parser Class Reference

#include <csv_parser.hpp>

Public Member Functions

 csv_parser ()
 ~csv_parser ()
bool init (FILE *input_file_pointer)
bool init (const char *input_filename)
void set_enclosed_char (char fields_enclosed_by, enclosure_type_t enclosure_mode)
void set_field_term_char (char fields_terminated_by)
void set_line_term_char (char lines_terminated_by)
bool has_more_rows (void)
void set_skip_lines (unsigned int lines_to_skip)
csv_row get_row (void)
unsigned int get_record_count (void)
void reset_record_count (void)

Protected Attributes

char enclosed_char
char escaped_char
char field_term_char
char line_term_char
unsigned int enclosed_length
unsigned int escaped_length
unsigned int field_term_length
unsigned int line_term_length
unsigned int ignore_num_lines
unsigned int record_count
FILE * input_fp
char * input_filename
enclosure_type_t enclosure_type
bool more_rows

Private Member Functions

void _skip_lines (void)
void _read_single_line (char **buffer, unsigned int *buffer_len)
void _get_fields_without_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length)
void _get_fields_with_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length)
void _get_fields_with_optional_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length)


Detailed Description

The csv_parser object

Used to parse text files to extract records and fields.

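The parser makes the following assumptions:

The CSV files can be parsed in 3 modes:

  • (a) No enclosure: the file does not use any enclosure characters for the fields.
  • (b) Required enclosure: the file requires enclosure characters for all the fields.
  • (c) Optional enclosure: the use of enclosure characters for the fields is optional.

For option (c), if an enclosure character is spotted at either the beginning or the end of the string, the field is assumed to be enclosed.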

The csv_parser::init() method can accept a character array as the path to the CSV file. Since it is overloaded, it can also accept a FILE pointer to a stream that is already open for reading.

The set_enclosed_char() method accepts the field enclosure character as the first parameter and the enclosure mode as the second parameter which controls how the text file is going to be parsed.

See also:
csv_parser::set_enclosed_char()

enclosure_type_t

Todo:
Add the ability to parse files where fields/columns are terminated by strings instead of a single character.
Todo:
Add the ability to set a string that lines start with. Currently lines do not have any starting character or string.
Todo:
Add the ability to set a string that lines end with. Currently lines can only end with a single character.
Todo:
Add the ability to accept escape characters other than the backslash character 0x5C.
Todo:
Add more support for improperly formatted CSV data files.
Author:
Israel Ekpo <israel.ekpo@israelekpo.com>

Constructor & Destructor Documentation

csv_parser::csv_parser (  )  [inline]

Class constructor

This is the default constructor.

All the internal attributes are initialized here

  • The enclosure character is initialized to NULL 0x00.
  • The escape character is initialized to the backslash character 0x5C.
  • The field delimiter character is initialized to a comma 0x2C.
  • The record delimiter character is initialized to a new line character 0x0A.
  • The lengths of the enclosure, escape, field delimiter and record delimiter characters are initialized to 0, 1, 1 and 1 respectively.
  • The number of records to ignore is set to zero.
  • The more_rows internal attribute is set to false.
  • The pointer to the CSV input file is initialized to NULL.
  • The pointer to the buffer for the file name is also initialized to NULL.

csv_parser::~csv_parser (  )  [inline]

Class destructor

In the class destructor the file pointer to the input CSV file is closed and the buffer to the input file name is also deallocated.

See also:
csv_parser::input_fp

csv_parser::input_filename


Member Function Documentation

bool csv_parser::init ( FILE *  input_file_pointer  ) 

This is an overloaded member function, provided for convenience. It differs from the other init() overload only in the argument it accepts.

Initializes the current object

This init method accepts a pointer to a CSV file that has already been opened for reading. It also resets the file pointer to the beginning of the stream.

Parameters:
[in] input_file_pointer A FILE pointer to a stream that is already open for reading.
Returns:
bool Returns true on success and false on error.

bool csv_parser::init ( const char *  input_filename  ) 

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Initializes the current object

Parameters:
[in] input_filename A character array holding the path to the CSV file.
Returns:
bool Returns true on success and false on error.

void csv_parser::set_enclosed_char ( char  fields_enclosed_by,
enclosure_type_t  enclosure_mode 
)

Defines the Field Enclosure character used in the Text File

Setting this to NULL means that the enclosure character is optional.

If the enclosure is optional, there could be fields that are enclosed, and fields that are not enclosed within the same line/record.

Parameters:
[in] fields_enclosed_by The character used to enclose the fields.
[in] enclosure_mode How the CSV file should be parsed.
Returns:
void

void csv_parser::set_field_term_char ( char  fields_terminated_by  ) 

Defines the Field Delimiter character used in the text file

Parameters:
[in] fields_terminated_by The single character used to separate fields within a record.
Returns:
void

void csv_parser::set_line_term_char ( char  lines_terminated_by  ) 

Defines the Record Terminator character used in the text file

Parameters:
[in] lines_terminated_by The single character used to mark the end of a record.
Returns:
void

bool csv_parser::has_more_rows ( void   )  [inline]

Returns whether there is still more data

This method returns a boolean value indicating whether or not there are still more records to be extracted in the current file being parsed.

Call this method to see if there are more rows to retrieve before invoking csv_parser::get_row()

See also:
csv_parser::get_row()

csv_parser::more_rows

Returns:
bool Returns true if there are still more rows and false if there are not.

void csv_parser::set_skip_lines ( unsigned int  lines_to_skip  )  [inline]

Defines the number of records to discard

The number of records specified will be discarded during the parsing process.

See also:
csv_parser::_skip_lines()

csv_parser::get_row()

csv_parser::has_more_rows()

Parameters:
[in] lines_to_skip How many records should be skipped
Returns:
void

csv_row csv_parser::get_row ( void   ) 

Return the current row from the CSV file

The row is returned as a vector of string objects.

This method should be called only if csv_parser::has_more_rows() returns true.

See also:
csv_parser::has_more_rows()

csv_parser::get_record_count()

csv_parser::reset_record_count()

csv_parser::more_rows

Returns:
csv_row A vector of strings holding the fields of the row.
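
The calling convention described above (check has_more_rows(), then call get_row()) can be sketched with a toy stand-in. The toy_reader class and the count_fields() function below are hypothetical names made up for this illustration; they only mimic the documented calling pattern on an in-memory string and are not the csv_parser implementation.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in (hypothetical): mimics only the has_more_rows()/get_row()
// calling convention. It is NOT the csv_parser implementation.
class toy_reader {
public:
    explicit toy_reader(const std::string &text) : in_(text), count_(0) { advance(); }

    bool has_more_rows() const { return more_rows_; }

    // Returns the current row and moves on to the next one.
    std::vector<std::string> get_row() {
        std::vector<std::string> row;
        std::istringstream fields(next_line_);
        std::string field;
        while (std::getline(fields, field, ',')) {
            row.push_back(field);
        }
        ++count_;      // one more call to get_row()
        advance();
        return row;
    }

    unsigned int get_record_count() const { return count_; }

private:
    // Pre-reads the next line so has_more_rows() can answer without I/O.
    void advance() { more_rows_ = static_cast<bool>(std::getline(in_, next_line_)); }

    std::istringstream in_;
    std::string next_line_;
    bool more_rows_;
    unsigned int count_;
};

// Typical consumer loop: never call get_row() unless has_more_rows() is true.
unsigned int count_fields(toy_reader &reader) {
    unsigned int total = 0;
    while (reader.has_more_rows()) {
        total += static_cast<unsigned int>(reader.get_row().size());
    }
    return total;
}
```

With the real class, the same loop shape would presumably be used after a successful call to init().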

unsigned int csv_parser::get_record_count ( void   )  [inline]

Returns the number of times the csv_parser::get_row() method has been invoked

See also:
csv_parser::reset_record_count()
Returns:
unsigned int The number of times the csv_parser::get_row() method has been invoked.

void csv_parser::reset_record_count ( void   )  [inline]

Resets the record_count internal attribute to zero

This may be used if the object is reused multiple times.

See also:
csv_parser::record_count

csv_parser::get_record_count()

Returns:
void

void csv_parser::_skip_lines ( void   )  [private]

Ignores N records in the CSV file

Where N is the value of the csv_parser::ignore_num_lines internal property.

The number of lines skipped can be defined by csv_parser::set_skip_lines()

See also:
csv_parser::set_skip_lines()
Returns:
void
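
As a rough illustration of discarding N records, the sketch below consumes characters from a stream until it has seen N record terminators. The function name skip_records() and its return value are assumptions made for this example; the actual private method takes no arguments and returns void.

```cpp
#include <cstdio>

// Hypothetical helper, not part of csv_parser: consumes up to n records
// from fp, where each record ends with line_term. Returns the number of
// records actually skipped, which is smaller than n if the stream ends early.
unsigned int skip_records(std::FILE *fp, char line_term, unsigned int n)
{
    unsigned int skipped = 0;
    int c;

    while (skipped < n && (c = std::fgetc(fp)) != EOF) {
        if (c == static_cast<unsigned char>(line_term)) {
            ++skipped;   // one full record has been consumed
        }
    }
    return skipped;
}
```

A typical use is to skip a header line before reading data rows, for example skip_records(fp, '\n', 1).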

void csv_parser::_read_single_line ( char **  buffer,
unsigned int *  buffer_len 
) [private]

Reads a Single Line

Reads a single record into the buffer passed by reference to the method

Parameters:
[in,out] buffer A pointer to a character array for the current line.
[out] buffer_len A pointer to an integer storing the length of the buffer.
Returns:
void
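
Reading one record up to a single-character terminator can be sketched as follows. This is a minimal illustration, not the library's code: read_record() is a hypothetical name, and it uses a std::string in place of the char ** buffer and length pointer of the actual method.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of csv_parser: reads characters from fp
// into out until line_term or end of file. The terminator is not stored.
// Returns false only when the stream ended before any character was read.
bool read_record(std::FILE *fp, char line_term, std::string &out)
{
    out.clear();
    int c;

    while ((c = std::fgetc(fp)) != EOF) {
        if (c == static_cast<unsigned char>(line_term)) {
            return true;               // complete record read
        }
        out += static_cast<char>(c);
    }
    return !out.empty();               // last record may lack a terminator
}
```

Note that in this sketch an empty final line is indistinguishable from end of file; a real parser may need to treat that case differently.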

void csv_parser::_get_fields_without_enclosure ( csv_row_ptr  row,
const char *  line,
const unsigned int *  line_length 
) [private]

Extracts the fields without enclosures

This is used when the enclosure character is not set

Parameters:
[out] row The vector of strings
[in] line The character array buffer containing the current record/line
[in] line_length The length of the buffer
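
When no enclosure character is in use, field extraction reduces to splitting the record on the field terminator. The sketch below shows that idea under simplifying assumptions: split_plain() is a hypothetical name, and it takes and returns standard containers instead of the csv_row_ptr and raw buffer used by the actual method.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of csv_parser: splits one record on a
// single-character field terminator, with no enclosure or escape handling.
std::vector<std::string> split_plain(const std::string &line, char field_term)
{
    std::vector<std::string> fields;
    std::string current;

    for (char c : line) {
        if (c == field_term) {
            fields.push_back(current);   // terminator closes the current field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);           // last field has no trailing terminator
    return fields;
}
```

In this sketch an empty record yields a single empty field, and two adjacent terminators yield an empty field between them.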

void csv_parser::_get_fields_with_enclosure ( csv_row_ptr  row,
const char *  line,
const unsigned int *  line_length 
) [private]

Extracts the fields with enclosures

This is used when the enclosure character is set.

Parameters:
[out] row The vector of strings
[in] line The character array buffer containing the current record/line
[in] line_length The length of the buffer
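
When every field is enclosed, field terminators inside an enclosure must be kept as data, and a backslash before an enclosure character keeps it from closing the field. A minimal sketch of that state machine follows; split_enclosed() is a hypothetical name and the handling of stray characters outside an enclosure is an assumption, not documented library behaviour.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of csv_parser: every field is assumed to be
// wrapped in `enclosure`; a backslash makes the next character literal.
std::vector<std::string> split_enclosed(const std::string &line,
                                        char field_term, char enclosure)
{
    std::vector<std::string> fields;
    std::string current;
    bool inside = false;   // true between an opening and a closing enclosure

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inside) {
            if (c == '\\' && i + 1 < line.size()) {
                current += line[++i];    // escaped character is taken literally
            } else if (c == enclosure) {
                inside = false;          // closing enclosure
            } else {
                current += c;            // includes terminators inside the field
            }
        } else if (c == enclosure) {
            inside = true;               // opening enclosure
        } else if (c == field_term) {
            fields.push_back(current);
            current.clear();
        }
        // Assumption: any other character outside an enclosure is ignored.
    }
    fields.push_back(current);
    return fields;
}
```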

void csv_parser::_get_fields_with_optional_enclosure ( csv_row_ptr  row,
const char *  line,
const unsigned int *  line_length 
) [private]

Extracts the fields when enclosure is optional

This is used when the enclosure character is optional

Hence, there could be fields that use it, and fields that don't.

Parameters:
[out] row The vector of strings
[in] line The character array buffer containing the current record/line
[in] line_length The length of the buffer
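
With optional enclosure, each field has to be inspected to decide whether it is wrapped. The sketch below treats a field as enclosed only when it starts with the enclosure character, which is a simplification of the rule in the class description (that rule also considers the end of the string). The name split_optional() is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of csv_parser: a field is treated as
// enclosed only if its first character is `enclosure`. A backslash inside
// an enclosed field makes the next character literal.
std::vector<std::string> split_optional(const std::string &line,
                                        char field_term, char enclosure)
{
    std::vector<std::string> fields;
    std::string current;
    bool inside = false;          // true while inside an enclosed field
    bool at_field_start = true;   // true until the first character of a field

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inside) {
            if (c == '\\' && i + 1 < line.size()) {
                current += line[++i];
            } else if (c == enclosure) {
                inside = false;
            } else {
                current += c;
            }
        } else if (c == enclosure && at_field_start) {
            inside = true;
        } else if (c == field_term) {
            fields.push_back(current);
            current.clear();
            at_field_start = true;   // next character starts a new field
            continue;
        } else {
            current += c;            // plain character of an unenclosed field
        }
        at_field_start = false;
    }
    fields.push_back(current);
    return fields;
}
```

This lets enclosed and unenclosed fields share one record, for example x,"y,z",w.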


Member Data Documentation

csv_parser::enclosed_char [protected]

The enclosure character

This is the single character used in the document to wrap the fields.

If it is used for a field, both ends of that field are assumed to be wrapped.

See also:
csv_parser::_get_fields_without_enclosure()

csv_parser::_get_fields_with_enclosure()

csv_parser::_get_fields_with_optional_enclosure()

csv_parser::escaped_char [protected]

The escape character

This is the character used to escape enclosure characters found within the fields. For now the only valid escape character is the backslash character 0x5C.

This is only important when the enclosure character is required or optional.

See also:
csv_parser::_get_fields_with_enclosure()

csv_parser::_get_fields_with_optional_enclosure()

Todo:
Update the code to accept other escape characters besides the backslash

csv_parser::field_term_char [protected]

The field terminator

This is the single character used to separate fields within a record, marking the end of a column in the text file.

Common choices include the comma, the tab and the semicolon.

csv_parser::line_term_char [protected]

The record terminator

This is the single character used to mark the end of a record in the text file.

The most common one is the new line character; however, it is possible to use others as well.

See also:
csv_parser::get_row()

csv_parser::enclosed_length [protected]

Enclosure length

This is the length of the enclosure character.

See also:
csv_parser::csv_parser()

csv_parser::set_enclosed_char()

csv_parser::escaped_length [protected]

The length of the escape character

This is currently not used. It may be used in future versions of the object.

Todo:
Update the code to accept other escape characters besides the backslash

csv_parser::field_term_length [protected]

Length of the field terminator

For now this is not being used. It will be used in future versions of the object.

csv_parser::line_term_length [protected]

Length of the record terminator

For now this is not being used. It will be used in future versions of the object.

csv_parser::ignore_num_lines [protected]

Number of records to discard

This variable controls how many records in the file are skipped before parsing begins.

See also:
csv_parser::_skip_lines()

csv_parser::set_skip_lines()

csv_parser::record_count [protected]

Number of times the get_row() method has been called

See also:
csv_parser::get_row()

csv_parser::input_fp [protected]

The CSV File Pointer

This is the pointer to the CSV file

csv_parser::input_filename [protected]

Buffer to input file name

This buffer is used to store the name of the file that is being parsed.

csv_parser::enclosure_type [protected]

Mode in which the CSV file will be parsed

The possible values are explained below:

  • ENCLOSURE_NONE (1) means the CSV file does not use any enclosure characters for the fields
  • ENCLOSURE_REQUIRED (2) means the CSV file requires enclosure characters for all the fields
  • ENCLOSURE_OPTIONAL (3) means the use of enclosure characters for the fields is optional
See also:
csv_parser::get_row()

csv_parser::_read_single_line()

csv_parser::_get_fields_without_enclosure()

csv_parser::_get_fields_with_enclosure()

csv_parser::_get_fields_with_optional_enclosure()

csv_parser::more_rows [protected]

There are still more records to parse

This boolean property is an internal indicator of whether there are still records in the file to be parsed.

See also:
csv_parser::has_more_rows()


Generated on Sun Jun 28 21:19:30 2009 for CSV Parser by  doxygen 1.5.5