#include <csv_parser.hpp>
Public Member Functions
csv_parser ()
~csv_parser ()
bool init (FILE *input_file_pointer)
bool init (const char *input_filename)
void set_enclosed_char (char fields_enclosed_by, enclosure_type_t enclosure_mode)
void set_field_term_char (char fields_terminated_by)
void set_line_term_char (char lines_terminated_by)
bool has_more_rows (void)
void set_skip_lines (unsigned int lines_to_skip)
csv_row get_row (void)
unsigned int get_record_count (void)
void reset_record_count (void)

Protected Attributes
char enclosed_char
char escaped_char
char field_term_char
char line_term_char
unsigned int enclosed_length
unsigned int escaped_length
unsigned int field_term_length
unsigned int line_term_length
unsigned int ignore_num_lines
unsigned int record_count
FILE *input_fp
char *input_filename
enclosure_type_t enclosure_type
bool more_rows

Private Member Functions
void _skip_lines (void)
void _read_single_line (char **buffer, unsigned int *buffer_len)
void _get_fields_without_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length)
void _get_fields_with_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length)
void _get_fields_with_optional_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length)
Used to parse text files to extract records and fields.
We are making the following assumptions:
The csv_parser::init() method can accept a character array as the path to the CSV file. Since it is overloaded, it can also accept a FILE pointer to a stream that is already open for reading.
The set_enclosed_char() method accepts the field enclosure character as the first parameter and the enclosure mode as the second parameter, which controls how the text file is parsed.
csv_parser::csv_parser () [inline]
Class constructor
This is the default constructor.
All the internal attributes are initialized here.
csv_parser::~csv_parser () [inline]
Class destructor
The destructor closes the file pointer to the input CSV file and deallocates the buffer holding the input file name.
bool csv_parser::init (FILE *input_file_pointer)
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Initializes the current object.
This init method accepts a pointer to a CSV file that has already been opened for reading.
It also resets the file pointer to the beginning of the stream.
[in] | input_file_pointer | A FILE pointer to a stream that is already open for reading. |
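The rewind behaviour described for the FILE * overload can be illustrated with a short sketch. `init_from_stream()` is a hypothetical stand-in, not the library's code, and the meaning it gives to the bool return value (false for a null stream) is an assumption, since this page does not say what `init()` returns.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for csv_parser::init(FILE *): it accepts a stream that
// is already open for reading and moves it back to the first byte.
// Returning false for a null stream is an assumption of this sketch.
bool init_from_stream(FILE *input_file_pointer)
{
    if (input_file_pointer == nullptr) {
        return false;  // nothing to parse
    }
    // rewind() seeks to offset 0 and also clears the stream's error indicator.
    std::rewind(input_file_pointer);
    return true;
}
```

Rewinding means a stream that was partially read (or just written, as with a temporary file) is always parsed from its first record.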
bool csv_parser::init (const char *input_filename)
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Initializes the current object.
[in] | input_filename | The path to the CSV file. |
void csv_parser::set_enclosed_char (char fields_enclosed_by, enclosure_type_t enclosure_mode)
Defines the field enclosure character used in the text file.
Setting this to the null character ('\0') means that the enclosure character is optional.
If the enclosure is optional, a single line/record may contain both enclosed and unenclosed fields.
[in] | fields_enclosed_by | The character used to enclose the fields. |
[in] | enclosure_mode | How the CSV file should be parsed. |
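The enclosure mode decides which private extraction routine handles each record. The sketch below assumes that `enclosure_type_t` defines the constants `ENCLOSURE_NONE`, `ENCLOSURE_REQUIRED` and `ENCLOSURE_OPTIONAL`; check csv_parser.hpp for the actual names before relying on them. `extractor_for()` is hypothetical and only maps each mode to the routine documented for it.

```cpp
#include <cassert>
#include <string>

// Assumed enumerator names; the real ones come from csv_parser.hpp.
enum enclosure_type_t {
    ENCLOSURE_NONE,
    ENCLOSURE_REQUIRED,
    ENCLOSURE_OPTIONAL
};

// Hypothetical helper: names the private routine that the documentation
// associates with each enclosure mode.
std::string extractor_for(enclosure_type_t enclosure_mode)
{
    switch (enclosure_mode) {
    case ENCLOSURE_REQUIRED:
        return "_get_fields_with_enclosure";
    case ENCLOSURE_OPTIONAL:
        return "_get_fields_with_optional_enclosure";
    case ENCLOSURE_NONE:
    default:
        return "_get_fields_without_enclosure";
    }
}
```

With the assumed names, a parser object that should accept both quoted and unquoted fields would be configured with something like `parser.set_enclosed_char('"', ENCLOSURE_OPTIONAL);`.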
void csv_parser::set_field_term_char (char fields_terminated_by)
Defines the field delimiter character used in the text file.
[in] | fields_terminated_by | The character used to separate fields within a record. |
void csv_parser::set_line_term_char (char lines_terminated_by)
Defines the record terminator character used in the text file.
[in] | lines_terminated_by | The character used to mark the end of a record. |
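A record terminator other than the newline can be handled by splitting the input on that single character. The sketch below uses std::getline with a custom delimiter; `read_records()` is a hypothetical helper, not part of csv_parser.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a stream into records using a single-character
// record terminator, the role that set_line_term_char() configures.
std::vector<std::string> read_records(std::istream &in, char lines_terminated_by)
{
    std::vector<std::string> records;
    std::string record;
    // getline consumes the terminator but does not store it; a final record
    // without a trailing terminator is still returned.
    while (std::getline(in, record, lines_terminated_by)) {
        records.push_back(record);
    }
    return records;
}
```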
bool csv_parser::has_more_rows (void) [inline]
Returns whether there is still more data.
This method returns a boolean value indicating whether or not there are still more records to be extracted from the file currently being parsed.
Call this method before invoking csv_parser::get_row() to check whether there are more rows to retrieve.
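The has_more_rows()/get_row() calling pattern is shown below against a small hypothetical class, `row_source`, that reads from a std::istream instead of a FILE *. Only the loop shape is meant to match csv_parser; the simple field split and the end-of-input check via peek() are assumptions of this sketch.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// A row is a vector of strings, as get_row() documents.
typedef std::vector<std::string> csv_row;

// Hypothetical stand-in that reads from a std::istream; only the
// has_more_rows()/get_row() calling pattern is meant to match csv_parser.
class row_source {
public:
    row_source(std::istream &in, char field_term, char line_term)
        : in_(in), field_term_(field_term), line_term_(line_term) {}

    // True while at least one more character is waiting in the input.
    bool has_more_rows()
    {
        return in_.peek() != std::char_traits<char>::eof();
    }

    // Reads one record and splits it on the field terminator.
    // Note: std::getline drops a trailing empty field ("a,b," gives two fields).
    csv_row get_row()
    {
        std::string line;
        std::getline(in_, line, line_term_);
        csv_row row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, field_term_)) {
            row.push_back(field);
        }
        return row;
    }

private:
    std::istream &in_;
    char field_term_;
    char line_term_;
};
```

The caller always checks has_more_rows() first and only then calls get_row(), which is the order the documentation asks for.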
void csv_parser::set_skip_lines (unsigned int lines_to_skip) [inline]
Defines the number of records to discard.
The specified number of records is discarded from the start of the file before parsing begins.
[in] | lines_to_skip | How many records should be skipped |
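Skipping leading records (for example a header row) amounts to discarding everything up to and including the record terminator N times. `skip_records()` is a hypothetical helper written against std::istream; it stops early if the input runs out and reports how many records it actually discarded.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: discards up to lines_to_skip records, each ending in
// lines_terminated_by, and returns how many were actually discarded.
unsigned int skip_records(std::istream &in, unsigned int lines_to_skip,
                          char lines_terminated_by = '\n')
{
    unsigned int skipped = 0;
    while (skipped < lines_to_skip &&
           in.peek() != std::char_traits<char>::eof()) {
        // Drop characters up to and including the next record terminator.
        in.ignore(std::numeric_limits<std::streamsize>::max(),
                  std::char_traits<char>::to_int_type(lines_terminated_by));
        ++skipped;
    }
    return skipped;
}
```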
csv_row csv_parser::get_row (void)
Returns the current row from the CSV file.
The row is returned as a vector of string objects.
This method should be called only if csv_parser::has_more_rows() returns true.
See also: csv_parser::get_record_count()
unsigned int csv_parser::get_record_count (void) [inline]
Returns the number of times the csv_parser::get_row() method has been invoked.
void csv_parser::reset_record_count (void) [inline]
Resets the record_count internal attribute to zero.
This is useful when the same object is reused multiple times.
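The record counter can be pictured as a field that each row read increments and reset_record_count() zeroes. In the hypothetical `counting_reader` below, attaching a new input does not reset the count, so it has to be reset explicitly when the object is reused; whether csv_parser::init() resets it is not stated on this page.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch of the record_count bookkeeping. It counts successful
// reads; with csv_parser, get_row() is only called after has_more_rows()
// returns true, so every invocation corresponds to one record.
class counting_reader {
public:
    // Points the reader at a new input. The count is NOT reset here.
    void attach(std::istream &in) { in_ = &in; }

    bool get_row(std::string &row)
    {
        if (in_ == nullptr || !std::getline(*in_, row)) {
            return false;
        }
        ++record_count_;
        return true;
    }

    unsigned int get_record_count() const { return record_count_; }
    void reset_record_count() { record_count_ = 0; }

private:
    std::istream *in_ = nullptr;
    unsigned int record_count_ = 0;
};
```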
void csv_parser::_skip_lines (void) [private]
Ignores N records in the CSV file.
N is the value of the csv_parser::ignore_num_lines internal property, which can be set with csv_parser::set_skip_lines().
void csv_parser::_read_single_line (char **buffer, unsigned int *buffer_len) [private]
Reads a single line.
Reads a single record into the buffer passed by pointer to the method.
[in,out] | buffer | A pointer to a character array for the current line. |
[out] | buffer_len | A pointer to an integer storing the length of the buffer. |
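A line reader with the same out-parameter shape as _read_single_line() can grow a heap buffer until it meets the record terminator. This is a sketch, not the private method itself: `read_single_line()` is hypothetical, returns bool instead of void, takes the stream and terminator explicitly, and hands ownership of the malloc'd buffer to the caller.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch shaped like _read_single_line(char **buffer,
// unsigned int *buffer_len). It returns false once the input is exhausted.
// On success *buffer points to a NUL-terminated heap string that the caller
// must release with std::free().
bool read_single_line(FILE *fp, char line_term,
                      char **buffer, unsigned int *buffer_len)
{
    assert(fp != nullptr && buffer != nullptr && buffer_len != nullptr);

    unsigned int capacity = 16;
    unsigned int length = 0;
    char *line = static_cast<char *>(std::malloc(capacity));
    if (line == nullptr) {
        return false;
    }

    int c;
    while ((c = std::fgetc(fp)) != EOF &&
           c != static_cast<unsigned char>(line_term)) {
        if (length + 1 >= capacity) {  // keep room for the closing '\0'
            capacity *= 2;
            char *grown = static_cast<char *>(std::realloc(line, capacity));
            if (grown == nullptr) {
                std::free(line);
                return false;
            }
            line = grown;
        }
        line[length++] = static_cast<char>(c);
    }

    if (c == EOF && length == 0) {  // no data left before end of input
        std::free(line);
        return false;
    }

    line[length] = '\0';  // the terminator is consumed but not stored
    *buffer = line;
    *buffer_len = length;
    return true;
}
```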
void csv_parser::_get_fields_without_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length) [private]
Extracts the fields without enclosures.
This is used when the enclosure character is not set.
[out] | row | Pointer to the vector of strings that receives the fields |
[in] | line | The character array buffer containing the current record/line |
[in] | line_length | Pointer to the length of the buffer |
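Without an enclosure character, every field terminator starts a new field. The hypothetical `get_fields_without_enclosure()` below mirrors the documented parameters, except that it takes the field terminator as an extra argument (an assumption; the real method presumably reads field_term_char) and uses `csv_row *` in place of `csv_row_ptr`.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> csv_row;

// Hypothetical sketch: with no enclosure, each field terminator ends a field,
// so empty fields are preserved and a terminator can never be part of a field.
void get_fields_without_enclosure(csv_row *row, const char *line,
                                  const unsigned int *line_length,
                                  char field_term)
{
    assert(row != nullptr && line != nullptr && line_length != nullptr);

    std::string field;
    for (unsigned int i = 0; i < *line_length; ++i) {
        if (line[i] == field_term) {
            row->push_back(field);
            field.clear();
        } else {
            field += line[i];
        }
    }
    row->push_back(field);  // the last field has no trailing terminator
}
```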
void csv_parser::_get_fields_with_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length) [private]
Extracts the fields with enclosures.
This is used when the enclosure character is set.
[out] | row | Pointer to the vector of strings that receives the fields |
[in] | line | The character array buffer containing the current record/line |
[in] | line_length | Pointer to the length of the buffer |
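When enclosure is required, the parser has to track whether it is inside an enclosed field, so that field terminators inside the enclosure are kept as data and a backslash before an enclosure character is treated as an escape. The sketch below is a hypothetical stand-in with the terminator and enclosure passed explicitly and `csv_row *` standing in for `csv_row_ptr`; ignoring stray characters outside an enclosure is a choice made for this sketch only.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> csv_row;

// Hypothetical sketch for the "enclosure required" case: every field is
// wrapped in `enclosure`, fields are separated by `field_term`, and a
// backslash (0x5C) escapes an enclosure character inside a field.
void get_fields_with_enclosure(csv_row *row, const char *line,
                               const unsigned int *line_length,
                               char field_term, char enclosure)
{
    assert(row != nullptr && line != nullptr && line_length != nullptr);

    const char escape = '\\';
    std::string field;
    bool inside = false;
    for (unsigned int i = 0; i < *line_length; ++i) {
        const char c = line[i];
        if (inside) {
            if (c == escape && i + 1 < *line_length && line[i + 1] == enclosure) {
                field += enclosure;  // escaped enclosure character
                ++i;
            } else if (c == enclosure) {
                inside = false;      // closing enclosure
            } else {
                field += c;          // terminators inside are ordinary data
            }
        } else if (c == enclosure) {
            inside = true;           // opening enclosure
        } else if (c == field_term) {
            row->push_back(field);
            field.clear();
        }
        // Any other character outside an enclosure is ignored in this sketch.
    }
    row->push_back(field);
}
```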
void csv_parser::_get_fields_with_optional_enclosure (csv_row_ptr row, const char *line, const unsigned int *line_length) [private]
Extracts the fields when enclosure is optional.
This is used when the enclosure character is optional, so a record may contain fields that use it and fields that don't.
[out] | row | Pointer to the vector of strings that receives the fields |
[in] | line | The character array buffer containing the current record/line |
[in] | line_length | Pointer to the length of the buffer |
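With optional enclosure, each field is inspected on its own: if it begins with the enclosure character it is read up to the matching unescaped enclosure, otherwise up to the next field terminator. `get_fields_with_optional_enclosure()` below is a hypothetical sketch in which the terminator and enclosure are passed explicitly and `csv_row *` stands in for `csv_row_ptr`; discarding characters between a closing enclosure and the next terminator is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> csv_row;

// Hypothetical sketch: a field is enclosed only if its first character is
// the enclosure character; a backslash escapes an enclosure inside it.
void get_fields_with_optional_enclosure(csv_row *row, const char *line,
                                        const unsigned int *line_length,
                                        char field_term, char enclosure)
{
    assert(row != nullptr && line != nullptr && line_length != nullptr);

    const char escape = '\\';
    const unsigned int n = *line_length;
    unsigned int i = 0;
    while (true) {
        std::string field;
        if (i < n && line[i] == enclosure) {
            ++i;  // skip the opening enclosure
            while (i < n && line[i] != enclosure) {
                if (line[i] == escape && i + 1 < n && line[i + 1] == enclosure) {
                    ++i;  // keep the escaped enclosure, drop the backslash
                }
                field += line[i++];
            }
            ++i;  // skip the closing enclosure
            while (i < n && line[i] != field_term) {
                ++i;  // discard anything before the next terminator
            }
        } else {
            while (i < n && line[i] != field_term) {
                field += line[i++];
            }
        }
        row->push_back(field);
        if (i >= n) {
            break;
        }
        ++i;  // skip the field terminator
    }
}
```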
char csv_parser::enclosed_char [protected]
The enclosure character.
This is the single character used in the document to wrap fields.
When a field uses it, both ends of that field are assumed to be wrapped.
char csv_parser::escaped_char [protected]
The escape character.
This is the character used to escape enclosure characters found within fields. For now the only valid escape character is the backslash (0x5C).
It matters only when the enclosure character is required or optional.
char csv_parser::field_term_char [protected]
The field terminator.
This is the single character used to separate fields (columns) within a record.
Common choices include the comma, tab, and semicolon.
char csv_parser::line_term_char [protected]
The record terminator.
This is the single character used to mark the end of a record in the text file.
The newline character is the most common choice, but others can be used as well.
unsigned int csv_parser::enclosed_length [protected]
Enclosure length
This is the length of the enclosure character.
unsigned int csv_parser::escaped_length [protected]
The length of the escape character
This is currently not used; it may be used in future versions of the class.
unsigned int csv_parser::field_term_length [protected]
Length of the field terminator
This is currently not used; it will be used in future versions of the class.
unsigned int csv_parser::line_term_length [protected]
Length of the record terminator
This is currently not used; it will be used in future versions of the class.
unsigned int csv_parser::ignore_num_lines [protected]
Number of records to discard
This variable controls how many records in the file are skipped before parsing begins.
unsigned int csv_parser::record_count [protected]
Number of times csv_parser::get_row() has been invoked.
FILE *csv_parser::input_fp [protected]
The CSV file pointer
This is the pointer to the CSV file being parsed.
char *csv_parser::input_filename [protected]
Buffer to input file name
This buffer stores the name of the file being parsed.
enclosure_type_t csv_parser::enclosure_type [protected]
Mode in which the CSV file will be parsed
The possible values are defined by the enclosure_type_t enumeration.
bool csv_parser::more_rows [protected]
There are still more records to parse
This boolean property is an internal indicator of whether the file still contains records to be parsed.