What began as an apparently simple task ended up as a trip down the rabbit hole.
Read on, or just download here.
1. What is a vCard, and Why do I need to Convert it?
vCard is a standard format for exchanging “directory” information, such as name, address, phone numbers, and e-mail addresses of various contacts.
Contacts are people and businesses who you know and communicate with:
- Associates from work
- People and companies with whom you do business
So a contact could be your Mom’s cell phone, your boss’s cell and e-mail, the number to your favorite Chinese restaurant, that website where you buy widgets online, etc…
1.1. Context is Key
Aside from simply recording your boss’s phone number, it’s helpful to know that it is a phone number rather than an email address, and that this particular phone number is her cell phone rather than her work phone.
For example, if I open one of my contacts in my phone’s address book, and click on the phone number, it should open the phone application rather than the e-mail application.
The phone knows to do this because the phone number is stored in a field whose label is “phone number” rather than “email” field. Further, I know that this is her cell because there are several phone number fields, and the one for this particular phone number is labeled “cell phone number”. When I click on it, my phone application displays a very helpful message: “Dialing Diane’s cell phone”.
vCard provides a universal method for different systems to exchange this information in a meaningful way – for example, we don’t want to store a phone number as an e-mail address, and it’s helpful to store complex fields such as names and physical addresses in a canonical format (sub parts), rather than as a simple text field.
1.2. Virtual Contact Format (vCard)
If you send someone a “contact” or send someone your virtual business card, the chances are fairly good that it’s sent in vCard format.
Additionally, most cell phones, address books, personal information managers (PIMs) and e-mail programs allow you to export or import your contacts, and usually this is done by saving them as a “VCF” (Virtual Contact File).
As an example, when you transfer your contacts from your old phone to your new phone, or import your e-mail contacts in to your phone, it writes a VCF, then stores that VCF on your new phone, and then the new phone imports each contact stored within the VCF.
1.3. Defining The Problem
vCard works well, when simply swapping contact information between systems, but what if I want to use this data in another way?
- Send a form letter to all of my contacts using LibreOffice Mail Merge
- Send all of my friends a virtual Christmas card
- Analyze sales volume by region based on my business contacts
Although a spreadsheet or database probably makes the most sense, only a few programs allow you to import / export your contacts that way.
Lacking an “export to spreadsheet” capability, the next best approach would be to export a VCF, and then convert the VCF itself in to a spreadsheet-compatible format.
1.3.1 vCard is List-Oriented, NOT Table-Oriented
vCard is list-oriented. What does that mean?
vCard defines an entity (vCard), then associates a list of attributes to that entity.
BEGIN:VCARD VERSION:2.1 N:Timerson;Tim;;Mr;; FN:Mr. Tim Timerson TEL;CELL:8234567190 TITLE:CEO & Owner, Timerson, Inc END:VCARD BEGIN:VCARD VERSION:2.1 N:Williams;Bill;A.;; FN:Bill A. Williams EMAIL;HOME:firstname.lastname@example.org ADR;HOME:;123 Anystreet;Denver;CO;12345;; END:VCARD
In the example above, we have two contacts stored in a single VCF:
- Tim has a phone number and a job title
- Bill has an email address and physical address
Although every contact has a few attributes in common, each contact can have a completely different set of attributes. This is known as a variable schema.
Spreadsheets have a fixed schema, which means that our final product, in tabular format, must have columns for every possible attribute, whether an individual contact uses them or not.
1.3.2 Why Not Use a Predefined Schema?
If I use a program such as Thunderbird that supports VCF import and csv export, I could benefit from Thunderbird’s predefined schema.
Here are the reasons I chose NOT to take that approach:
- When dealing with multiple platforms, there could easily be a situation where there is no clear mapping between specific fields. Unless you go through an expensive field-mapping exercise, there is a risk of lost data, or loss of context for unmapped fields.
- Over time, one tends to migrate from platform to platform. Every time you transfer your contacts, the new platform has to guess about how to match the fields from the old platform, and there is a tendency for field names and usage to change over time based on current technology. For example, what would you do with a “MySpace” field? Although most people are probably on Facebook today, some of your contacts might still be using MySpace, and you certainly don’t want to lose that data, or risk losing the context that “this URL is my contact’s MySpace page” rather than “some random URL”. You might read this and chuckle, but what if Twitter gets replaced by something new in the next 10 years?
- Through successive platform migrations, there tends to be duplication. How many of us have mangled our contacts by importing them twice? You end up with duplicate contacts, or one contact with two copies of the same phone number, or phone numbers that appear in some random field. Although addressing duplicate contacts is NOT a part of the scope of this exercise, addressing duplicate attributes definitely is.
- Each platform has vendor-specific attributes. In an attempt to be helpful, one of your previous platforms may have added one or more “X-SOMEFIELD” to your contact attributes. Although the field itself is vendor-specific, it should be up to you (the data owner) to see what’s in there, and decide how to use it.
- Users tend not to follow the rules. You can put any kind of text in to any text field, which means that, perhaps your end user is storing notes in the “TITLE” (Job Title) field, because it’s easier to access? Maybe the end user is disorganized, and there is random data scattered all over the place? In addition, customizable fields tend to be a dumping ground for unstructured data.
Rather than use a predefined schema, I’d rather build a process that simply “dumps out” everything that was exported in to the VCF, and then let the user (data owner) decide what to do with it.
2. A Word on File Formats
As we’ve discussed, vCard uses a variable schema, where every entity has some attributes in common, but each entity can use a different set of attributes.
Ultimately, we want to convert to a static schema, but perhaps a variable-schema format might be a good intermediate step.
There are two file formats that specifically support a variable schema.
2.1. eXtensible Markup Language (XML)
XML (eXtensible Markup Language) provides a way to use HTML-like syntax to define variable and hierarchical data structures.
XML defines an entity, and then assigns attributes to that entity.
- Closely matches vCard syntax
- In common use, even as an end-user format
- Easily-portable, text encoding
XML also has its challenges:
- vCard uses semi-colon ( ; ) as a sub-field delimiter. XML does not allow semi-colons within an attribute name, so vCard’s “TEL;WORK” needs to be transformed in to something like “TEL-WORK”
- XML enforces HTML character encoding. This means that “Bob & Tim’s Bicycle Repair” would need to be converted to “Bob & Tim's Bicycle Repair”. When we eventually convert this in to a spreadsheet, we might need to “unescape” the data fields in order to make them usable.
- In an end-user context, an XML document requires a schema definition. Our process must either build a schema, or rely on automation to build one.
Like XML, JSON is designed to encapsulate data with a variable structure.
JSON’s “object” notation defines an object, which is a collection of name-value pairs, each of which could be a base type or other another object.
- Functionally-equivalent to XML’s entity-attribute model
- Like XML, JSON is text-encoded.
- As of this writing, JSON is mostly used on the back-end by developers, and is not necessarily within the end-user mainstream.
I’ll qualify the above statement: If you ask a typical power user to export a spreadsheet to XML, they know how to do this, what XML is, and that you probably intend to use the XML for a non-spreadsheet application. If you ask the same person to export a spreadsheet to a JSON file, they probably wouldn’t know how to do it, they probably don’t know what JSON is, nor what it’s used for, and their spreadsheet application probably doesn’t support JSON.
- Unlike XML, very few end-user applications natively support it.
- Unlike XML, JSON requires a few, very simple encoding rules.
- Although JSON doesn’t require a schema, at the very least, the sending and receiving applications must have some common knowledge of how to build the JSON and how to read it, respectively.
2.3. Comma-Separated Value (CSV)
Although CSV does NOT support a variable schema, it’s the most popular tabular format used by every spreadsheet program on the planet.
- Uses text encoding
- Used almost universally by many end-user applications
- Fixed header row defines field names
- Each entity is one row (record)
- Unused fields are left empty (NULL)
One of the caveats of using CSV is that a data field with an embedded comma can cause parsing problems. Let’s suppose that you do business with a lawfirm called “Mike, Mike & Ike”, the embedded comma might cause problems.
Example CSV file:
NAME,PHONE Bob,123-456-7980 Mike, Mike & Ike Lawfirm,800-234-5678 Steve,987-654-3201
|Mike||Mike & Ike Lawfirm||800-234-5678|
In the example above, the embedded comma within a data field causes the phone number to be shifted to the right.
There are three ways to fix this.
Use quotes as a text delimiter.
Most program allows you to include illegal characters (in our case, the comma) as part of a field value, as long as that value is enclosed in quotes ( ” )
NAME,PHONE "Bob",123-456-7980 "Mike, Mike & Ike Lawfirm",800-234-5678 "Steve",987-654-3201
If parsed correctly, the above CSV will interpret “Mike, Mike & Ike Lawfirm” as a single data element, because it’s enclosed in quotes.
Use a different delimiter.
Another common way to protect against embedded delimiters is to use the TAB character or another special character as a delimiter. Most applications understand data that is tab-separated, but the resulting file is less universally-compatible.
NAME[TAB]PHONE Bob[TAB]123-456-7980 Mike, Mike & Ike Lawfirm[TAB]800-234-5678 Steve[TAB]987-654-3201
Although most spreadsheet applications will interpret this correctly, older systems and applications must be specifically configured to look for the TAB as a delimiter.
Remove commas from data values.
This is the most computationally-expensive option: Simply change any instance of a comma within a data value to some other character, such as an underscore ( _ ).
NAME,PHONE Bob,123-456-7980 Mike_ Mike & Ike Lawfirm,800-234-5678 Steve,987-654-3201
The two reasons that I don’t like this option:
- Alters the user’s (data owner’s) data
3. Online Conversion is NOT an Option
Some of you may be reading this and thinking: “Why not convert the file online?”
After all, there is an online file converter for just about every file format in existence.
3.1. Contacts are Personal Data
Contact information is personal data. Not YOUR personal data, but the personal data of each of your respective contacts. If YOU have MY cell phone number, I don’t want you to upload it to some third-party website.
In addition, there may be laws against you sharing or uploading my contact info.
- Most security and privacy laws define Personally-Identifiable Information (PII) or Sensitive Personal Information (SPI) to include your contact information.
- When used in conjunction with financial data, the use of personal data falls under the Graham-Leach-Bliley Act (GLBA), as Non-Public Personal Information (NPPI)
- When used in conjunction with medical information, personal data falls under the Health Insurance Portability and Accountability Act (HIPAA)
- When used in conjunction with a credit card, personal data falls under both Payment Card Industry (PCI) requirements as well as the Fair Credit Reporting Act (FCRA).
3.2. No Control
- Many websites explicitly state that intellectual property rights (copyright and other rights) transfers to them.
- Some websites proudly declare “WE WILL NEVER SHARE OR SELL YOUR INFORMATION TO A THIRD-PARTY!” (until they go bankrupt, and the acquiring entity decides otherwise).
As a professional, your worst-case scenario is for all of your contacts to get an e-mail the next day, stating “Justin Parr signed you up for this service!”
3.3. Free Service Limitations
Often, online conversion services offer “first 10 contacts for free, sign up for $20/month for unlimited”, or similar.
Pay for play doesn’t work.
3.4. No Batch Capability
Using an online service, I can convert one file at a time.
What if I need to convert several files at a time?
Let’s say I have a utility called “vCardToCSV”…
I can run a one-line script:
for %i in (*.vcf) do vcardtocsv %i
4. First Attempt: Convert to XML
My first thought, due to similar structure, was to convert the VCF to XML.
I wrote a simple script to read the VCF line by line, and do the following:
- Write a root entity, <Contacts> in the output XML file
- Change “BEGIN:VCARD” to <VCARD>
- Change any attribute line:
- Parse at the colon ( : ) to obtain the ATTR and VALUE as separate variables
- Transform to: <ATTR><TEXT>VALUE</TEXT></ATTR>
- Change “END:VCARD” to </VCARD>
- Write a closing tag for the root entity: </Contacts>
Using our example above, the resulting XML looks like this (indentation added for readability):
<Contacts> <VCARD> <VERSION><TEXT>2.1</TEXT></VERSION> <N><TEXT>Timerson;Tim;;Mr;;</TEXT></N> <FN><TEXT>Mr. Tim Timerson</TEXT></FN> <TEL;CELL><TEXT>8234567190</TEXT></TEL;CELL> <TITLE><TEXT>CEO & Owner, Timerson, Inc</TEXT></TITLE> </VCARD> <VCARD> <VERSION><TEXT>2.1</TEXT></VERSION> <N><TEXT>Williams;Bill;A.;;</TEXT></N> <FN><TEXT>Bill A. Williams</TEXT></FN> <EMAIL;HOME><TEXT>email@example.com</TEXT></EMAIL;HOME> <ADR;HOME><TEXT>;123 Anystreet;Denver;CO;12345;;</TEXT></ADR;HOME> </VCARD> </Contacts>
So, right away, we have a few problems:
- An attribute tag can’t contain forbidden characters, such as semicolon ( ; ), such as “TEL;CELL”. This needs to be converted in to an allowed character, such as a dash ( – ): “TEL-CELL”
- Data values must be converted to HTML encoding. “CEO & Owner” must be converted to “CEO & Owner”
Even though I’ve used this approach in the past, I’ve never had to contend with these two issues before.
In addition, the sample data file had multiple copies of attributes for some entries, for example:
BEGIN:VCARD VERSION:2.1 N:Boberson;Bob;;; FN:Bob Boberson TEL;CELL:123-456-7890 TEL;CELL:123-456-7890 END:VCARD
Despite addressing the special characters and HTML encoding, I still had a file with duplicate attributes.
- Despite explicit <TEXT> declarations, LibreOffice refused to touch it without a schema
- Excel built a schema, but because of the duplicate entries, the schema was useless
If I have to waste time building a schema or removing duplicate attributes, I might as well write my own utility.
5. Developing vCardToCSV
Setting aside all other options, I elected to write a small utility to perform this function.
As we’ve seen, regardless of what option we choose, ultimately, we need a schema.
As CSV begins with the header row, and knowing that we must have a name for every column, this is especially true of CSV format.
The best approach is to make two passes through the file, where the first pass collects attribute names, and the second pass parses data values based on those attribute names.
- Make a pass through the entire input file, collecting unique attribute (column) names
- Write the CSV header row
- Make a second pass through the entire input file
- Initialize buffers
- Look for BEGIN:VCARD
- Read a line
- If the line is END:VCARD, go to 3.5
- If the line contains “:”
- Separate in to a key-value pair
- Find the key index within the column array
- Either add the new value to the value buffer for that column, or discard if duplicate
- Goto 3.3
- Write one row to the CSV file. Write the contents of each column’s value buffer, separated by comma.
5.2. Special Cases
Some fields need to be specially-formatted.
5.2.1 Telephone Numbers
vCard stores a phone number as a string of digits without formatting.
It’s more helpful to display a phone number in 10 or 11-digit format:
- 10 digits: nnn-nnn-nnnn
- 11 digits: n-nnn-nnn-nnnn
Addresses in vCard are stored in a semi-colon-delimited string:
;123 Anywhere St;;Sometown;Ohio;12345;;
It’s more convenient to view an address in a more natural format, but you can’t embed carriage returns in a CSV.
The approach I took is to parse out each field, then concatenate non-empty fields using a c-like line terminator: “\n”
123 Anywhere St\nSometown\nOhio\n12345
Later, we can convert this to a multiline cell with a formula similar to the following:
Although this doesn’t produce a properly-formatted address, the result is natural enough to be useful:
123 Anywhere St Sometown Ohio 12345
I leave it to the reader and some clever spreadsheet manipulation to reformat the address as needed.
6. Download and Use vCardToCSV
By downloading or using vCardToCSV, you agree to all of the Terms and Conditions.
Current version: 1.2
vCardToCSV is a command-line utility.
- Save your .vcf to a specific folder that you know you can find later – you will need the exact location in order to access it from the command prompt later.
- Download vCardToCSV and save it to the same folder where the .vcf is located. vCardToCSV is packaged as a zip file.
- Extract vCardToCSV.exe from the zip file, and save it to the same folder.
- Double-click the zip file
- Right-click vCardToCSV.exe, select “copy” or “copy to”
- If prompted for a location, save vCardToCSV.exe to the same folder where the .vcf is located
- If NOT prompted, either drag and drop the file, or go back to the .vcf folder location and select “paste”
- If done correctly, vCardToCSV.exe and vCardToCSV_N_N.zip should both be located in the same folder where the .vcf is located.
- Open a command prompt.
- Change Directory to the folder location.
cd /d "c:\somefolder"
- Run the conversion tool
vcardtocsv -i file.vcf
- When finished, the tool will output a line indicating the total number of contacts:
nnn total written.
- Change Directory to the folder location.
- Verify that the csv file was created – it should have the same name as the vcf, but with .csv at the end:
- Double-click the CSV file, and it should open in your favorite spreadsheet program.
Please contact me or post a comment, and I’ll do my best to help you.
7. To Do List
- Sort / group columns. For example, make sure that all TEL; columns are adjacent.
- Option to merge columns by type. For example, merge all phone numbers in to a single field.
- Specify custom field and line delimiters.
8. Terms and Conditions
I am publishing vCardToCSV with the best intentions – for people to use as a convenience.
However, I also have to protect myself from the assholes of the world, so here are the terms and conditions:
- By downloading or using vCardToCSV software (this tool), you agree to be bound by all of these terms and conditions.
- This tool is provided by the author (Justin Parr, justinparrtech.com) as-is, with no warranty expressed or implied, and no guaranty of fitness of purpose.
- Support is provided by the author as he sees fit. You are not entitled to a response, nor contact of any kind, nor bug fixes, nor updates.
- By using this tool you accept all liability, and you agree to hold the author and hosting facility harmless and free of all liability of any kind. In addition, you accept full responsibility for your data, and your equipment, and your job, and your third-party obligations.
- This tool might act on your data incompletely, or in a way that you didn’t expect, or might not function as intended, or might interact with other software in an unexpected manner, and you accept all of this to be YOUR PROBLEM.
- Always make a backup copy of your data and scan for viruses and other malware before using any third-party software, including this tool.
- This tool is a stand-alone binary executable (software) that runs on your equipment. No data is collected nor transmitted to any other location or website.
- You are granted a limited license to use this tool only for your own personal use. Personal use of this tool is royalty-free. The author reserves the right to revoke your license to use the tool at any time for any reason. If your license is revoked, you are required to delete all copies of the tool in your possession, in any format, and certify to the author that you no longer possess any copies of the tool.
- The author reserves all rights to this tool, including any that are not explicitly enumerated within these Terms and Conditions. However:
- You may NOT sell or otherwise profit from this tool.
- You may NOT reverse-compile this tool (derive source code from the binary executable).
- You may NOT host this tool on any third-party website.
- You may NOT use this tool in conjunction with, nor as part of any service provided to any third party.
- The author may assume, exercise, grant, or revoke any right, at any time, for any reason.
- You retain ownership of your data.
- These Terms and Conditions shall remain in effect, in perpetuity, unless amended or agreed by the author, and with the author’s explicit written consent.
- These Terms and Conditions may be updated from time to time.
- The most recent version of these Terms and Conditions shall be located at: http://justinparrtech.com/JustinParr-Tech/?p=4541#Terms_and_Conditions
- The most recent version of these Terms and Conditions shall supersede any previous version, and shall be considered binding regardless of when the tool was last downloaded or used.