
How do I Convert a Text File to a GEDCOM?





The question of how to convert text to GEDCOM comes up now and again on the genealogy bulletin boards. Someone has an Excel spreadsheet, a text file or a Word document they want to convert to a GEDCOM. It may be an "Ancestors of" report from an e-correspondent or a carefully preserved typescript from a family member who has passed on. (I've always imagined heaven, for us genealogists, to be the place where you can finally finish your research: "Hi there, Ebeneezer! I'm your 4th great grandson! Now tell me - who were your parents?")

Back here on earth, the people with the file want to be able to put the data into their genealogy program's database without typing it all over again. The easiest way, if the person who sent you the text is still alive and still speaking to you, is to ask them to send you the data again, only this time in GEDCOM format. If that is not an option, it is possible to transform a file into GEDCOM, but it requires a fair amount of skill with a word processing program or Excel. Even then, it is a lot of work, because you'll have to link everyone up afterwards.

Basically, you have to transform

John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.
into
0 @003@ INDI
1 NAME John Wesley /McCorkle/
2 GIVN John Wesley
2 SURN McCorkle
1 BIRT
2 DATE 01 Apr 1852
2 PLAC Pocatello, Idaho

Your best results come if the original file uses a prose style that is obsessively consistent and duller than dirt. If, for example, John's sister "first saw the light of day on 01 September 1854" and his brother "gave joy to his parents on 01 November 1856", you have more substituting to do than if they were both "born on", as John was. Spreadsheets are a little easier in some ways, but they have their own problems.
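If you would rather script the substitution than do it by hand, here is a minimal Python sketch of the same transformation. It assumes every sentence follows the exact pattern "NAME was born DAY MONTH YEAR in PLACE." and that the last word of the name is the surname. The function name sentence_to_indi and the regular expression are made up for illustration; they are not part of any genealogy program.

```python
import re

# Assumed sentence shape: "<names> was born <DD Month YYYY> in <place>."
BORN = re.compile(
    r"^(?P<name>.+?) was born (?P<day>\d{1,2}) (?P<month>[A-Za-z]+) "
    r"(?P<year>\d{4}) in (?P<place>.+?)\.?$"
)

def sentence_to_indi(sentence, number):
    """Return the GEDCOM lines for one 'was born' sentence (hypothetical helper)."""
    match = BORN.match(sentence.strip())
    if match is None:
        raise ValueError(f"sentence does not fit the assumed pattern: {sentence!r}")
    # Treat the last word of the name as the surname.
    given, _, surname = match.group("name").rpartition(" ")
    month = match.group("month")[:3].title()  # "April" -> "Apr"
    return [
        f"0 @{number:03d}@ INDI",
        f"1 NAME {given} /{surname}/",
        f"2 GIVN {given}",
        f"2 SURN {surname}",
        "1 BIRT",
        f"2 DATE {match.group('day')} {month} {match.group('year')}",
        f"2 PLAC {match.group('place')}",
    ]

for line in sentence_to_indi(
    "John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.", 3
):
    print(line)
```

A sentence that "first saw the light of day" instead of "was born" raises an error rather than producing a half-converted record, which is the scripted version of the consistency problem described above.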

Before we start the step-by-step, you should know Word has a trick or two up its sleeve. When you click on Edit -> Replace, you can use special characters. ^l (circumflex, lower-case L) means a manual line break (a new line). ^p (circumflex, lower-case P) is a paragraph break. ^t (circumflex, lower-case T) is the tab character. You can take special characters out by replacing them with null or a space. You can put special characters in by changing a given character string to the same one with the additional special character. If the text editor you are using can't deal with tabs and line breaks, you should stop reading now, because you won't be able to do this.

Step one is to create a sample. Start your genealogy program and add someone false. I use Arnold Aardvark. Give him one of each fact present in your text file: born, died, buried, died of, military service, graduated from, will probated on (date), and so on. Give him a two-paragraph note, too. That comes up later. Finally, export him as a one-person GEDCOM, open the GEDCOM with Notepad and print it off. That will show you what you are shooting for.

Don't give Arnold a spouse, a child or a parent. It is easier to change the text file into GEDCOM format, import everyone on it as independent individuals, then use the "Add Spouse", "Add Child" and "Add Parents" functions to link everyone to each other than to try to add the family lines to your GEDCOM file.

The next part assumes you have a text file or Word document. Excel spreadsheets come after this. I'm putting the replacement strings in courier font for clarity, but the digit "1" and the lower-case "L" look A LOT alike. Be careful. Assuming there is a new paragraph for each person, your first step would be to replace ^p (the paragraph break) with:

^l0 @00#@ INDI^l 1 NAME

(line break, digit 0, space, key word @00#@, key word INDI, another line break, digit 1, key word NAME.)

That changes
John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.
into

0 @00#@ INDI
1 NAME John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.

Now replace " was born" with

/^l 1 BIRT ^l 2 DATE

(Slash, line break, digit 1, key word BIRT, line break, digit 2, key word DATE. Note the space in front of the "was", too.)

That changes

0 @00#@ INDI
1 NAME John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.


into

0 @00#@ INDI
1 NAME John Wesley McCorkle/
1 BIRT
2 DATE 01 April 1852 in Pocatello, Idaho.

As you can see, we're getting somewhere. We still have to replace the word " in ", in front of Pocatello, with a line break and "2 PLAC" to get the birthplace fact straight. We have to shorten "April" to "Apr" and drop the period after "Idaho" to match the target above. We have to cope with the other facts. We need to add two level 2 records to the NAME record - GIVN and SURN. We need another slash in the level 1 NAME record, to make "McCorkle/" into "/McCorkle/". We have to change all of the other facts about John into GEDCOM records, using the Aardvark GEDCOM we made up above as a guide. Any narrative that doesn't fit in a specific GEDCOM record goes in a NOTE. I treat notes below.

When we finish, we have to go back and change the "#" in the level 0 INDI record to sequential numbers. It is a long process. If you are handy with Word macros, you can automate the process somewhat. If you have a copy of EDT run under VMS, you are in hog heaven, because it is easy. EDT is the finest text editor known to man, in my opinion.
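If Word macros and EDT are both out of reach, the renumbering step can be sketched in a few lines of Python. This is only an illustration: the function name number_individuals is made up, and it assumes each person starts with a level 0 line of the form "0 @00#@ INDI" and that three-digit numbers are enough.

```python
import re
from itertools import count

def number_individuals(gedcom_text, start=1):
    """Replace the '#' placeholder in each level 0 INDI line with a running number."""
    counter = count(start)
    # Match whole placeholder lines only; [ \t]* tolerates trailing blanks.
    pattern = re.compile(r"^0 @0*#@ INDI[ \t]*$", re.MULTILINE)
    return pattern.sub(lambda m: f"0 @{next(counter):03d}@ INDI", gedcom_text)

sample = (
    "0 @00#@ INDI\n"
    "1 NAME John Wesley /McCorkle/\n"
    "0 @00#@ INDI\n"
    "1 NAME Arnold /Aardvark/\n"
)
print(number_individuals(sample))
```

Every other line passes through untouched, so it is safe to run this as the very last step, after all of the fact records are in place.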

Excel

Excel lets you add columns of data, which helps. Some versions of it have problems saving spreadsheets as text files when lines in the text file are over 255 or 511 characters, which hinders. I'm going to go through the name part of converting an Excel spreadsheet to a GEDCOM file and leave the other facts as an exercise to the reader.

Let's say you have a spreadsheet:

A
John Wesley McCorkle

You want it to look like this:

A   | B   | C          | D      | E           | F | G        | H | I   | J      | K           | L   | M      | N
0 @ | 001 | @ INDI *L* | 1 NAME | John Wesley | / | McCorkle | / | *L* | 2 GIVN | John Wesley | *L* | 2 SURN | McCorkle

Excel has some limited text manipulating abilities. For example, if a cell has text in it, you can find out where the last space is. This should be the space before the surname, no matter how many given names your individual has. Once you find out where the space is, you can tell Excel to copy the text from that last space to the end of the cell into a second column, thus isolating the surname. If any of your individuals have a two-word surname ("St. John"), change it to one word with an underscore ("St._John"), which you can replace later.
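The last-space rule is easy to try out before building it with Excel functions. The Python sketch below shows the same logic; the names split_name and name_records are made up for illustration, and it assumes two-word surnames have already been joined with an underscore as described above.

```python
def split_name(full_name):
    """Split at the last space: given names before it, surname after it."""
    given, _, surname = full_name.strip().rpartition(" ")
    # Undo the underscore that protected a two-word surname such as "St._John".
    return given, surname.replace("_", " ")

def name_records(full_name):
    """Build the NAME, GIVN and SURN lines for one individual."""
    given, surname = split_name(full_name)
    return [
        f"1 NAME {given} /{surname}/",
        f"2 GIVN {given}",
        f"2 SURN {surname}",
    ]

for line in name_records("John Wesley McCorkle"):
    print(line)
```

Without the underscore, "Oliver St. John" would split into given names "Oliver St." and surname "John", which is why that bit of hand editing comes first.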

Use the Excel functions to insert new columns in your spreadsheet, filling them with selected text from the original cells or with literals. Cell B1 in the spreadsheet above has 001 in it. You'd want to make cell B2 (B1+1) and so on, all the way down your rows. That would give your individuals unique numbers. Those columns above with the *L* in them are targets for Word, which come up in the next step.

After you get the columns arranged, save the Excel spreadsheet as a text file. Then open the text file with Word, replace the "*L*" markers with ^l (line breaks). When you do, what was columns D through H above will be one line,

1 NAME John Wesley /McCorkle/

You may have to replace a slash followed by a space with a slash alone, or do other clean-up chores. If you saved the spreadsheet as a space-delimited text file, replace every instance of two spaces in a row with one space. Repeat as needed. If you saved the spreadsheet as a tab-delimited text file, replace all of the tab characters (^t) with null. Save the file again.
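The clean-up chores can also be scripted. The sketch below takes one exported row, splits it at the *L* markers and tidies the spacing. It is an illustration only: row_to_records is a made-up name, and it assumes the cells carry no padding of their own, so it turns tabs into single spaces instead of deleting them.

```python
import re

def row_to_records(row_text):
    """Turn one exported spreadsheet row into clean GEDCOM lines (hypothetical helper)."""
    text = row_text.replace("\t", " ")      # tab-delimited export
    text = re.sub(r" {2,}", " ", text)      # collapse runs of spaces
    cleaned = []
    for piece in text.split("*L*"):         # each *L* marker is a line break
        line = piece.strip()
        line = re.sub(r"@ (\d+) @", r"@\1@", line)  # "@ 001 @" -> "@001@"
        line = re.sub(r"/ (\S+) /", r"/\1/", line)  # "/ McCorkle /" -> "/McCorkle/"
        if line:
            cleaned.append(line)
    return cleaned

row = ("0 @ 001 @ INDI *L* 1 NAME John Wesley / McCorkle / "
       "*L* 2 GIVN John Wesley *L* 2 SURN McCorkle")
for line in row_to_records(row):
    print(line)
```

The surname pattern expects a single word between the slashes, so put any underscored surnames back to two words only after this step.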

Headers and Notes

Each GEDCOM has a header and a trailer. They are not particularly important, except you won't be able to import your GEDCOM without them. Go back to your Aardvark GEDCOM and copy the header. Paste it into your GEDCOM at the top.

Here is a sample:

0 HEAD
1 SOUR FamilyOrigins
2 NAME Family Origins(R) for Windows
2 VERS 8.0
2 CORP FormalSoft, Inc.
1 DEST DISKETTE
1 DATE 30 AUG 2003
1 FILE a_test.ged
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSI

As you can guess, I used FOW version 8.0 to make a test GEDCOM. You could put "Acme Software Wizards" in the corporation record, if you were feeling puckish, without affecting anything. You can take out the SOUR, DEST, DATE, and FILE entries if you like, but it doesn't hurt to leave them in, and it is simpler.

The trailer is just one record; it tells your importing program that it has come to the end:

0 TRLR

Paste it into the bottom of your GEDCOM.
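If you are building the file with a script rather than by pasting, wrapping the records in a header and trailer might look like the Python sketch below. The name assemble is made up, and the trimmed header is an assumption based on the remark above that the SOUR, DEST, DATE and FILE entries can go. If your program balks at it, use the full header copied from your Aardvark GEDCOM instead.

```python
# Trimmed header; an assumption, not a guarantee that every program accepts it.
HEADER = [
    "0 HEAD",
    "1 GEDC",
    "2 VERS 5.5",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR ANSI",
]
TRAILER = "0 TRLR"

def assemble(records):
    """Join header, individual records and trailer into one GEDCOM text."""
    return "\n".join(HEADER + list(records) + [TRAILER]) + "\n"

print(assemble(["0 @001@ INDI", "1 NAME Arnold /Aardvark/"]))
```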

NOTE records should be 80 characters or less, but you can continue the note with either CONT or CONC records. CONT continues the note on a new line, CONC concatenates the note data with the previous line. Here is an example:

1 NOTE Ted climbed Mount Whitney when he was 18. He went to UC Berkeley
2 CONC during the late 60's, then served as a Peace Corps Volunteer in
2 CONC Sarawak, Borneo, where he was almost tattooed by headhunters.
2 CONT
2 CONT The rest of his life was rather dull. In later years he took up
2 CONC genealogy, a hobby that required great analytical skills but no
2 CONC heavy lifting.

When those seven lines get imported, they will become two paragraphs separated by a blank line. Once they are in your database, the two paragraphs will word-wrap, if, for instance, you change the size or font of their window.
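Breaking a long note into 80-character records is tedious by hand, so here is a minimal Python sketch that does it in the same style as the example. The name note_records is made up. It breaks lines between words, as the example does, which assumes your importing program puts the space back when it joins CONC records. The GEDCOM 5.5 standard says CONC text is joined with no space added, so import Arnold's two-paragraph note first to see which way your program behaves.

```python
import textwrap

def note_records(paragraphs, level=1, width=80):
    """Turn a list of paragraphs into NOTE/CONC/CONT lines no longer than width."""
    lines = []
    for p_index, paragraph in enumerate(paragraphs):
        if p_index > 0:
            lines.append(f"{level + 1} CONT")  # blank line between paragraphs
        # Leave room for the longest prefix, such as "2 CONC ".
        chunks = textwrap.wrap(paragraph, width - len(f"{level + 1} CONC "))
        for c_index, chunk in enumerate(chunks):
            if p_index == 0 and c_index == 0:
                tag = f"{level} NOTE"          # first line of the whole note
            elif c_index == 0:
                tag = f"{level + 1} CONT"      # first line of a later paragraph
            else:
                tag = f"{level + 1} CONC"      # continuation of the same paragraph
            lines.append(f"{tag} {chunk}")
    return lines

note = [
    "Ted climbed Mount Whitney when he was 18. He went to UC Berkeley during "
    "the late 60's, then served as a Peace Corps Volunteer in Sarawak, Borneo.",
    "The rest of his life was rather dull.",
]
for line in note_records(note):
    print(line)
```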

Importing

Once you get your GEDCOM as a text file from Word or Excel, assuming it concerns the MCCORKLE family, rename it to MCCORKLE.GED so that your genealogy program will know it is a GEDCOM. Create a new, empty database. Import the GEDCOM. The import will create a file named MCCORKLE.LST, which will have your errors. Go back to the GEDCOM, rename it MCCORKLE.TXT, open it with Word, fix the errors, re-re-name it back to MCCORKLE.GED and try again.

Once you get most of the people and most of the facts into your genealogy program, you'll have to link everyone to each other. All genealogy programs work a little differently, but when you add a spouse, child or parent, they ask if you want to link to someone already there, or enter a new person. You'll want to link.

I hope this helps. I've been able to create GEDCOM files from other files eight times out of the twelve times I've tried.


This page updated: June 22, 2014