The question of how to convert text to GEDCOM comes up now and again on the genealogy bulletin boards. Someone has an Excel spreadsheet, a text file or a Word document they want to convert to a GEDCOM. It may be an "Ancestors of" report from an e-correspondent or a carefully preserved typescript from a family member who has passed on. (I've always imagined heaven, for us genealogists, to be the place where you can finally finish your research: "Hi there, Ebeneezer! I'm your 4th great grandson! Now tell me - who were your parents?") Back here on earth, the people with the file want to be able to put the data into their genealogy program's database without typing it all over again.

The easiest way, if the person who sent you the text is still alive and still speaking to you, is to ask them to send you the data again, only this time in GEDCOM format. If that is not an option, it is physically possible to transform a file into GEDCOM, but it requires a fair amount of skill with a word processing program or Excel. Even then, it is a lot of work, because you'll have to link everyone up afterwards.

Basically, you have to transform

John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.

into a GEDCOM individual record, which begins

0 @003@ INDI

Your best results come if the original file uses a prose style that is obsessively consistent and duller than dirt. If, for example, John's sister "first saw the light of day on 01 September 1854" and his brother "gave joy to his parents on 01 November 1856", you have more substituting to do than if they were both "born on", as John was. Spreadsheets are a little easier in some ways, but they have their own problems.

Before we start the step-by-step, you should know Word has a trick or two up its sleeve. When you click on ...

Step one is to create a sample. Start your genealogy program and add someone false. I use Arnold Aardvark. Give him one of each fact present in your text file: born, died, buried, died of, military service, graduated from, will probated on (date), and so on. Give him a two-paragraph note, too. That comes up later. Finally, export him as a one-person GEDCOM, open the GEDCOM with Notepad and print it off. That will show you what you are shooting for.

Don't give Arnold a spouse, a child or a parent. It is easier to change the text file into GEDCOM format, import everyone in it as independent individuals, and then use the "Add Spouse", "Add Child" and "Add Parents" functions to link everyone to each other, than to try to add the family lines to your GEDCOM file.

The next part assumes you have a text file or Word document.
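If you are comfortable with a scripting language, the prose-to-GEDCOM transformation can also be sketched in a few lines of Python instead of a chain of Word replacements. This is only a rough sketch, assuming every sentence follows the "(name) was born (date) in (place)" pattern. The names BORN and sentence_to_record are made up for illustration, and the tags are the common ones for a name and a birth (NAME, GIVN, SURN, BIRT, DATE, PLAC). Check the output against your own sample export before trusting it.

```python
import re

# Assumed sentence shape: "<name> was born [on] <date> in <place>."
# The pattern and the function name are illustrative, not part of any GEDCOM tool.
BORN = re.compile(
    r"^(?P<name>.+?) was born (?:on )?(?P<date>.+?) in (?P<place>.+?)\.?$"
)


def sentence_to_record(sentence, number):
    """Turn one 'was born ... in ...' sentence into GEDCOM lines for one person."""
    match = BORN.match(sentence.strip())
    if match is None:
        raise ValueError(f"sentence does not fit the expected pattern: {sentence!r}")
    # The surname is taken to be everything after the last space in the name.
    given, _, surname = match["name"].rpartition(" ")
    return [
        f"0 @{number:03d}@ INDI",
        f"1 NAME {given} /{surname}/",
        f"2 GIVN {given}",
        f"2 SURN {surname}",
        "1 BIRT",
        # The date is copied through as written; your program may want
        # a different date format, so compare with your sample export.
        f"2 DATE {match['date']}",
        f"2 PLAC {match['place']}",
    ]


record = sentence_to_record(
    "John Wesley McCorkle was born 01 April 1852 in Pocatello, Idaho.", 3
)
print("\n".join(record))
```

A sentence that strays from the pattern, such as "first saw the light of day on", raises an error rather than producing a half-right record. That is the scripted version of having more substituting to do.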
Excel spreadsheets come after this. I'm putting the replacement strings in courier font for clarity, but the digit "1" and the lower-case "L" look A LOT alike. Be careful.

Assuming there is a new paragraph for each person, your first step would be to replace ^p (the paragraph break) with:

That changes

Now replace " was born" with

That changes

As you can see, we're getting somewhere. We still have to replace the word " in ", in front of Pocatello, with "2 PLAC" to get the birthplace fact straight. We have to cope with the other facts. We need to add two level 2 records to the NAME record - GIVN and SURN. We need another slash in the level 1 NAME record, to make "Wesley/" into "/Wesley/". We have to change all of the other facts about John into GEDCOM records, using the Aardvark GEDCOM we made up above as a guide. Any narrative that doesn't fit in a specific GEDCOM record goes in a NOTE. I treat notes below. When we finish, we have to go back and change the "#" in the level 0 INDI record to sequential numbers.

It is a long process. If you are handy with Word macros, you can automate the process somewhat. If you have a copy of EDT running under VMS, you are in hog heaven, because it is easy. EDT is the finest text editor known to man, in my opinion.

Excel

Excel lets you add columns of data, which helps. Some versions of it have problems saving spreadsheets as text files when lines in the text file are over 255 or 511 characters, which hinders. I'm going to go through the name part of converting an Excel spreadsheet to a GEDCOM file and leave the other facts as an exercise for the reader. Let's say you have a spreadsheet:
You want it to look like this:
Excel has some limited text-manipulating abilities. For example, if a cell has text in it, you can find out where the last space is. This should be the space before the surname, no matter how many given names your individual has. Once you find out where the space is, you can tell Excel to copy the text from that last space to the end of the cell into a second column, thus isolating the surname. If any of your individuals have a two-word surname ("St. John"), change it to one word with an underscore ("St._John"), which you can replace later.

Use the Excel functions to insert new columns in your spreadsheet, filling them with selected text from the original cells or with literals. Cell B1 in the spreadsheet above has 001 in it. You'd want to make cell B2 (B1+1) and so on, all the way down your rows. That would give your individuals unique numbers. Those columns above with the *L* in them are targets for Word, which come up in the next step.

After you get the columns arranged, save the Excel spreadsheet as a text file. Then open the text file with Word and replace the "*L*" markers with ^l (line breaks). When you do, what were columns D through H above will be one line. You may have to replace a slash followed by a space with a slash alone, or do other clean-up chores. If you saved the spreadsheet as a space-delimited text file, replace every instance of two spaces in a row with one space. Repeat as needed. If you saved the spreadsheet as a tab-delimited text file, replace all of the tab characters (^t) with nothing. Save the file again.

Headers and Notes

Each GEDCOM has a header and a trailer. They are not particularly important, except you won't be able to import your GEDCOM without them. Go back to your Aardvark GEDCOM and copy the header. Paste it into your GEDCOM at the top. Here is a sample:

The trailer is just one record; it
tells your importing program that it has come to the end:

0 TRLR

NOTE records should be 80 characters or less, but you can continue the note with either CONT or CONC records. CONT continues the note on a new line; CONC concatenates the note data with the previous line. Here is an example:

Importing

Once you get your GEDCOM as a text file from Word or Excel, assuming it concerns the MCCORKLE family, rename it to MCCORKLE.GED so that your genealogy program will know it is a GEDCOM. Create a new, empty database. Import the GEDCOM. The import will create a file named MCCORKLE.LST, which will list your errors. Go back to the GEDCOM, rename it MCCORKLE.TXT, open it with Word, fix the errors, rename it back to MCCORKLE.GED and try again.

Once you get most of the people and most of the facts into your genealogy program, you'll have to link everyone to each other. All genealogy programs work a little differently, but when you add a spouse, child or parent, they ask if you want to link to someone already there, or enter a new person. You'll want to link.

I hope this helps. I've been able to create GEDCOM files from other files eight times out of the twelve times I've tried.
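For anyone who would rather script the finishing touches, here is a rough Python sketch of three of the chores described above: turning the "#" placeholders into sequential numbers, breaking a note into NOTE, CONT and CONC records of 80 characters or less, and adding the trailer after a header you supply. The function names are made up for illustration, and the one-line header in the example is only a placeholder; paste in the header from your own sample export.

```python
def number_individuals(lines, start=1):
    """Replace the '#' in each '0 @#@ INDI' line with 001, 002, 003, ..."""
    numbered, count = [], start
    for line in lines:
        if line.strip() == "0 @#@ INDI":
            line = f"0 @{count:03d}@ INDI"
            count += 1
        numbered.append(line)
    return numbered


def note_records(text, level=1, width=80):
    """Turn a note into a NOTE record followed by CONC and CONT records.

    Each line break in the text starts a CONT record; a line too long for
    one record is continued with CONC records.
    """
    room = width - len(f"{level + 1} CONC ")  # characters of text per record
    records = []
    for i, line in enumerate(text.split("\n")):
        tag, lvl = ("NOTE", level) if i == 0 else ("CONT", level + 1)
        while True:
            cut = min(room, len(line))
            # Back up so the break falls inside a word, not next to a space;
            # a space at either edge of a CONC join could get lost on import.
            while 1 < cut < len(line) and " " in (line[cut - 1], line[cut]):
                cut -= 1
            records.append(f"{lvl} {tag} {line[:cut]}".rstrip())
            line = line[cut:]
            if not line:
                break
            tag, lvl = "CONC", level + 1
    return records


def finish_gedcom(header_lines, body_lines):
    """Header from your own sample export, numbered body, then the trailer."""
    lines = [*header_lines, *number_individuals(body_lines), "0 TRLR"]
    return "\n".join(lines) + "\n"


# Example with a placeholder header; a real one has more records than this.
draft = [
    "0 @#@ INDI",
    "1 NAME John Wesley /McCorkle/",
    *note_records("First paragraph of the note.\nSecond paragraph of the note."),
    "0 @#@ INDI",
    "1 NAME Arnold /Aardvark/",
]
print(finish_gedcom(["0 HEAD"], draft))
```

Breaking CONC records in the middle of a word looks odd in Notepad, but it keeps the spaces between words from disappearing when the pieces are glued back together.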