Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, December 13, 2017

How important is high resolution for scanning and photography?

Are you tempted to join the megapixel race? Are you concerned about the resolution of your digitization efforts for photos, paper records, and other genealogically important documents? Do you use the megapixel count of a camera or smartphone as a factor in your purchase decisions? These issues and more concern anyone trying to digitize records or take photographs. Genealogists and photographers share some of the same concerns.

I have written on this topic several times in the past. Here is a list of some past posts that deal with aspects of this topic:
This list could go on and on. In a recent post, I expressed my views on the challenges of genealogy and included an issue about the unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records. On reflection, that topic needs more explanation and discussion.

In response to my post on the challenges to genealogy, I got the following comment:
I have always been a believer that preservation should be performed at the highest possible resolution. As time has passed, as you mention, this could be 50 Megapixels today, and who know how much tomorrow? But the biggest advantage of 50 vs 12 Megapixels is the ability to zoom in and examine details closely. I have found this very helpful with things like scans of old vital records where correct interpretation of handwriting, for example, requires great magnification. It is useless if zooming in only results in a highly pixelated image. This applies likewise to photographs where the only image of GG Grandpa is a tiny section of a larger image. If I want to recognize his features clearly, I am grateful for a 50 Meg scan. Obviously, as you mention, file size (storage capacity) is an issue, but less so as time passes. Therefore, I support the ". . . unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections . . .". Tomorrow's researchers will thank us for adhering to those high standards.
Is there a direct relationship between a high megapixel count, say 50 megapixels or more, and the ability to recognize small features in either a photograph or another type of document?

We need to start any discussion of this type with some observations about physical reality.

I will start with photographs. Analog photographs made on photographic film are considered to be continuous-tone images. However, the resolution of a photograph depends on the type of film used. The sensitivity of film to light is expressed as a number defined by the International Organization for Standardization (ISO) or by the American Standards Association, now known as the American National Standards Institute (ANSI), whose standard is usually designated by the older acronym as the ASA number. There is a direct relationship between a film's ISO/ASA number and its ability to resolve fine detail, i.e. its resolution: the higher the ISO/ASA number, the larger the grains of light-sensitive material, usually some compound of silver, used to capture the image, and the less fine detail the film can record. These numbers are usually used to represent the "speed" of the film, or the time it takes to form an image. Higher numbers, say around 1000 or 2000, mean that the film is very "fast." The tradeoff is always a loss in detail, i.e. graininess of the image.

There is no free lunch: greater resolution means smaller discrete light-sensitive elements. Photographers know that high ISO/ASA numbers (or fast film) mean a decline in detail in direct proportion to the additional speed. For those wishing to digitally reproduce film photographs, the resolution of the copy cannot exceed that of the original. Any document or photograph has a certain limit of resolution. Once a duplication method reaches that point of resolution, no information in the original will be lost in the copy. It may seem counterintuitive, but scanning or photographing at a resolution past that threshold will simply result in larger file sizes and not any more detail. Once that limit has been reached, there is no more information to obtain.
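To put rough numbers on that threshold, here is a minimal sketch in Python. It assumes an uncompressed 24-bit color image and a hypothetical 4 x 6 inch print whose grain limits it to about 300 ppi of real detail; that 300 ppi figure and both function names are illustrative assumptions, not measurements.

```python
def uncompressed_size_mb(width_in, height_in, ppi, bytes_per_pixel=3):
    """Approximate uncompressed size in megabytes (24-bit color by default)."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bytes_per_pixel / 1_000_000


def captured_detail_ppi(scan_ppi, original_limit_ppi):
    """Detail in the copy is capped by the lower of the scan and the original."""
    return min(scan_ppi, original_limit_ppi)


# Hypothetical 4 x 6 inch print assumed to hold about 300 ppi of real detail.
for scan_ppi in (150, 300, 600, 1200):
    size = uncompressed_size_mb(4, 6, scan_ppi)
    detail = captured_detail_ppi(scan_ppi, original_limit_ppi=300)
    print(f"{scan_ppi:>5} ppi scan: {size:8.1f} MB, usable detail about {detail} ppi")
```

Each doubling of the scanning resolution quadruples the uncompressed file size, while the usable detail in this sketch stops growing once the scan matches the assumed limit of the original.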

I am not talking here about photographs of real-life objects. I am talking about copying historical records and photographs, essentially digital reproductions of actual analog documents.

Here is an example of what I mean. This is a record from the FamilySearch website that was previously microfilmed and has now been made available as a digitized copy:

Now, how did this image come to be on the website? In a simplified explanation, someone had access to the original record and then made a photographic copy of the original using some type of microfilm. Here, the resolution was determined by the type of film, probably with a very low ISO/ASA number below 100, i.e. with the highest amount of detail available. Now, to move this image into the digital world, FamilySearch made a digital image at some extremely high resolution (for a digital image) and then processed that image for display on its website. What about the resolution of this image? Well, first of all, it is a JPEG image and we will have to view the image on our computer's monitor. Let's see what happens to this image at magnification. Here is a screenshot of the image at 300%.

Hmm. There appear to be some problems with the original. There is a great deal of bleed-through from the back of the page. What about higher magnification? Here it is again at 600%.

Is there an upper limit? Yes, here is the image again at 800%:

At this point, further magnification will simply produce more pixelation and not provide any more detail. Could this be extended indefinitely by making the original with a higher digital pixel count? In reality, the file size would increase dramatically but you would still be limited by the resolution of the original image. Here is the same image at 1200% magnification.

Any higher and the image will start to become unrecognizable. Where can you see the most detail? Guess what? That depends on how closely you look at the image. If you stand some distance back, the high magnification images look just like the ones with lower magnification.

There is a reason why the Library of Congress established standards as set forth in its "Guidelines: Technical Guidelines for Digitizing Cultural Heritage Materials." There is a balance between increased resolution and the preservation of the detail in a document or photograph. Higher resolutions give you larger file sizes but, at some point, no more information from the original.

There is no free lunch. You cannot beat the system and the system is physics.

Tuesday, December 12, 2017

The Ultimate Challenges of Genealogical Access to Digitized Records

Online genealogically important historical records are rapidly transforming the way genealogists find their ancestors and extended ancestral families. Billions of new records are being added every year by the large online genealogy companies. It would seem that this flood of new records could go on indefinitely. But there are strong indications that the flood may soon diminish to a trickle unless the genealogical community can overcome some looming obstacles.

These obstacles to the continued increase in the number of online genealogical records fall into a number of categories that include the following:
  • Political restrictions on the access to records
  • The monetization of records by governments and other organizations
  • The reverse side of the principle of economies of scale, i.e. the cost of digitizing smaller collections of records
  • Unrealistically restrictive copyright and other similar restrictions on historical records
  • The unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records
  • The costs of maintaining ever larger databases including the costs associated with migrating file formats over time
  • The lack of community standards for record formats and the inability of users to move records from one online family tree program to another
  • Ignorance of the members of the genealogical community as to the identity and availability of online digital record collections
Here is my viewpoint on each of these obstacles:

Political restrictions on the access to records

The most difficult and pervasive obstacles to continued digitization are the politically imposed restrictions on record access around the world. In some areas, record access, much less digitization of those records, is virtually impossible. It is clear that the ability of individuals to access records is a major threat to oligarchies and repressive governments no matter what their origin or motivation. This is not an issue that is limited to national governments but can operate on a local level when politicians believe their control and power are threatened by access. In the United States, for example, we would not have national and local freedom of information statutes were politicians and bureaucrats cooperative in providing access to "public" records. In addition, the ongoing destruction of genealogically important records and the attacks on state archives and libraries continue to threaten the availability of records around the country. Absent major changes in some countries of the world and even in parts of less repressive countries, many records will remain unavailable. Ultimately, the reasonably accessible records around the world will all be "cherry picked," leaving huge numbers of records locked up by repressive governments.

The monetization of records by governments and other organizations

It is a fact of life for genealogists that more and more records around the world are being used as local revenue streams by those who maintain or archive them. This occurs wholesale, even in the United States, for many types of records. For example, in almost every state of the United States of America, if you are born, get married or die and you or your family want a copy of an official government certificate of any of those events, you will have to pay a fee to obtain a copy. In England, it is a common practice for local ecclesiastical parishes to charge a fee for access to historical parish registers. I am not of the opinion that all records must be free, but the monetization of the records makes their acquisition by free websites very unlikely. It also makes digitizing the records and making them available much more expensive.

The reverse side of the principle of economies of scale, i.e. the cost of digitizing smaller collections of records

Record acquisition and digitization are labor intensive and the equipment needed for high-quality images is still quite expensive. For these reasons, extensive record digitization efforts can achieve economies of scale. On the other hand, smaller projects require those same assets but spread them over far fewer records, so the cost per record becomes a major concern. In other words, smaller collections have some of the same overhead considerations as larger collections, making the cost per record much higher. Also, the logistics of obtaining smaller collections are usually about the same as for larger collections. The result is that there are distinct disincentives to acquiring smaller collections of valuable records.

Unrealistically restrictive copyright and other similar restrictions on historical records

Unfortunately, US Copyright law is vague and overly restrictive. Current copyright claims will likely be in effect longer than any person now living. Even old copyright claims dating back to the 1920s and 30s will likely be arguably enforceable longer than anyone now living. This could be called the "Mickey Mouse" effect. In both 1976 and 1998, the existing copyright interests were extended for up to 120 years from the year of creation. See the post, "How Mickey Mouse Keeps Changing Copyright Law." Because the provisions of these laws are vague, all sorts of claims to copyright now cloud the ability of genealogists to access records online.

In other cases, record repositories claim a "contractual" ownership right to documents that are clearly in the public domain. These claims prevent the free use of all sorts of records, photographs, and other documents. Until there is a realistic overhaul of the copyright laws and a clarification of the unfounded claims by repositories, many valuable records will be subject to restricted access.

The unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records

This particular issue is less obvious than any of the other challenges facing genealogical access to digitized records. Essentially, those who are charged with developing the standards for online digital preservation impose unrealistic restrictions on the process of digitization. For example, we have long known that the highest resolution the human eye can distinguish is approximately the equivalent of 170 dpi or PPI (pixels per inch) when viewed at 20 inches. In contrast, the average laser printer can print at 300 dpi, or roughly double the eye's resolution. See "What is the highest resolution humans can distinguish." Presently, some of the digitization efforts going on around the world are using cameras that have up to 50 Megapixel sensors. Most of the documents being digitized could be adequately preserved with a camera of about 12 Megapixels, the resolution of a present smartphone. The U.S. Library of Congress has established a publication called "Guidelines: Technical Guidelines for Digitizing Cultural Heritage Materials." Quoting from that publication concerning documents:
Image capture resolutions above 400 ppi may be appropriate for some materials, but imaging at higher resolutions is not required to achieve 4* compliance.
The practical effect of an artificially imposed higher standard is that many smaller collections are going to be lost because the large online genealogy companies refuse to ingest even images at the Library of Congress standard or make the process of obtaining images so complicated as to make smaller collections unfeasible.
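The relationship between megapixels and pixels per inch is simple arithmetic, and a short Python sketch may make it easier to compare the figures mentioned above. It assumes the document exactly fills the camera frame, which real sensors with a fixed aspect ratio will not quite achieve, and the page sizes and function names are my own illustrative choices.

```python
import math


def megapixels_needed(width_in, height_in, ppi):
    """Megapixels required to cover a page of the given size at a target ppi."""
    return (width_in * ppi) * (height_in * ppi) / 1_000_000


def ppi_from_megapixels(width_in, height_in, megapixels):
    """Pixels per inch achieved when a page exactly fills a sensor (idealized)."""
    return math.sqrt(megapixels * 1_000_000 / (width_in * height_in))


# Illustrative page sizes in inches; the ppi targets are the figures cited in the text.
pages = {"letter page (8.5 x 11)": (8.5, 11), "small card (5 x 7)": (5, 7)}
for name, (w, h) in pages.items():
    for target in (170, 300, 400):
        print(f"{name} at {target} ppi needs {megapixels_needed(w, h, target):.1f} MP")
    for mp in (12, 50):
        print(f"{name} on a {mp} MP sensor gives about {ppi_from_megapixels(w, h, mp):.0f} ppi")
```

The sketch only shows how quickly the required pixel count grows with the target resolution and the size of the page; whether a given target is worth meeting still depends on how much detail the original actually contains.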

The costs of maintaining ever larger databases including the costs of migrating the file formats over time

Even with the dramatic decreases in the cost of memory storage, huge online genealogical collections, especially those with photos, videos and audio files, can eat up huge amounts of memory into the hundreds of Terabytes. Adding in the cost of acquisition and maintenance makes this an extraordinary effort. Adding new records can have an incrementally higher cost. It is only a matter of time until these huge collections run into an economic and practical limit. However, there is a long way to go before this will happen. Right now, there is a major concern with the need to migrate existing collections as new file formats and operating systems evolve. Apple recently introduced a new file format for its smartphones, HEIC, and this will eventually affect the large online genealogy companies.

The lack of community standards for record formats and the inability of users to move records from one online family tree program to another

This is a major issue and I have written about this recently. Without community standards, each of the large online database companies is essentially an island with its own file formats. Without a standard way to exchange data, if one or more of these companies fail, much of their data could be lost.

Ignorance of the members of the genealogical community as to the identity and availability of online digital record collections

Let's face it. There is a constant loss of genealogical data due to genealogists who ignorantly or even intentionally fail to share their data and adequately prepare for its preservation upon their deaths. This attrition of records will always be a drag on preservation efforts.

There is always hope in the future and it is always possible that some or all of these issues will be resolved, but right now they stand as genealogy's greatest challenges. 

Sunday, December 10, 2017

Can your public library help you with your genealogy?
It may not occur to you, but your local public library may be an excellent source of information for genealogical research. For example, the Hedberg Public Library in Janesville, Wisconsin has a long list of databases available both for use in the library and online with a library card. Some local public libraries, such as the Allen County Public Library headquartered in Fort Wayne, Indiana, have some of the most extensive genealogical collections in the United States.

Here is a screenshot of the Allen County Public Library Genealogy Center website.

Your local library may be sponsored by your town, city, or county or all three. In Mesa, Arizona where I lived for many years, we had an excellent local Mesa Public Library. We also had an excellent county library system, the Maricopa County Public Library System, and a State Library in Phoenix. We also had an extensive system of Family History Centers around the Salt River Valley including the one where I was a volunteer, the Mesa FamilySearch Library.

It was interesting to me that many of the people I met in the Phoenix area who professed to be interested in genealogical research had never visited the Mesa FamilySearch Library and some had not even heard of its existence. There are over 5000 Family History Centers around the world and it is likely that there is one near you. See the Get Help menu for a location near you.

Sometimes we tend to judge a library by whether or not it has a particular book or other items we are searching for. But libraries can be surprising in the resources they have in their collections. If you are going to travel to an area where your family lived to do research, take the time to contact a local library in the area and ask about their resources.

Saturday, December 9, 2017

Artificial Intelligence, Chess, Voice Recognition and Genealogy
It may be one of the more obscure "news" events of the year. I found a reference to this "news" event in a blog post by my friend, Louis Kessler, up in Canada. The post that piqued my interest was entitled, "Chess and Artificial Intelligence: The Future Changed Today." This post talks about the Alphabet (Google) owned company, DeepMind.

You can get the details and watch the videos on Louis's blog post. If you have any appreciation at all for the advancements in technology, you will realize that this particular development is probably the most important change in our collective future to come along for quite a while. To understand the perspective here you need to focus on these paragraphs from Louis's blog post:
Long, long ago, when I was a student at the University of Manitoba, I had a hobby I had dabbled in: programming a computer to play chess. I had reached a point where my program, Brute Force, was then one of the best in the world. I went to Seattle, Washington in 1977 for the 8th North American Computer Chess Championship, and followed that up in 1978 in Washington, D.C. for the 9th NACCC. (If you’re interested, see my writeup on my chess program, Brute Force).

The program was called Brute Force because I concentrated on doing the minimum possible to evaluate positions, and simply let the program iterate as many moves as possible to determine the best move. I had the full use of the University of Manitoba’s IBM 370/168 mainframe, which likely was as powerful then as your smartphone is today. Smartphones today can play better chess than the big computers did back then in the Computer Chess Championships of the ‘70s.
Here is a description of the DeepMind company from their website:
DeepMind was founded in London in 2010 and backed by some of the most successful technology entrepreneurs in the world. Having been acquired by Google in 2014, we are now part of the Alphabet group. We continue to be based in our hometown of London, with additional research centres in Edmonton and Montreal, Canada, and a DeepMind Applied team in Mountain View, California.
As usual, I must ask the question: how does this affect genealogy? I can think of a number of areas to begin with, for example the following:

  • Handwriting recognition
  • Intelligent indexing
  • Document recognition and cataloging
  • Correction of existing family tree entries i.e. standardization
  • Increasingly accurate record hints
  • Increasingly accurate duplicate record resolutions

From my own standpoint, the main area I would be concerned with is voice recognition software. As I have written recently, the commercial market for individual voice recognition software is currently dominated by one company. The current product, referred to as "Dragon," is based on research done by IBM. Unfortunately, because of the lack of competition, the products have been upgraded slowly and still contain numerous bugs. The biggest problem with all of the voice recognition software over the years has been its inability to improve performance by learning from its mistakes. The programs require individualized human intervention in order to learn new terms. User corrections to the text are not incorporated into the program. In the case of Dragon, new words must be individually entered. With the new developments in artificial intelligence outlined in Louis's blog, someone or some company may be able to create a voice recognition program that actually works.

Friday, December 8, 2017

How Accurate is DNA Testing? Really?
DNA testing is not without controversy and unforeseeable consequences. The article shown above highlights some of the serious issues facing the forensic use of DNA evidence. With the popularity of genealogical DNA testing, it is important to understand the differences and similarities between forensic DNA testing and that done by the large genealogically motivated testing companies.

To begin to understand those differences and the inherent limitations of DNA testing, here is a quote from the above JSTOR Daily article:
DNA (deoxyribonucleic acid) is a code that programs how we will develop, grow, and function. Humans are thought to have DNA that is 99.9% identical, but the remaining 0.1% makes us individuals, marking us out as unique. The fact that humans and chimpanzees have just a 1% difference in their DNA further highlights how meaningful a small difference can be. Generally, the more closely related we are to someone, the more similar our DNA will be to theirs.
Continuing with a discussion of the limits of DNA testing in a criminal investigation, the article states:
Realistically, then, DNA profiles should only be thought of as being likely to have come from a specific individual. Statistical approaches such as “match probability,” which is based on comparisons between crime scene DNA and a hypothetical “random” person, often are misunderstood. A more rigorous statistical approach is likelihood ratio, which directly compares two hypotheses: the likelihood of the DNA coming from the suspect vs. the likelihood of the DNA coming from someone else. If the likelihood ratio is less than one, the defense position (the DNA is not the suspect’s) is better supported; if it is greater than one, there is more support for the prosecution case. Still, the ratio at most provides scientific support for a theory, not a yes-or-no answer.
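The likelihood ratio described in that quote can be sketched in a few lines of Python. The two probabilities passed in are hypothetical inputs that a forensic statistician would have to estimate, and the function names are my own; this is only an illustration of the comparison, not any laboratory's actual method.

```python
def likelihood_ratio(p_evidence_if_suspect, p_evidence_if_other):
    """Ratio of the two competing hypotheses described in the quoted article."""
    if p_evidence_if_other <= 0:
        raise ValueError("probability under the alternative must be positive")
    return p_evidence_if_suspect / p_evidence_if_other


def better_supported_position(ratio):
    """Greater than one favors the prosecution, less than one the defense."""
    if ratio > 1:
        return "prosecution"
    if ratio < 1:
        return "defense"
    return "neither"


# Hypothetical probabilities, chosen only to show the comparison.
ratio = likelihood_ratio(p_evidence_if_suspect=0.9, p_evidence_if_other=0.001)
print(f"likelihood ratio = {ratio:.0f}, better supported: {better_supported_position(ratio)}")
```

As the article notes, even a large ratio is only scientific support for a theory, not a yes-or-no answer.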
One issue that has been ignorantly raised in several online articles is that the genealogical DNA testing being done could ultimately be used in a court of law for a criminal prosecution. I wrote about this issue previously in two posts entitled, "Is genealogically submitted DNA discoverable in a criminal investigation?" and "A Little More About DNA and Criminal Investigations."

In reality, the standards for criminal justice in the United States would be extremely unlikely to admit genealogical data as evidence in a criminal trial. Here is one of the most limiting of the standards for DNA Evidence from the American Bar Association:
Standard 2.4 Collecting DNA samples from Persons in a group by consent 
A law enforcement officer should be permitted to obtain a DNA sample from a person by consent, except that: 
(a) consent should not be sought from persons based primarily upon their membership in a constitutionally protected class;
(b) consent should not be sought from a large number of persons based on grounds other than individualized suspicion that each committed the crime under investigation unless seeking such consent has been authorized by the head of a law enforcement agency or the chief prosecutor in that jurisdiction; and
(c) when consent is sought as provided in subdivision (b) of this standard, each person should be informed of the reason for the request and of the right to refuse it, and the consent should be obtained in writing.
Note that when DNA testing is directed at a group, those tested have a right to refuse testing based on the proposed forensic use. So, if you have your DNA test done by one of the genealogy companies, the results are not useable in court unless your consent was obtained in writing prior to the test. Standard 3.1 goes on to state the standards applied to forensic DNA testing laboratories.
Standard 3.1 Testing laboratories 
(a) A laboratory testing DNA evidence should:
(i) be accredited every two years under rigorous accreditation standards by a nonprofit professional association actively involved in forensic science and nationally recognized;
(ii) be governed by written policies and procedures, including protocols for testing and interpreting test results, and permit deviation from protocols only by a technical leader or other appropriate supervisor;
(iii) use quality assurance and quality control procedures, including audits, proficiency testing, and corrective action protocols, that are consistent with generally accepted practices and in writing;
(iv) use protocols for testing and interpreting DNA evidence that are scientifically validated through studies that are described in writing;
(v) follow procedures designed to minimize bias when interpreting test results;
(vi) timely report credible evidence of laboratory misconduct or serious negligence to the accrediting body; and
(vii) make available to the public the written material required by this standard.
(b) A laboratory testing DNA evidence should make available to the prosecution the information and material that the prosecutor must disclose to the defense pursuant to Standard 4.1, and to defense counsel the information and material that the defense must disclose to the prosecutor pursuant to that standard.
(c) When an accrediting body receives notice of credible evidence of laboratory misconduct or serious negligence concerning DNA evidence at the testing laboratory, either as provided in subdivision (a) (vi) of this standard or through other means, it should audit laboratory procedures and cases that may have been affected by the misconduct or serious negligence and issue a written report.
I could probably give many more examples of the limitations imposed on forensic DNA testing but here are a few links to get you started if you are interested:
For genealogists, the issue is the accuracy of the DNA data supplied by any testing company. Coupled with the use of online family trees, DNA testing can be quite accurate for near relatives within no more than three or four levels of separation, i.e. 2nd or perhaps 3rd cousins. Every level of "removal" or separation decreases the accuracy and thereby the reliability of the results. However, when reliable, traditional genealogical research is coupled with reliable DNA testing, the results may be extended further.

Presently, testing done by different genealogically oriented testing companies will differ because of the fact that their testing procedures and databases are, in some cases, significantly different. These differences are presently unresolvable. 

From a legal standpoint, as a retired trial attorney, I would not feel it possible to use genealogically obtained DNA testing for any type of court proceeding.  

Thursday, December 7, 2017

BYU Family History Technology Lab Interview on BYURadio

Julie Rose's regularly broadcast interview program aired a segment interviewing two guests from the Brigham Young University Family History Technology Lab: Bill Barrett, Ph.D., Professor, Computer Science, Brigham Young University, and Curtis Wigington, Master's Student, Brigham Young University. Julie Rose is a winner of multiple Edward R. Murrow Awards, and a seasoned broadcast journalist and interviewer. Prior to joining BYU Radio, Rose worked as a reporter and produced spots and feature news stories for NPR's Morning Edition and All Things Considered.

The segment, called "Making Family History Fun (and Addicting?!)," highlights several of the programs developed by the Family History Technology Lab including the popular RelativeFinder App. Professor Barrett also mentions the great strides made by the Lab in developing handwriting recognition software. Click here to listen to the segment of the interview.

Wednesday, December 6, 2017

Will Identical twins have the same DNA test results?

The answer to the question in the title of this post is more complex than it might seem. Here is a glimpse into the problems associated with DNA testing of identical twins from Identigene:
Until recently, the consensus has been that identical twins share completely identical DNA, but recent studies show that isn’t necessarily true. Rather than looking at the standard 15 markers analyzed in today’s paternity tests, highly-advanced and impossibly-expensive DNA tests that analyze the entire genome sequence-as many as six billion markers-are able to identify at least a single mutation in one of the identical twins’ genetics that has been passed on to the child (Sapiro). However, DNA tests that are presently accessible to the public do not analyze enough markers to distinguish the two, presenting a serious problem in court cases to establish paternity for child support. 
Hopefully, next-generation technology will be able to identify the differences between identical DNA in a way that’s affordable as well as accessible to the general public. Until then, paternity involving identical twins remains unsolvable.
One major consideration about genealogical DNA testing is that the databases are rapidly increasing in size and technology is also advancing. The difference between forensic and genealogical DNA testing is largely a matter of degree of specificity. Here is a comment on ancestry testing (genealogical DNA testing) from the Government of New South Wales, Australia.
Ancestry testing is commonly offered as an online test through private companies. As different companies compare test results to different databases, ethnicity may not be consistent.

In addition, findings about ethnicity may be different from an individual’s expectations as humans have mixed with different populations throughout history and consequently individuals may have many different variations in their DNA. It is helpful to know what type of testing is being undertaken for ancestry to ensure testing will provide some clues to the questions being asked.
Using identical triplets or twins for a comparison of genealogical DNA testing will only reflect the differences in the databases used by the different companies. These sorts of publicized tests say little about the utility or accuracy of genealogical testing.

Tuesday, December 5, 2017

Identical Triplets Take MyHeritage DNA Tests on the Today Show

Correspondent Jeff Rossen revealed DNA results from the three leading brands, for sisters who are identical triplets. MyHeritage was the only test that didn’t require filling a vial with spit, and is more affordable than the competition, without sacrificing accuracy.

DPLA adds Oklahoma Hub

The Digital Public Library of America continues to expand its huge collection of free, online digital content. Quoting from a recent announcement:
Over 100,000 records from our newest partner, OK Hub, are now discoverable in DPLA. The Oklahoma Hub represents a collaboration between lead partners The University of Oklahoma and Oklahoma State University, with extensive resources from Oklahoma Historical Society and Oklahoma Department of Libraries. Together, these collections offer unique new resources, particularly in the areas of Native American history and culture, environmental and agricultural science, and the lives and experiences of generations of Oklahomans.
The notice continued with an explanation of the contents of the newly attached records.
Students and scholars of the settlement of the Oklahoma frontier will find thousands of photographs from the collections of Oklahoma Historical Society, including photos from the Oklahoma Publishing Company, which started operating as such in 1903 in Oklahoma City and continues today. Images in these collections capture everyday life for the diverse residents of turn-of-the-century Oklahoma, including members of diverse Native American groups including Cherokee, Choctaw, Creek, Seminole, Chickasaw, Seminole, and Kiowa. Together with The University of Oklahoma’s Indian Pioneer Oral Histories conducted in the 1930s and the Duke Collection of American Indian Oral Histories (1967-1972), these rich new materials on testify to the tension between cultural perseverance and assimilation as well as a decades-long process of negotiating the allotment of one of America’s most treasured resources: land.
This new addition brings the total number of records in the DP.LA collection to 18,430,270.

Monday, December 4, 2017

What do we remember? A Cautionary Tale for Genealogists

Salvador Dali's painting, "The Persistence of Memory," is a classic surrealist work that is commonly associated with the idea of the relativity of time and space. Many years ago, I began my university years as a Fine Arts major studying painting and drawing, at which time I became aware of a great deal of art history. For me, this particular painting has come to represent the malleability of memory. It shows how, over time, orally transmitted stories and events change to adapt to the present impressions of the historical researcher. The hard edges of the stories from the past often become softened by the passage of time.

For genealogists, this painting should represent the need for transmitting current information about our lives and families through journals and diaries. However, the reality of these documents throughout history is that they almost always exist in a single copy. That single copy is usually transmitted from generation to generation and, if preserved, usually falls into the hands of the family memory hoarder. When this ostensible family historian finally dies, this becomes the decisive moment for deciding whether or not the particular document is preserved.

Today's technology provides us with a method of preservation that will increase the possibility that a personal journal or diary will survive the transmission process. There are a number of repositories for such personal records and many of those records are now being digitized and put online. One of those major repositories is the Books collection. As of today, this collection has 351,082 digitized books and records. While writing this post, I searched for the word "diary" and found 33,733 results. Unfortunately, a search for the word "Journal" would produce a skewed result because of the word's association with other forms of publication. For genealogists, the importance of this collection is that all of the books and other publications that have been digitized relate to genealogical research.

In a larger sense, the above painting should also remind us that our own memories in research are evanescent unless properly documented and preserved. Another important component of preservation of personal memories is the oral interview. During the past few years, I've conducted a number of oral interviews. However, conducting the interview is just a first step; you need to make provisions for the preservation of the information. In every case, a transcription of the interview should be made, copied, and preserved. Local university libraries have special collections departments that may be willing to accept and preserve personal diaries and journals. You might also wish to check with historical societies and other such organizations. All of my oral interviews over the past few years are now being stored and hopefully transcribed by the Harold B. Lee Library, Special Collections Library.

If you want to give a gift this Christmas season, how about giving one of memories and providing a way for those memories to be preserved?

Sunday, December 3, 2017

Can a computer do genealogy?

Computers are complex calculating machines. They process unimaginable amounts of data using 1s and 0s. You might have heard the term "artificial intelligence" bandied about from time to time. Many genealogists, not all by any means, now benefit from using tremendously powerful computers and very sophisticated computer programs to assist with their genealogical research. The ultimate question is simply this: will everything we now do as genealogists be replaced by more highly sophisticated computer programs?

In thinking about this question, I began mentally constructing a hierarchy of the complexity associated with what I, and others, do as a genealogist. Some of those tasks involve simple search and retrieval functions, such as looking for a name in an index of names. At the level of a computer program, this search involves the process of looking at matching strings of characters. A simple example is the "search and replace" function of a word processing program. At this level, there is an assumption that everything that is being searched is in "text" format. In a genealogical context, when we are "Indexing" records, we are converting the printed or handwritten information on the record into a "text" file the computer can use to search for equivalent patterns. Computer programs are very good at identifying certain types of patterns.
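
The string matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the index entries and the function name `search_index` are made up for the example and do not come from any real genealogical database or program.

```python
# Hypothetical index entries (placeholders, not real records).
index = [
    "Smith, John",
    "Smith, Mary",
    "Jones, William",
    "Brown, Sarah",
]

def search_index(entries, query):
    """Return every entry that contains the query string, ignoring case."""
    needle = query.lower()
    return [entry for entry in entries if needle in entry.lower()]

# A "search" is a scan for matching strings of characters.
matches = search_index(index, "smith")

# "Search and replace" swaps one string of characters for another.
replaced = [entry.replace("Jones", "Johns") for entry in index]
```

A search like this only finds what was typed into the index character for character, which is why a name that was indexed with a different spelling is missed.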

As we move up the scale of genealogical complexity, we quickly leave the search and replace or search and match level and move onto ideas and concepts that are much more difficult to program into a computer. One step up from search and replace we arrive at the challenges of optical character recognition. Genealogists are benefitting from this particular level of computer programming by having the ability to search through huge amounts of text from books and printed records. The challenges of optical character recognition mainly focus on background noise and poorly formed characters. For example, there may be little contrast between the text characters being recognized and the substrate they are printed on. In another case, the characters may be malformed, i.e. broken, and hard to recognize. If you have used an optical character recognition program, you realize that the output is not perfect. But even if you were copying the text manually, your copy would also probably be imperfect.

By using both optical character recognition and manual indexing, online genealogical database programs are able to provide "record hints" at an amazing level of accuracy. Couldn't that level of accuracy simply be extended to the point of having the program construct your family tree? Right now, the answer is no. An obvious fact of genealogical life is that many records we deal with are handwritten. Handwriting recognition is a major increase in complexity over text recognition. I have written about the status of handwriting recognition programs recently. There is a distinct possibility that handwriting recognition programs will achieve the same level of accuracy as human-based handwriting recognition.

In all this, there is the metaphorical "elephant in the room" that severely limits the ability of any one program to "do genealogy." That is the simple fact that genealogical research is spread both physically and content-wise all over the world. No one database or computer system or program has access to even a very small percentage of the total number of records available. In making this observation, I am not referring to the fact that many records remain in paper format but to the fact that records are scattered physically across the world. Even with the huge advances made by the Internet, information is still largely compartmentalized.

But what if we took a completely different approach to doing genealogy? Couldn't we do DNA testing of every person on the face of the earth and thereby construct an existing relationship tree of everyone? The answer is probably yes. But the answer begs the question. Even if we knew how everyone living was related that would give us no information about the identity of our ancestors. Showing degrees of relatedness only provides an incentive for doing historical research.

What if every historical record in existence was identified, digitized, indexed and made completely and freely available online? Could a computer program then construct a family tree for everyone on the earth? Theoretically yes, but practically no. However, in limited contexts, finding information for a particular individual is already possible. All four of the large online genealogical database programs that supply record hints already do this to some extent. But unfortunately, there are many people in the world who fall outside of the system. To a large extent, these relationships are established by the fact that millions of people have entered their historical genealogical information into online family trees. Regardless of the accuracy of the information, computers are able to evaluate and match relationships, especially when supported by DNA testing.

But what may not be obvious is that regardless of the sophistication of the genealogical database programs, they rely entirely upon individual research including the evaluation of relationships and records by individuals and not by their computer programs.

In writing this commentary, I am not trying to denigrate the advances made by computerization of portions of the genealogical research process. For example, I have written this entire blog post using a voice recognition program. You may be able to see the limitations of such a program in typos in my text.

Theoretically, given enough computer power, assuming that all available historical records were digitized and made available for text recognition and/or handwriting recognition, and further assuming that privacy and political concerns could be satisfied, I can see where substantial amounts of the routine genealogical research could be automated and extended beyond its present capabilities.

Back in 1950, Alan Turing proposed a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Essentially, the Turing test was whether or not a computer could carry on a conversation with a human without the human determining that a computer program was involved. The test is used essentially to determine the level of artificial intelligence attained by a computer program. From my standpoint, the Turing test could be satisfied long before a program could be developed to do accurate genealogical research.

Because of my background in linguistics, I am also unconvinced that we presently have the ability to replicate human speech with the computer. Notwithstanding programs like Siri, human speech is one of the ultimately complex phenomena. Genealogical research approaches the complexity of human speech. Although I see some significant advances in computer search programs associated with genealogical research, I do not see the computers taking over my job anytime in the near future.

Saturday, December 2, 2017

New Videos on the BYU YouTube Channel December 2017

As I have been posting for some time now, my wife and I will be leaving on a full-time mission and serving as Record Preservation Missionaries in the Maryland State Archives. I have had quite a few comments about my leaving the Brigham Young University Family History Library for a year and the possible impact on the videos uploaded to the BYU Family History Library YouTube Channel. I can assure you that high-quality videos will continue to be uploaded in my absence. In fact, it appears that I will hardly be missed. Here are a few of the new videos uploaded in the last few days.

Not Tech Savvy? Six Easy to Use Family History Phone Apps - Jean Naisbitt

Getting Started with Web Indexing - Kathryn Grant

FamilySearch Memories: Connecting Relatives Near and Far - Jean Naisbitt

Back Up Your Data Now or Cry - James Tanner

The videos can also be viewed from the BYU Family History Library Website

Meanwhile, you can follow our mission experiences on my Rejoice and be exceeding glad... blog with the tagline of "A Family History Mission."

Friday, December 1, 2017

Using Facebook for Genealogy?

Trying to do anything other than look at the news stream on Facebook reminds me of walking into the jungle in Central America. The amount of "information" is overwhelming and there are real dangers lurking in the shadows, but every once in a while, I manage to see something of interest. Can that jungle experience really be used to do significant genealogical research? Well, that depends on how you define doing research.

If you are searching for contemporary records about your ancestors, i.e. doing historical genealogical research, searching on Facebook seems pretty silly. For example, here is a screenshot of a search for one of my ancestors.

The results here give me one other person who mentioned my supposed ancestor back in 2009. As a side note, I presently have links to over 100,000 SmartMatches to potential relatives. This post expands into a transcription of a book called "New England Families."

If I were just starting out, this might be interesting, but if you read it carefully, you will see that this is just a repetition of speculation on the identity of this particular individual. Interestingly, the source is listed at the bottom of the entry as follows:

source: NORTH KINGSTON, RI (BOOK) 1395631 (8327302) (MICROFILM) 1395773 (8412107).

If you happen to know what this means, you might find the book on microfilm. Unfortunately, none of those numbers match any entry in the Catalog when I search by Film Number. They might be old film numbers. What about the title of the book? Here is the citation to this four-volume work.

Cutter, William Richard. 1913. New England families, genealogical and memorial: a record of the achievements of her people in the making of commonwealths and the founding of a nation. New York: Lewis Historical Pub. Co.

The book is generally available online in digital editions including Google Books. Assuming that I know all this and can find the book, I would now know a fair amount about my supposed ancestor. However, in this particular case, most of the information happens to be pure, unsupported speculation and is really copied from another earlier source. 

Did Facebook help me with my genealogy? I think we have to look further for an answer. In this particular case, there is a link to the Rhode Island Genealogy Network. 

I could post an inquiry or ask a question if I joined this group. But by joining a Facebook group, I will get more email and posts. There is a tradeoff. Actually, I am already a member of this group. 

Other than seeking opportunities to talk to other researchers, is there really any further way that I can get any benefit from Facebook?

I could opt for some general education and links to programs and companies. I could post my own findings and blogs on Facebook and hope for some help.

My general use of Facebook is for information about my friends and family and for communicating with others on the program. Facebook is a good advertising medium and used by most of the genealogy companies. For example, there is a #RootsTech 2018 Facebook page for the upcoming conference.

It is generally a way to maintain contact with your family or an interest group. However, it has little use as a research tool. 

Thursday, November 30, 2017

MyHeritage Adds User Corrections to Historical Records' Indexing
One of the frustrating parts of doing online genealogical research is relying on the sometimes faulty indexing of digitized records. I regularly find records that have been mis-indexed, but usually only after an extended search. All that is really needed is the ability to annotate or correct the online indexed records and then others searching for the same information would not have such a difficult time. In fulfilling this need, MyHeritage has introduced a sophisticated feature that allows users to correct mistranscribed or misspelled names in historical records on their SuperSearch™ program.

This new feature is announced and explained in detail in their recent blog post entitled, "New Feature: Do-it-Yourself Historical Record Fixes." This feature is not only intended to "fix" transcription or indexing errors but also to correct information contained in the original document such as clerical errors on the part of the person who wrote the original record.

This concept moves the process of indexing into a closer association with those who are doing research in original documents. In the past, and as is also the case presently, indexing projects are conducted either by paid contractors or volunteers. The marvelous work done by these indexers has revolutionized genealogical research. But very few of these contractors and volunteers are experienced genealogical researchers themselves. Adding a feature, such as the one introduced by MyHeritage, allows those who are actually involved in the research process to increase the reliability of the existing indexes.

Obviously, the original record is not altered. The fix involves adding alternative readings of the record. What is most helpful is that MyHeritage automatically adds the alternatives to their search engine technology, SuperSearch™.

Rather than reproduce the entire blog post, I suggest you read the original article linked above.

Some Things You Need to Know About Identity Theft in 2017: Genealogy and Otherwise

A local newspaper just featured a front-page article on identity theft. This reminded me that I hadn't written on how this topic impacts genealogists for a while, at least for about a year. First of all, what is identity theft? The term itself is quite vague and used to cover a huge number of anti-social and some few criminal activities. In addition, the idea of identity theft is usually raised any time there is a news report of a major computer hacking operation. The threat of identity theft is commonly used as a bogeyman to scare people by commercial enterprises who profit from people's fear, real or otherwise.

Rather than ask "what is identity theft," I would ask whether there is a commonly accepted definition at all. I would also ask if there are any statistics about the frequency of whatever is defined as identity theft and statistics on the actual damages suffered by "victims of identity theft." Knowing these statistics, and comparing them to the other dangers of our modern world, will go a long way towards establishing a basis for any needed action.

In looking back at my previous posts on the subject, I am also interested to see if there are any updated statistics or changes in the definition of identity theft. Then, I need to evaluate whether or not participating in genealogy online raises the threat of having your identity stolen or whatever.

First of all, you need to understand that all of the "statistics" about identity theft and any other related activity are based on complaints, not arrests and certainly not on criminal convictions. So, complaints, well-founded or spurious, are the real issue here. Here is a quote from the Federal Trade Commission's Consumer Sentinel Network.
Consumer Sentinel is a secure online database of millions of consumer complaints available only to law enforcement. Complaints in Consumer Sentinel are about:
  • Identity Theft
  • Do-Not-Call Registry Violations
  • Computers, the Internet, and Online Auctions
  • Telemarketing Scams 
  • Advance-fee Loans and Credit Scams
  • Sweepstakes, Lotteries, and Prizes 
  • Business Opportunities and Work-at-Home Schemes 
  • Health and Weight Loss Products
  • NOW AVAILABLE: All consumer complaints filed with the FTC about financial issues, such as credit reports, debt collection, financial institutions, and lending.
Unfortunately, all of these statistics are only available to approved organizations. So where do we go to find out about identity theft?

We might look at the U.S. Office of Justice Programs, Bureau of Justice Statistics, but the most recent statistics are from 2014. Here is the summary of what the entire report shows:
  • About 7% of persons age 16 or older were victims of identity theft in 2014, similar to findings in 2012.
  • The majority of identity theft victims (86%) experienced the fraudulent use of existing account information, such as credit card or bank account information.
  • The number of elderly victims of identity theft increased from 2.1 million in 2012 to 2.6 million in 2014.
  • About 14% of identity theft victims experienced out-of-pocket losses of $1 or more. Of these victims, about half suffered losses of less than $100.
  • Half of identity theft victims who were able to resolve any associated problems did so in a day or less.
OK, so reading this from 2014, you can see that 7% of the population age 16 or older complained about identity theft in 2014 (remember, victims of identity theft are measured by complaints). Of that 7%, 86% of the complaints involved credit cards or bank accounts. Only 14% of the 7%, or 0.0098 (just under 1% of the population age 16 or older), involved out-of-pocket losses of $1 or more. About half of those, or roughly 0.5% of that population, suffered losses of $100 or more.
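
The arithmetic can be checked directly against the quoted summary. This is a minimal sketch that takes the summary percentages at face value and treats "about half" as exactly 0.5, which is an assumption; the variable names are mine, not the report's. Note that in the summary, the "half" applies only to the victims who had out-of-pocket losses.

```python
# Figures taken from the 2014 summary quoted above.
victim_rate = 0.07       # about 7% of persons age 16 or older
loss_share = 0.14        # share of victims with out-of-pocket losses of $1 or more
under_100_share = 0.5    # assumption: "about half" of those lost less than $100

# Share of the whole age 16+ population with any out-of-pocket loss.
any_loss = victim_rate * loss_share

# Share of the whole age 16+ population with losses of $100 or more.
loss_100_or_more = any_loss * (1 - under_100_share)

print(f"Any out-of-pocket loss: {any_loss:.2%}")
print(f"Loss of $100 or more:   {loss_100_or_more:.2%}")
```

Under these assumptions, the share of the population with any out-of-pocket loss comes to just under 1%, and the share losing $100 or more is about half of that.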

Hmm. Are there any other statistics? Well, the above statistics actually do not come from counting any sort of complaint, they actually come from a survey: the National Crime Victimization Survey. This points up an important fact:

Always check the sources.

Does this sound like a good genealogical research policy? It is also a good idea when attempting to verify any statistics, government or otherwise.

Once again, are there any current statistics on the actual incidence of identity theft and is there a definition of identity theft?

Let's look at a representative news story: "Identity theft hit an all-time high in 2016" from USA Today. From the headline, you might suspect that there were some statistics out there to support such a claim. Read the article. The entire article (and by the way, the one on the front page of the local newspaper I mentioned above) is supported by data from the Javelin Strategy and Research company. Here is a summary of that company from their website:
Javelin Strategy & Research is a research-based advisory firm that helps its clients to make informed decisions in a digital financial world. Our analysts offer objective, strategic and, above all, actionable insights and unearth opportunities to sustainably increase profits.
Isn't this sort of like asking an insurance salesman if you need insurance or an attorney if you need legal advice? If you keep reading on their website, once again, the supposed statistics come from a survey.

Now, if you think about it, if identity theft is theft then wouldn't there be some statistics about crime?
Actually, it is quite a challenge to find actual crime statistics. The FTC is still using data from 2006 to 2008. See Federal Trade Commission, Identity Theft and Data Security. So how about the FBI or the Justice Department?

The FBI has crime statistics for 2016. However, there is no obvious mention of identity theft. If identity theft is such an important and prominent crime, why isn't it mentioned in the FBI's report? Here is a screenshot of the actual report page:

Where does "identity theft" fit into the crime statistics? Oh, by the way, property crime figures are in dramatic decline in the United States according to the FBI.
What is identity theft if it isn't a crime? Actually, many states have tried to define it and make it a crime. Here is one definition from the Utah Criminal Code:

Effective 5/12/2015 
76-6-1102.  Identity fraud crime. 
(1)As used in this part, "personal identifying information" may include:
(b)birth date;
(d)telephone number;
(e)drivers license number;
(f)Social Security number;
(g)place of employment;
(h)employee identification numbers or other personal identification numbers;
(i)mother's maiden name;
(j)electronic identification numbers;
(k)electronic signatures under Title 46, Chapter 4, Uniform Electronic Transactions Act;
(l)any other numbers or information that can be used to access a person's financial resources or medical information, except for numbers or information that can be prosecuted as financial transaction card offenses under Sections 76-6-506 through 76-6-506.6; or
(m)a photograph or any other realistic likeness.
(2)(a)A person is guilty of identity fraud when that person knowingly or intentionally uses, or attempts to use, the personal identifying information of another person, whether that person is alive or deceased, with fraudulent intent, including to obtain, or attempt to obtain, credit, goods, services, employment, any other thing of value, or medical information.
(b)It is not a defense to a violation of Subsection (2)(a) that the person did not know that the personal information belonged to another person.
(3)Identity fraud is:
(a)except as provided in Subsection (3)(b)(ii), a third degree felony if the value of the credit, goods, services, employment, or any other thing of value is less than $5,000; or
(b)a second degree felony if:
(i)the value of the credit, goods, services, employment, or any other thing of value is or exceeds $5,000; or
(ii)the use described in Subsection (2)(a) of personal identifying information results, directly or indirectly, in bodily injury to another person.
(4)Multiple violations may be aggregated into a single offense, and the degree of the offense is determined by the total value of all credit, goods, services, or any other thing of value used, or attempted to be used, through the multiple violations.
(5)When a defendant is convicted of a violation of this section, the court shall order the defendant to make restitution to any victim of the offense or state on the record the reason the court does not find ordering restitution to be appropriate.
(6)Restitution under Subsection (5) may include:
(a)payment for any costs incurred, including attorney fees, lost wages, and replacement of checks; and
(b)the value of the victim's time incurred due to the offense:
(i)in clearing the victim's credit history or credit rating;
(ii)in any civil or administrative proceedings necessary to satisfy or resolve any debt, lien, or other obligation of the victim or imputed to the victim and arising from the offense; and
(iii)in attempting to remedy any other intended or actual harm to the victim incurred as a result of the offense.
How many people were charged with and/or convicted of this crime in Utah since 2015 when this law was passed? Here is an infographic of the current number:

Once again, these numbers do not reflect arrests or convictions, they are reports to an online website. Going around the circle, remember that the Bureau of Justice Statistics on identity theft date back to 2014. The situation is that the term identity theft is used for everything from losing a credit card to elder abuse. 

Over the years since identity theft became an issue, there has been little or no consensus on the definition, there is no consistent measure of arrests and convictions, and there are statistics that show that all forms of property crimes are decreasing. 

Should genealogists be worried? Not any more than the general population. Should the general population be worried? No more than usual. We still need to be cautious about sharing personal information with anyone not obviously authorized, refrain from including information about living relatives online, and continue to maintain reasonable personal security measures such as avoiding online fraud.

What would be nice is if the banks would quit using relatives' names and surnames as "security questions."