Gedbot

From Rodovid Engine

Gedbot's first attempt to create a record in Rodovid Database.

Help needed

This is the discussion page for a possible tool (a bot) that could upload a GEDCOM file to the Rodovid database by writing each record in turn.

Pierre Frappé initiated this work. He needs help from developers specialized in web technologies to complete this first bot for Rodovid.

Does any Rodovid user have the necessary skills? Please write here or on his discussion page.

The status of the work is given below:


Privit Baya,

and thanks again for EV.

  1. A new French user, fr:Discussion Utilisateur:Pfrappe, has offered to build a bot that simulates the input of records in RD on the basis of a Gedcom file (something like a Gedbot).
  2. A few weeks ago, Alain proposed activating Gedcom import for admins only.

What do you think of combining these two ideas: we could offer Gedcom import through the Gedbot built by Pfrappe, after verification by admins...

Do you think it's possible? Am I clear? If you think it's possible, how can we test it with a small gedcom file? --Christophe Tesson - talk. 09:53, 20 July 2011 (EEST)

Salut,

  1. A bot is a good idea.
  2. The problem of duplicate and empty persons is not resolved.
  3. When "Edit genealogical records as gedcom" is selected in the bot's preferences, the bot can upload data directly in gedcom format, with some exceptions.
  4. Similarity search: the one that was used in the old gedcom import.
  5. During import I added a new additional REFN. See the example fr:Personne:13891 (other). This REFN can be used for backward search and for updating local files.

Gedcom import does not work the way I want. Gedcom format processing works (you can use it through "Edit genealogical records as gedcom"). Duplicate and "empty" persons in the RD db are still a big problem. I don't want to make adding the same record easy for anybody. If anybody wants to use a semi-automatic gedcom import, Gedbot will be a good choice for them. But in this case that person will be the responsible one, not me )))))

For testing you can use "engine". The engine db is a separate db, so you can test everything here. Note: to simplify the admins' work, Gedbot can add all the new persons to a special category (for example: Imported by Gedbot). --Baya 11:05, 20 July 2011 (EEST)

Hello Baya and Christophe

I'm just discovering what has already been done in Rodovid about Gedcom import.

I agree that we must be very careful with massive and too-easy imports.

My idea is only to make importing easier, with these features:

  1. avoid manual copying from private databases (it creates many mistakes and is a waste of time)
  2. normalize the data as much as possible
  3. give as much information as possible about possible duplicate persons (search for existing similarities)
  4. create persons in a temporary status (a category seems a good idea), pending further checks.

In this case, the bot could run on the client side this way:

  1. open the gedcom file
  2. person by person (and family by family):
    A. ask RD for similarities
    B. RD proposes a list of existing persons to be compared manually
    B1. the user accepts one proposed refnum (and the comparisons then continue using this reference)
    B2. or refuses, and the person is created as new

If a user wants to use the bot, he has to install Python and run the bot on his side.

  • What do you think of this?
  • What is the problem with null persons?
  • Is the existing Gedcom import program written in Python?
  • Is it possible to get the existing program somewhere, so I can gain experience working with Engine?

Thanks in advance

(Anyway, I'll be on holiday from July 29th to August 16th)

Pfrappe 17:46, 20 July 2011 (EEST)

Hello Pfrappe,

You are absolutely correct, BUT only in the case of a single alphabet. RD uses many languages. How would you avoid a duplicate when a person was added in another language (using another alphabet)? This is the most complicated problem today. It is also part of the problem of duplicate persons.

Some notes to algorithm

I have renumbered the steps.

Step 2. To decrease the server load, it would be good to import persons in order from ancestors to descendants. In other words, if any ancestor of a person (or that ancestor's family) has not yet been imported, it should be imported before proceeding further with that person.

Step 2B. The old import procedure ran fully on the server side. So we need to describe the list of similar persons, and I need to implement it. I think the JSON format is currently the most convenient.
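
The ancestors-first order of Step 2 could be sketched as follows. This is only an illustration, not existing Gedbot code: the `parents` dictionary (person id to list of parent ids) is an assumed in-memory form of the gedcom links, and the function name is hypothetical.

```python
def ancestors_first(parents):
    """Return person ids ordered so that every parent precedes its children.

    `parents` maps a person id to the list of that person's parent ids
    (an assumed in-memory form of the gedcom links).
    """
    order = []
    state = {}  # person id -> "visiting" or "done"

    def visit(person):
        if state.get(person) == "done":
            return
        if state.get(person) == "visiting":
            # A person who is their own ancestor: the file is inconsistent.
            raise ValueError("cycle in gedcom links at %s" % person)
        state[person] = "visiting"
        for parent in parents.get(person, []):
            visit(parent)  # import every ancestor first
        state[person] = "done"
        order.append(person)

    for person in sorted(parents):
        visit(person)
    return order


tree = {"I4": ["I3"], "I3": ["I1", "I2"], "I1": [], "I2": []}
print(ancestors_first(tree))
```

A person who appears as their own ancestor raises an error, so the same pass also gives a basic consistency check of the file.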

null persons

Any person can be almost completely described by 3 of the 4 parameters listed below (in some cases only 2 of them would be enough).

  1. full name
  2. date and time of birth
  3. place of birth
  4. parents

Persons who are not fully described are "null" persons. They can be duplicated very easily .... Currently, imho, Rodovid contains 4-15% null persons.
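
As an illustration of this rule, a check for "null" persons could look like the sketch below. The field names and the fixed threshold of 3 are assumptions made for the example (the text notes that 2 parameters are sometimes enough); nothing here is taken from the RD code.

```python
# Assumed field names for the four describing parameters.
DESCRIBING_FIELDS = ("full_name", "birth_date", "birth_place", "parents")


def is_null_person(person, required=3):
    """Return True when fewer than `required` of the four parameters are known."""
    known = sum(1 for field in DESCRIBING_FIELDS if person.get(field))
    return known < required


print(is_null_person({"full_name": "Jean Dupont"}))
print(is_null_person({"full_name": "Jean Dupont",
                      "birth_date": "1850-03-02",
                      "birth_place": "Lyon"}))
```

The first call reports a null person (one parameter known), the second does not (three known).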

PHP

The old gedcom import was written in PHP. But currently only HTML is required for communication with RD. So the client side can be written in any language.

Other clients

There are no other clients for working with RD. Some time ago we discussed Rodovid Mobile Access with A. Cotting, but I don't know whether he is still working on it. We can contact him about this.


Sincerely, --Baya 19:03, 20 July 2011 (EEST)

Algorithm

This section is still in progress!

Goal of the Gedbot

The goal of the bot is only to make importing easier, with these features:

  1. avoid manual copying from private databases (it creates many mistakes and is a waste of time)
  2. normalize the data as much as possible
  3. give as much information as possible about possible duplicate persons (search for existing similarities)
  4. create persons in a temporary status (a category seems a good idea), pending further checks.


Detail of algorithm (work in progress!)

  1. open the gedcom file
  2. create an internal tree beginning with the ancestors (and probably generation by generation)
  3. for each generation:
    for each person:
    A. ask RD for similarities (see below)
    B. RD proposes a list of existing persons to be compared manually
    B1. the user accepts one proposed refnum (and the comparisons then continue using this reference)
    B1.1 if the dates in the Gedcom file are more precise or better located than in RD, the new information may be added
    B2. or refuses, and the person is created as a new one (in that case, Gedbot receives the new refnum)
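
Steps A to B2 for a single person could be sketched as below. The three callbacks (`find_similar`, `choose`, `create`) are hypothetical placeholders for the RD similarity request, the manual comparison by the user, and the creation of a new record; none of them is an existing RD interface, and step B1.1 is left out.

```python
def import_person(person, find_similar, choose, create):
    """Return (refnum, created) for one gedcom person.

    find_similar(person)       -> list of candidate records   (steps A and B)
    choose(person, candidates) -> accepted refnum, or None    (step B1)
    create(person)             -> refnum of the new record    (step B2)
    """
    candidates = find_similar(person)
    if candidates:
        refnum = choose(person, candidates)
        if refnum is not None:
            return refnum, False  # reuse the existing RD person
    return create(person), True   # nothing accepted: create a new person


# Stub callbacks, for illustration only.
existing = [{"refnum": 101, "surname": "Dupont"}]
find = lambda p: [e for e in existing if e["surname"] == p["surname"]]
accept_first = lambda p, cands: cands[0]["refnum"]
new_record = lambda p: 999

print(import_person({"surname": "Dupont"}, find, accept_first, new_record))
print(import_person({"surname": "Martin"}, find, accept_first, new_record))
```

Keeping the three operations as parameters lets the loop be tested against the separate engine db, or with stubs as here, without touching real records.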

2) Go from the oldest ancestors to the descendants (each person can be imported only after all of their ancestors have been fully imported). So you take any person from the gedcom file and just check the ancestors. This also makes it very easy to check the consistency of the gedcom file.

B1) The refnum can be stored in the gedcom file for future use.

--Baya 12:07, 21 July 2011 (EEST)

Searching for similarities

  • to ask for similarities, send the following information to RD:
  • surname, given names,
  • date and place of birth and death,
  • parents' refnums or info
  • info about spouses
  • info about children

for example as an XML tree (such as <person><parents><father>..</father><mother>..</mother></parents><spouses>..<childs>...</childs></spouses></person>)
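
Such a request could be built with Python's standard `xml.etree.ElementTree` module, as in the sketch below (written for Python 3). The element names follow the example above; the input dictionary keys and the `surname`, `givennames`, `birth` and `death` elements are assumptions, since the exact format still has to be agreed with RD.

```python
import xml.etree.ElementTree as ET


def person_to_xml(person):
    """Serialize one person (a dict with assumed keys) to an XML string."""
    root = ET.Element("person")
    for key in ("surname", "givennames", "birth", "death"):
        if person.get(key):
            ET.SubElement(root, key).text = person[key]
    # Parents are given as refnums (or any other identifying text).
    parents = ET.SubElement(root, "parents")
    for role in ("father", "mother"):
        if person.get(role):
            ET.SubElement(parents, role).text = person[role]
    # Each spouse carries the children of that couple.
    spouses = ET.SubElement(root, "spouses")
    for spouse in person.get("spouses", []):
        node = ET.SubElement(spouses, "spouse")
        node.text = spouse["name"]
        childs = ET.SubElement(node, "childs")
        for child in spouse.get("children", []):
            ET.SubElement(childs, "child").text = child
    return ET.tostring(root, encoding="unicode")


print(person_to_xml({"surname": "Dupont", "givennames": "Jean", "father": "13891",
                     "spouses": [{"name": "Marie Martin", "children": ["Paul"]}]}))
```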

If the tree-way import is used:

  1. (A->D way) only the parents' refnums (third item above) are required; (D->A way) only the children's refnums are required
  2. In RD, persons are linked only to their parents. Family information is additional (the parents in a family have the same events, stored in the family and not in the parent's space). So any family information can be imported after all the persons (again, a gedcom consistency check). Spouses are thus imported as the parent of the next child (as a new tree not related by blood). As a result, the user will know how many different blood-related trees there are in the gedcom file.
  3. A child can be imported only after its parents.

Otherwise, an additional similarity search is required for every spouse and child of every person being checked for import.

So, a better way:

  1. send the list of all persons in the gedcom file (surname, given names, date and place of birth and death)
  2. receive the list of similar persons
  3. check the similar persons manually and visually
    1. if a similar person is found
      1. assign the refnum and continue in the D->A or A->D way.
    2. if no similar person is found
      1. the tree-way import starts
--Baya 12:28, 21 July 2011 (EEST)

comparison

  • the comparison could be made by relevance, after computing a value:
  • adding some points for each matching item
  • subtracting points for bad matches (contradictions)
We must describe it precisely. --Baya 12:31, 21 July 2011 (EEST)
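
A first, deliberately simple version of such a relevance value is sketched below. The field names and weights are arbitrary assumptions chosen for the example and would have to be agreed; a field missing on either side counts neither as a match nor as a contradiction.

```python
# Assumed weights: how many points each field adds or removes.
WEIGHTS = {"surname": 3, "givennames": 2, "birth_date": 3, "birth_place": 1}


def relevance(candidate, existing, weights=WEIGHTS):
    """Add points for matching fields, subtract points for contradictions."""
    score = 0
    for field, weight in weights.items():
        a, b = candidate.get(field), existing.get(field)
        if not a or not b:
            continue          # unknown on one side: no evidence either way
        if a == b:
            score += weight   # matching item
        else:
            score -= weight   # contradiction
    return score


gedcom_person = {"surname": "Dupont", "givennames": "Jean", "birth_date": "1850"}
rd_person = {"surname": "Dupont", "givennames": "Pierre",
             "birth_date": "1850", "birth_place": "Lyon"}
print(relevance(gedcom_person, rd_person))
```

Here the surname and birth date match (+3 +3) and the given names contradict (-2), so the value is 4. Exact string equality is used only to keep the sketch short; a real comparison would need the normalization discussed in the sections below.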

alphabet

Is it possible to create (and store) a kind of « signature » for each person using

  • a phonetic transcription of names (I think something like that could exist somewhere?)
  • a conversion of all dates to the same calendar
  • an identification of the town by its official location in Wikipedia (or something like that!)
Pierre, there is something that looks like the phonetic transcription you're talking about: Soundex. See:

this page on Geneawiki (fr), or this one (en). I don't know if it works with Cyrillic. --Christophe Tesson - talk. 23:45, 20 July 2011 (EEST)

Soundex with Cyrillic: see this page
  • UTF-8 can contain all alphabets. But the authors write only about Latin and Cyrillic. Where are the other alphabets? These are two different algorithms (or one) for two different alphabets. We need one algorithm for all alphabets. --Baya 14:24, 21 July 2011 (EEST)


Soundex requires Latin. Many other, better algorithms also exist ))). Imho we can use it. But most of them work with Latin, so names must be transcribed from the other alphabet into Latin (in any case, to compare two words in different alphabets we must have both words in one alphabet). Unfortunately, there exist, for example, many transcription tables from Cyrillic to Latin.
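
For reference, here is a minimal sketch of the classic (American) Soundex code. It is a plain textbook version, not code from RD, and it shows the limitation discussed above: only the letters A to Z are considered, so a name written in another alphabet gives an empty code unless it is transcribed to Latin first.

```python
# Letter groups of classic Soundex; vowels, H, W and Y have no digit.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code of a Latin-alphabet name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""  # nothing usable, e.g. a name written in Cyrillic
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue             # H and W do not separate equal codes
        code = CODES.get(c, "")  # vowels give "" and so separate equal codes
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]  # pad or cut to one letter plus three digits


print(soundex("Robert"), soundex("Rupert"), soundex("Tymczak"))
```

"Robert" and "Rupert" both give R163, which is the kind of tolerance to spelling variants wanted for the signature.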

dates

Currently all dates in the RD db are Gregorian. All dates must be converted to Gregorian before import into RD. But it would be possible to let users select any calendar.
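
As an example of such a conversion, a date in the Julian calendar can be turned into a Gregorian date through its Julian Day Number. The sketch below uses the standard integer formula for the Julian calendar and Python's `datetime.date`, whose ordinal 1 (1 January of year 1) corresponds to Julian Day Number 1721426; the function name is hypothetical, and the result is limited to the years that `datetime.date` supports.

```python
import datetime


def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date to a Gregorian datetime.date."""
    # Julian Day Number of a Julian-calendar date (standard integer formula).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # datetime.date ordinals count days of the proleptic Gregorian calendar.
    return datetime.date.fromordinal(jdn - 1721425)


# 25 October 1917 (Julian) is 7 November 1917 (Gregorian).
print(julian_to_gregorian(1917, 10, 25))
```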

places

As I currently understand it, places must be rearranged like persons (with "parents", "children", etc.). It is a big piece of work. (((

--Baya 12:53, 21 July 2011 (EEST)

Installation and use (future)

  1. install Python on your computer (2.5 or later, but not 3.x)
  2. download gedbot.py (and other files if necessary)
  3. create your Gedcom file and store it in the right directory (to be specified)
  4. run python.exe gedbot.py

Pfrappe 20:28, 20 July 2011 (EEST)
