European Molecular Biology Computing Network - Biocomputing Tutorials: CRI-MAP Data Analysis 2

CRI-MAP Tutorial - Locus Insertion & Chromosomes


CRI-MAP tutorial contents:



6. Positioning a locus on a map  (all option)
Ex. 6.1 - locating new loci with all
Ex. 6.2 - checking map order with flipsn; reporting map changes with fixed
7. Seeing the cross-overs (chrompic option)
Ex. 7.1 - view the data used to produce the current map
Ex. 7.2 - evaluate the dataset used and plan its improvement
8. An Evaluation of Our Map


6. Positioning a locus on a map

Time for a short review. So far we have assembled a "best" map of 27 loci spanning close to 230 cM. We did this in two rounds of twopoint -> build -> flips2 runs, growing the map first from three loci to twelve, and then from twelve to 27. To achieve the first round map, we relied slightly on (and were occasionally misled by) cytological data, and avoided the mapping information stored in the chr2.ord file. The reason was to force CRI-MAP to construct its maps from the data afresh each time, free of possibly incorrect constraints on the locus order left over from previous build runs.
To get the second round map, we ignored the cytological data and used twopoint linkage data combined with the mapping information stored in the chr2.ord file. We also relaxed the stringency for adding loci to the first round map.
Now we have a large map, a high degree of confidence in most of it, and a substantial database of possible locus orders for those loci still-to-be-placed, residing in the chr220.ord file. Given that we have just improved the current map - by reversing one pair of loci to give a better LOD score - can the chr220.ord file be used to map or approximate the position of as-yet-unmapped loci? No, unfortunately not. The new locus order doesn't occur in chr220.ord, so any CRI-MAP options (e.g., build, instant, quick) that use this file to augment the current map will reject ALL new maps!

The option of choice for positioning one new locus, or a couple of difficult old loci, is all. For the loci in the inserted_loci parameter it tests all possible positions, and combinations of positions, along the current map, and reports the LOD scores. If one of the position combinations has a LOD score at least 2.000 better than any other, we can manually insert the loci at their positions and augment the current map.
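The inclusion rule just described can be sketched in a few lines of Python. This is only an illustration of the "best by at least 2.000" test: the function name, the position labels, and the LOD scores are all made up here, and nothing below reads a real all output file.

```python
def unique_best_position(lod_by_position, min_margin=2.0):
    """Return the best position if its LOD score beats every other
    position by at least min_margin; otherwise return None."""
    ranked = sorted(lod_by_position.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    best_pos, best_lod = ranked[0]
    runner_up_lod = ranked[1][1]
    return best_pos if best_lod - runner_up_lod >= min_margin else None

# Hypothetical LOD scores, for illustration only.
example_unique = {"before 0": -1141.0, "0-55": -1144.5, "after 5": -1146.2}
example_tied = {"before 0": -1150.3, "after 5": -1150.4, "0-55": -1153.0}

print(unique_best_position(example_unique))  # a single position wins clearly
print(unique_best_position(example_tied))    # no unique placement
```

In the second example the two ends of the map are nearly equally likely, so no position is returned and the locus would stay off the map.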
Which loci from the chromosome 2 dataset are good candidates for use with all? For "new" loci, there are the two p arm loci that were excluded from the second round of build runs because they showed no twopoint linkage to the 12 locus map: D2S61_2 (64) & CPSI (9). There are also two q arm loci that show linkage to some of the recently added loci of the current map: CRYG1-5 (0) & D2S35 (14) (data from a twopoint run not shown.) For problematic old loci, there are those that DID enter the second round maps of Try 218 & Try 219 but were omitted from the "best" map of Try 220 because they could not be uniquely placed: D2S39 (22), D2S36 (13), & GYPC (58).
Trying to position these seven loci somewhere among the 27 loci of the current map with one all run will NOT work. Most computers don't have enough memory to evaluate and store the more than 27 x 10^9 34-locus maps this calculation calls for. On the other hand, trying only one locus at a time can sacrifice some analytical power if the insertions of loci are mutually supporting. For example, we know from build220.out that loci D2S39 (22), D2S36 (13), & GYPC (58) don't have unique positions along the current map at a LOD score stringency of 2.000. Trying to insert these three loci one at a time using all will simply repeat what we know from build220.out. A better strategy might be to attempt inserting these difficult loci in pairs, or in combination with one of the "new" loci from above. Testing all 21 two-locus combinations of these seven loci would take far too long, however (all is one of the more computationally intensive options). Instead, try each of the four new loci from the p & q arms on its own. If one or more of these fits uniquely, twopoint will reveal which of the remaining loci, alone or in pairs, might next insert via all.
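The two counts quoted above can be checked with a short calculation. The sketch assumes that the 27 mapped loci keep their order and that each new locus may go into any gap or either end, so the first new locus has 28 possible slots, the next 29, and so on. The function name is only illustrative.

```python
from math import comb, prod

def insertion_orders(map_size, n_new):
    """Number of locus orders obtained by inserting n_new loci into a
    fixed-order map of map_size loci (assumption: any slot is allowed)."""
    # First new locus: map_size + 1 slots; second: map_size + 2; and so on.
    return prod(range(map_size + 1, map_size + n_new + 1))

print(insertion_orders(27, 7))  # all seven loci at once: 28*29*...*34
print(insertion_orders(27, 2))  # one pair of loci
print(insertion_orders(27, 1))  # a single locus
print(comb(7, 2))               # number of distinct locus pairs among seven
```

Seven loci at once gives 27,113,264,640 orders, which is the "more than 27 x 10^9" figure. A single locus gives only 28 and a pair 812, which is why single and paired insertions stay practical.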

Exercise CRI-MAP 6.1: position loci on a map - locating new loci with all
Create four new chr22*.par files from chr222.par by adding one of 0, 9, 14, or 64 to the inserted_loci parameter. For each new file, copy it to chr2.par and save the output of "crimap 2 all" to all22*.out .

Add any locus to the current map if it meets the standard for inclusion, a unique position at a stringency of LOD score >= 2.000 .

Create chr227.par with NO loci in the ordered_loci parameter and the four new plus the three old loci as the inserted_loci. Copy chr227.par to chr2.par, and examine the output of a twopoint run. If one or more of the four new loci were added to the current map in the previous step, list the loci showing linkage to these first. Next note any locus pairs showing linkage. Starting at the top of the list, choose two or three loci for further analysis; if you chose three loci, make three different locus pairs. Create the new chr22*.par file(s) from chr222.par, augmenting ordered_loci to reflect the current map, and adding the locus pair(s) to the inserted_loci. Try to insert the locus pair(s) using all. (NB: If you find and try to insert a trio of loci, all three at once, CRI-MAP will run for a day or two! Please run any jobs like this as batch jobs!)

Add any loci to the current map if they meet the standard for inclusion.
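The copy-and-run routine in the first step of this exercise is easy to script. The sketch below assumes the four parameter files have already been edited by hand and are named chr223.par to chr226.par (only the pairing of all223.out with locus 0 is confirmed by the tutorial; the other numbers are a guess), that crimap is on the PATH, and that "crimap 2 all" needs no keyboard input. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def plan_all_runs(suffixes):
    """Pair each prepared parameter file with its output file,
    e.g. chr223.par -> all223.out."""
    return [(f"chr{s}.par", f"all{s}.out") for s in suffixes]

def run_all(suffixes, workdir="."):
    """Copy each parameter file to chr2.par and save the output of
    'crimap 2 all'. Assumes crimap is installed and non-interactive here."""
    workdir = Path(workdir)
    for par_name, out_name in plan_all_runs(suffixes):
        shutil.copyfile(workdir / par_name, workdir / "chr2.par")
        with open(workdir / out_name, "w") as out:
            subprocess.run(["crimap", "2", "all"], cwd=workdir,
                           stdout=out, check=True)

# Show what would be run; call run_all([223, 224, 225, 226]) to execute.
for par_name, out_name in plan_all_runs([223, 224, 225, 226]):
    print(f"cp {par_name} chr2.par; crimap 2 all > {out_name}")
```

Keeping the planning step separate from the execution step makes it possible to review the file pairs before starting runs that may take hours.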

all223.out shows that locus CRYG1-5 (0) "fits" uniquely into the extant map, but the other all22*.out files show two or three possible positions for the three other loci. These others are maximally likely only at the ends of the map, and there is almost no preference for one end over the other. We can only use locus CRYG1-5 (0) to augment the current map.
In the test for twopoint linkage among the seven trial loci, only four loci show significant linkage relationships (2pt227.out). Luckily, one of these four is locus CRYG1-5, and it is linked to CPS1 (9) & D2S35 (14). In addition, D2S35 shows linkage to D2S39 (22).
Trying to insert these loci in three different pairs (9 14 - all228.out; 9 22 - all229.out; 14 22 - all230.out) proves unsuccessful, however. Perhaps first adding more q arm loci, via a third round of build runs, would help; you are welcome to do the experiments!

With a new locus added to the current map, the remaining steps are:

  1. flips2 & flips3 - to check that the new map is in the best order (flips3 because the new locus has been added at one end), and
  2. fixed - to report the details of this final map.

Exercise CRI-MAP 6.2: position loci on a map - checking the order of the new map with flipsn; reporting any changes to the map with fixed
Using chr230.par as chr2.par, save the outputs of "crimap 2 flips2" and "crimap 2 flips3" as flip2230.out and flip3230.out, respectively.

Modify the ordered_loci parameter of chr2.par if necessary, and save the output of "crimap 2 fixed" as fixed2.out.

The flipsn runs show that no changes in map order are required! As before, though, there are a few locus pair reversals that are within a LOD score of 3.000 of this best map.
The final map of the p arm of chromosome 2 incorporates 28 loci (36 when haplotyped loci are included), spans 254.0 cM, and has a LOD score of -1141.012 .


7. Finding recombination events in the pedigrees

The map we've produced has strong support for most of its length; in reporting this map to colleagues, the reduced confidence we have in the order of 8 locus pairs could be indicated as follows:
         77          10       28       70
 0 55 68  1 49 20 69 33 59 26 40 31 66 56 54 50 44 65 42 5 
21 61                24             17
This conservative map has 20 backbone loci, plus eight others that probably go just before (if above) or just after (if below) their backbone neighbours.

Unsatisfied with reporting a conservative map? Another possibility is to go back to the dataset to see where its weaknesses lie, and to do another few experiments that will confirm or deny the current map. CRI-MAP's tool for viewing the data used for a particular map is the chrompic option.

Exercise CRI-MAP 7.1: Seeing the cross-overs - view the data used to produce the current map
Enter the command "crimap 2 chrompic", and save the output as chrpc230.out. (NB: this file is ~100 K!)

The chrompic option takes from the dataset all the information used to produce the current map, and re-formats it for easy inspection. As with most CRI-MAP options, the chrompic output begins with a restatement of the parameters that were used. It then lists map-specific data for each family in the dataset. For example, the first family in the dataset, Family 1326, has no data relevant to the current map. Each individual from #3 to #9 is shown with two lines of 36 "dashes" beside it, and the number 0 beside each line.

The lines represent the two copies of chromosome 2 in each person, the dashes represent the loci of the current map, and the number counts the cross-over events estimated to have occurred on that chromosome for the loci in their current order. There are 36 dashes because chrompic also reports on the loci in haplotyped systems.

The first family with relevant information is Family 1328.

Looking at individual #3, both of her chromosomes have been scored for seven of the 36 loci of the current map. Her maternal chromosome (above) has data for map loci 1 7 24 26 27 28 & 29 (these are loci CRYG1-5 D2S44 D2S6 D2S46 D2S48 APOB_2 & APOB) and her paternal chromosome (below) has data for map loci 7 24 26 27 29 33 & 36 (D2S44 D2S6 D2S46 D2S48 APOB D2S47 & ACP1). Further, individual #3 has one cross-over on the maternal chromosome, between map loci 1 & 7, which places six paternal (i) alleles for D2S44 D2S6 D2S46 D2S48 APOB_2 & APOB on the otherwise maternal (o) chromosome. Two cross-overs on the paternal chromosome place maternal alleles for D2S6 and D2S46. Note the precision with which these cross-over events are placed; quite precisely between map loci 26 and 27 for the second cross-over on the paternal chromosome, and with very little precision for the first cross-over - somewhere between map loci 7 and 24. Finally, note that all data from this family is phase unknown, indicated by the use of lower case letters (i and o) to show loci as either paternally or maternally inherited.
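The way cross-overs are read off these lines can be illustrated with a small Python sketch. It counts switches of parental origin between consecutive scored loci, which is a simplification of what chrompic reports (chrompic estimates the events from the data and the map). The dictionaries below are reconstructed from the description of individual #3 above, not copied from chrpc230.out, and the function name is hypothetical.

```python
def count_crossovers(phases):
    """Count switches of parental origin between consecutive scored loci.

    phases maps a map-locus number to 'i' or 'o'; unscored loci are absent.
    Returns (count, intervals), where each interval is the pair of scored
    loci flanking one switch.
    """
    scored = sorted(phases)
    intervals = [(a, b) for a, b in zip(scored, scored[1:])
                 if phases[a] != phases[b]]
    return len(intervals), intervals

# Family 1328, individual #3, as described in the text (assumed coding).
maternal = {1: "o", 7: "i", 24: "i", 26: "i", 27: "i", 28: "i", 29: "i"}
paternal = {7: "i", 24: "o", 26: "o", 27: "i", 29: "i", 33: "i", 36: "i"}

print(count_crossovers(maternal))
print(count_crossovers(paternal))
```

The width of each returned interval shows how precisely the event is placed: (26, 27) pins the second paternal cross-over between adjacent scored loci, while (7, 24) leaves the first one anywhere in a long unscored stretch.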

After the family data comes the summary of informative intervals. An interval, the gap between any two loci on the current map, is informative if one or more cross-overs are recorded within it. Thus, when we look at the "1____7" interval, we see it holds six cross-overs, one being from 1328-3-M (Family 1328, individual #3, maternal chromosome).

Finally, the chrompic output ends with the identities of the haplotyped loci in the current map, and the details of the Sex-Averaged Map.

The section of chrpc230.out that summarizes the cross-over chromosomes is the most useful for finding weaknesses in the dataset and seeing ways to improve them. One set of intervals with little support in the current map is the set among loci D2S44, Prot_C/pcr, D2S54+D2S54_2, & LCO, that is, among map loci 7 through 11. (Recall that the locus order D2S44 D2S54+D2S54_2 Prot_C/pcr LCO has only slightly less support than the current map, a LOD score decrease of 0.76.) Counting the cross-overs in each interval of this set gives a rough feeling for why.

      Locus IDs       68      77      1,3     49

      Map Loci       __7_______8______9,10____11__

      cross-over #     |---9---|---5---|---4---|
      per interval     |------11-------|
                               |-------8-------|
                       |----------10-----------|
There are almost as many cross-overs in interval "8____11" (8) as there are in total in "8__9,10" and "9,10__11" (5+4=9). Increasing the cross-over counts in intervals "8__9,10" and "9,10__11" would strengthen the support for the current map, as would decreasing the count in interval "8____11". A quick improvement could come from analysing the families with cross-overs in interval "8____11" for one of the two map loci 9 & 10 (D2S54 & D2S54_2). Similarly, support for the current map would improve if there were fewer cross-overs in the "7____9,10" interval and more between loci 7 & 8 and between 8 & 9,10 . Which families should be re-analysed, and for what loci?
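The comparison made in this paragraph can be written out as a small tally. The counts are copied from the diagram above; map loci 9 and 10 are treated as a single point labelled 9, and the function name and the "placed" versus "unplaced" wording are only illustrative.

```python
# Cross-over counts per interval, from the diagram (9 stands for "9,10").
counts = {
    (7, 8): 9, (8, 9): 5, (9, 11): 4,   # adjacent intervals
    (7, 9): 11, (8, 11): 8,             # intervals skipping one map point
    (7, 11): 10,                        # interval skipping two map points
}

def support(counts, left, mid, right):
    """Compare cross-overs placed on either side of `mid` with those
    that span it because `mid` was not scored."""
    placed = counts[(left, mid)] + counts[(mid, right)]
    unplaced = counts[(left, right)]
    return placed, unplaced

print(support(counts, 8, 9, 11))  # around map loci 9,10
print(support(counts, 7, 8, 9))   # around map locus 8
print(counts[(7, 9)] + counts[(8, 11)] + counts[(7, 11)])  # all wide intervals
```

Around map loci 9,10 there are 9 placed cross-overs against 8 unplaced ones, and around map locus 8 there are 14 against 11. The three wide intervals together hold 29 cross-overs whose position could be sharpened by scoring the skipped loci.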

Exercise CRI-MAP 7.2: Seeing the cross-overs - evaluate the dataset used and plan its improvement
List the families that could be re-evaluated for inheritance patterns to improve the mapping of loci D2S44, Prot_C/pcr, D2S54+D2S54_2, & LCO. Specify the loci to be tested in each family and rank the families by likely informativeness.

Below is an excerpt from chrpc230.out . There are 15 families having the 29 cross-overs in one of the three intervals "7____9,10", "7____11" & "8____11".

Seven of these families have only one individual with a cross-over in a relevant interval, and six families have only two individuals; scoring entire families for new loci to scrutinise one or two cross-over events is unlikely to be an efficient use of resources.
Two families, 1359 & 1375, together have ten individuals with relevant cross-overs. Family 1375 is only reported in the "8____11" interval, and so would require further analysis at only one locus, either D2S54 or D2S54_2. Family 1359 is reported once in the "7____9,10" interval and five times in interval "7____11"; this family needs rescoring for loci Prot_C/pcr & (D2S54 or D2S54_2). If forced to choose one of these two families for further analysis, I would choose Family 1375. Though it has fewer relevant cross-overs (4 vs. 6), it also has fewer individuals to score than 1359 (13 vs. 18), and for only one locus instead of two. Perhaps the most compelling reason to choose Family 1375 over 1359 is that much of its extant data is from phase known meioses, while Family 1359 has only phase unknown data.
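One way to make this choice explicit is to score each family by the relevant cross-overs gained per new genotype, with phase known data as the first tie-breaker. This ranking rule is an assumption made for illustration, not a CRI-MAP feature or an established method; the numbers are the ones quoted above.

```python
families = [
    {"family": 1359, "crossovers": 6, "individuals": 18, "loci": 2,
     "phase_known": False},
    {"family": 1375, "crossovers": 4, "individuals": 13, "loci": 1,
     "phase_known": True},
]

def yield_per_genotype(fam):
    """Relevant cross-overs per genotype that would have to be scored."""
    return fam["crossovers"] / (fam["individuals"] * fam["loci"])

# Assumed rule: prefer phase known data, then the higher yield.
ranked = sorted(families,
                key=lambda f: (f["phase_known"], yield_per_genotype(f)),
                reverse=True)

for fam in ranked:
    print(fam["family"], round(yield_per_genotype(fam), 3))

# Consistency check on the tallies: 7 families with one individual,
# 6 families with two, and ten individuals in families 1359 and 1375.
print(7 * 1 + 6 * 2 + 6 + 4)
```

Family 1375 comes first under both criteria (about 0.31 cross-overs per genotype against about 0.17), and the tally reproduces the 29 cross-overs in the three intervals.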


8. An Evaluation of Our Map

A complete analysis of this dataset has been published in:
Spurr, N. et al. 1992. The CEPH Consortium Linkage Map of Human Chromosome 2. GENOMICS 14:1055-1063

How does the partial map built in this tutorial compare? Here's our conservative map of the p arm again.

         77          10       28       70
 0 55 68  1 49 20 69 33 59 26 40 31 66 56 54 50 44 65 42 5 
21 61                24             17
And here's the 1992 CEPH Consortium map, reformatted for comparison.
               12
74 47 51 48  9  0 14 30 11 27 18 61 68  1 49 20 69 10 24 59 26 35 40 31 66 56 70 54 50 42 
               52 43                   16                75       15

The maps are almost identical in the order of the loci they share. The one difference is the locus pair 56 70, already flagged in our map as having weak support. As for the loci the maps do not share, many of the extra loci in the consortium map are q arm loci we chose to ignore (e.g., 74 47 51 etc.). Other extra loci close to or on the p arm of the consortium map (18 16 75 35 15) are compensated by extra backbone loci in our conservative map (55 33 44 65 5) and by the other extra loci with weaker support (21 77 28 17).
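The comparison can be reproduced with a few lines of Python. The sketch assumes one reading of the two diagrams: our flanking loci are slotted into the backbone just before (if above) or just after (if below) their neighbours, and the consortium map is compared by its backbone row only, with its flanking loci kept as an unordered set. The variable names are illustrative.

```python
# Our conservative map with flanking loci slotted in (assumed reading).
ours = [0, 21, 55, 61, 68, 77, 1, 49, 20, 69, 10, 33, 24, 59, 26, 28,
        40, 31, 66, 17, 70, 56, 54, 50, 44, 65, 42, 5]

# Backbone row of the 1992 consortium map, and its flanking loci.
consortium_backbone = [74, 47, 51, 48, 9, 0, 14, 30, 11, 27, 18, 61, 68,
                       1, 49, 20, 69, 10, 24, 59, 26, 35, 40, 31, 66, 56,
                       70, 54, 50, 42]
consortium_flanking = {12, 52, 43, 16, 75, 15}

# Restrict both maps to the loci they share and compare position by position.
shared = set(ours) & set(consortium_backbone)
ours_shared = [x for x in ours if x in shared]
cons_shared = [x for x in consortium_backbone if x in shared]
mismatches = [(a, b) for a, b in zip(ours_shared, cons_shared) if a != b]

only_ours = sorted(set(ours) - set(consortium_backbone) - consortium_flanking)

print(len(shared), mismatches)
print(only_ours)
```

Under this reading the two maps share 19 loci and disagree only where 56 and 70 swap places; the nine loci found only in our map are the five extra backbone loci and four weakly supported loci listed above.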

I wish you good fortune with your use of CRI-MAP!



Updated on Wednesday, 22 January, 2005
Copyright © 1995-1996 by David W. Featherston. Updated for the MacOSX by Erik Bongcam-Rudloff