Research Student Monitoring Group

更新时间:2023-04-12 04:31:01 阅读量: 实用文档 文档下载

说明:文章内容仅供预览,部分内容可能不全。下载后的文档,内容与下面显示的完全一致。下载之前请确认下面内容是否您想要的,是否完整无缺。

Academic year:2004/2005

Report number:3

The University Of Birmingham

School of Computer Science

Research Student Monitoring Group

Thesis Proposal

Name of student:Arjun Chandra

Start date:27September2004

Registration Number:585242

Name of supervisor:Prof.Xin Yao

Thesis group members:Dr.Jonathan E.Rowe and Dr.Peter Ti?n o

Date of Report:17October2005

Working title:Co-evolutionary learning of Bargaining Behaviours for

Decentralised Control

1

Abstract

With the advent of the internet,enabling computational resources to be accessed remotely,not only has sharing resources become easy,it has allowed users to access these resources on a more cost e?ective basis,achieving a closer match between demand and supply.Such systems are said to involve multiple participants,each acting in an autonomous and self-interested manner,entering or leaving at will.Providing centralised control solutions for such systems has issues which can be remedied by having systems built around ideas from economics,leading to decentralisation.Market Based Control(MBC)systems are essentially built to exhibit certain properties of free-market economies viz.decentralisation,robustness and a capacity for self organisation. Actions of groups of inpiduals in free-market economies,engaging in simple trading interactions,essentially driven by self interest,results in the optimum allocation of resources or the commodities being traded(the market equilibrates).However,in order for an MBC system to be truly decentralised,agents must possess the ability to bargain.Incidently,real world bargaining problems are essentially those where agents have to come to an agreement over multiple issues or attributes of a commodity or resource.Issues having inter-dependencies,being closer to the real world,have been largely overlooked by the adaptive behaviour research community.This research mainly aims at modeling bargaining behaviours for such multi-issue bargaining situations and its extensions thereof,such that the methods synthesised may be extended and applied to real world decentralised control problems.The idea we are proposing to model bargaining behaviours is to use a multi-objective approach to formulate the bargaining problem and co-evolve behaviours of agents,tackling issues with multi-issue bargaining situations having inter-dependent issues together with implicitly addressing the problem of overspecialisation or generalisation in co-evolutionary learning as well. The outcome of this research will lead to the synthesis and analysis of this multi-objective co-evolutionary model,which may be applicable to real world decentralised control problems that call for agents to possess bargaining abilities.

2

Contents

Abstract2

List of Figures5 List of Tables6 1Introduction7

2Background/Literature Review8

2.1Market Based Control(MBC) (8)

2.2Previous and current MBC work (9)

2.3Bargaining:What?and Why? (10)

2.3.1What is Bargaining? (10)

2.3.2Issues or Objectives (11)

2.3.3Why Bargaining? (12)

2.4Bargaining Before Game Theory (13)

2.5Assumptions in Game Theory and the Need for Computational Approaches.14

2.6Game Theoretic Approaches to Bargaining (15)

2.6.1Cooperative/Axiomatic Approach (15)

2.6.2Non-cooperative/Strategic Approach (18)

2.7Computational Approaches to Bargaining (21)

2.8Project Focus (22)

2.8.1Multi-issue Bargaining and Co-evolution (22)

2.8.2Co-evolution (23)

2.8.3Co-evolution and Generalisation (23)

2.8.4Co-evolving Bargaining Behaviours (24)

3Issues with Multi-issue Bargaining:Problem Formulation and Solution Insights24

3.1Multi-issue Bargaining as a Multi-objective Optimisation Problem (26)

3.2An Ensemble as a Bargaining Strategy (28)

3

3.3Multi-objective Evolutionary Algorithms for Ensembles (28)

3.3.1Relevant Work Done Since Last Meeting (30)

4MOEAs for Co-evolution to Learn Bargaining Behaviours30

4.1Proposed Experimental Setup and Validation (31)

4.2Contribution and Evaluation (34)

5Plan/Timetable35

4

List of Figures

1Possible utility functions for risk averse and risk neutral bargainers (17)

2Concavities in pareto e?cient regions (27)

5

List of Tables

1Breakdown of work (42)

6

1Introduction

With the advent of the internet,enabling computational resources to be accessed remotely, not only has sharing resources become easy,it has allowed users to access these resources on a more cost e?ective basis,achieving a closer match between demand and supply.Such systems are said to involve multiple participants,each acting in an autonomous and self-interested manner(i.e.they may well have di?erent objectives or interests as the inpidual components or software representing participating entities may well have been designed by independent development teams which may not share the same interests as the others).Also, such systems are open in that participants may enter or leave at will.

Centralised control methods of software engineering have well known limitations in managing such systems.Decentralised control o?ers capabilities to recover from failure of inpidual components with no explicit need for reprogramming.Moreover,the absence of a central coordinating component leads to systems showing a progressive loss in performance (graceful degradation)rather than a sudden collapse occurring from failure of inpidual subsystems[31].Sometimes control problems are too large in order for the system to be controlled centrally essentially due to resource limitations.To come up with an optimum control plan in case of a resource allocation problem in open systems(where plans would need to be updated)can be very computationally expensive too for a centralised controller. Decentralised control helps remedy these problems.

Considering distributed systems as multi-agent systems comprised of self-interested interacting software entities(or agents)is a logical step forward.Free-market economies are similar large scale systems with interacting self interested and autonomous entities(humans). Herein lies the essence of Market Based Control(MBC)systems i.e.of creating arti?cial systems having free-market economy properties.MBC systems are essentially built to exhibit certain properties of free-market economies viz.decentralisation,robustness and a capacity for self organisation[31].Actions of groups of inpiduals in free-market economies,engaging in simple trading interactions,essentially driven by self interest,results in the optimum allocation of resources or the commodities being traded(the market equilibrates).

Much of the literature and applications of MBC,as noted by Cli?and Bruten[31], is?awed and lacks in some respect.These applications either involve a centralised auction process or rely on human intervention.Cli?[31]identi?es that in order to have an automatic decentralised system,there is a need for participants to possess bargaining behaviours which allow them to decide on what price to agree on for a transaction.According to our knowledge,little work has been done[44](having its own share of limitations)into looking at the bargaining problem directly and not within an already speci?ed mechanism.Thus, studying bargaining and modeling it looks like an attractive proposition,the idea being that every mechanism involves the need for agents to bargain and considering simple bargaining problems would possibly enable us to extend the solutions developed to complex real world market settings.

Traditionally,bargaining problems have involved parties trying to reach an agreement over a single issue e.g.the pision of a surplus(price of a commodity for instance).Often though,inpiduals have several goals which they want to achieve motivated by self-interest. This,in economic literature,is termed as inpiduals having multiple issues to deal with.

7

However,multi-issue bargaining problems considered in the adaptive behaviour literature so far have mostly abstracted away some important real world considerations.Issues,in most real world situations,are supposed to be inter-dependent and there has hardly been any work done in this direction.We propose an idea for taking such bargaining situations into account and model bargaining behaviours using a multi-objective co-evolutionary approach.Multi-objective optimisation algorithms are a natural way to model such bargaining behaviours as they not only allows us to search the space of possible agreements(that agents may make)more thoroughly but also enable an agent to delay its decision on agreeing on more e?cient outcomes by providing it with a choice of agreements.Additionally,multi-objective algorithms result in a set of solutions instead of just one and this gives us the opportunity to make agents more robust to sudden changes in the bargaining situation by using the set of solutions as an ensemble and tackling opponents using strategies from within it.

Apart from modeling behaviours,using a co-evolutionary approach has its own set of problems too which is the second focus of the research.The multi-objective approach may however have an implicit solution to the problem of overspecialisation and this will also be studied.The research will essentially look into modeling more realistic bargaining situations such that they may be applied to real world decentralised control problems.

To begin with,we present an overview of market based control systems.A detailed overview of game theoretic literature related to bargaining,as a problem,with various approaches to its solution,is then considered.This is followed by a discussion of the issues or problems we want to tackle in this research,as far as bargaining is concerned, and insights on a possible formulation of the bargaining problem as well as the solution. We then give a critique on why the proposed ideas may lead to plausible solutions to the problems being tackled and discuss possible experimental and validation setups.We end with a brief commentary on the expected contributions from the research.

2Background/Literature Review

2.1Market Based Control(MBC)

Markets1and Arti?cial Intelligence(more speci?cally,adaptive behaviour)research have a dual relationship.Cli?[29]argues that human economic interactions are,after all,o?shoots of human intelligence and adaptability,as much as are spatiotemporal behaviour patterns (which are common to many animal species)such as obstacle avoidance,path/wall following and the likes.Although exclusive to humans,bargaining or negotiating and trading within markets could well be considered as adaptive behaviour.Modeling this behaviour can explain how such phenomenon come about in actual markets which could lead to development of prediction models at the policy/mechanism(e.g.market organisation)level.Apart from this, the second and more relevant relationship between the two paradigms entails the use of ideas from economics as candidates for developing solutions for distributed resource allocation and control problems[22,29,50].For instance,a resource can be thought of as a commodity 1A set of rules using which a set of traders(buyers and sellers)interact in order to exchange goods and services(in accordance with the de?nition in[7]).

8

and a currency can be provided which allows agents2to buy or sell the commodity.

In real free markets,traders interact without any central control and this remarkably leads to the price of the commodity being determined i.e.resource allocation occurs or emerges from local interactions of traders.In simpler terms,supplying a commodity in relation to its demand,being a fundamental microeconomic activity,can be considered as a resource allocation problem.Agents have demands for resources and agent interactions lead to their demands being met(when supply equals demand or in economic terms,the market clears or equilibrates[3,29,88]).If the price of a commodity is high,there will be excess supply and buyers will chose cheap o?ers giving sellers the incentive to lower the price. Alternatively,if prices are low,suppliers can chose from amongst buyers with higher bids, giving buyers an incentive to bid higher.At some stage during this negotiation process,the price of the commodity under question reaches a point at which no trader has an incentive to deviate from their o?ered price,thus stabilising the system.A remarkable result here is that inpidual self interest(sellers maximising pro?ts and buyers minimising costs)gives rise to price determination and the market clears(resources are allocated or supply matches demand).

In most markets,prices are determined by a particular style of auction mechanism (rules of the game)i.e.how agents interact and bid for the resource.Agents possess only partial knowledge[49]and studies in experimental economics[80]try to explain the dynamics by which price determination occurs.Markets in such simulated environments do not assume perfect competition(and are more like oligopolies3)but Smith[90]shows that price determination still occurs.In other words,small number of traders and imperfect knowledge about supply and demand still results in a rapid approach to a pareto e?cient4 allocation.Hence,market methods can be utilised for decentralised resource allocation and control.

2.2Previous and current MBC work

Much of the market based control work is documented in[22]where application of market methods in a number of perse areas are considered.These include allocation of network bandwidth problems,memory allocation problems in operating systems,control of air condi-tioning ventilation and temperature within o?ces in buildings,job-shop scheduling and the likes.Cli?and Bruten[31]examine some of these techniques and argue that much of this work is?awed.The main reason given is that although free markets are decentralised and self-organising,much of the methods explicated in[22]rely on centralised control processes and these in turn relieve agents from possessing bargaining behaviours.Bargaining is an essential adaptive behaviour for decentralised control[29]and methods such as reinforcement learning and arti?cial evolution can be candidates for modeling such behaviour and so there 2An agent is any entity involved in trading a commodity.More precisely,if we view a market as a game, agents or traders are players.

3Behaviour of a trader is highly dependent on the behaviours of others.

4According to[29],equilibration in markets is a result of price competition between agents and gives e?cient/optimal allocation of resources in a decentralised manner.An allocation is pareto e?cient if there is no other allocation which makes one agent better o?and does not make the other worse o?at the same time.It is the best allocation for all agents together.

9

is a need for the adaptive behaviour community to look into decentralised control problems. Lack of research in this area is expressed very clearly in[29].However,a good amount of work in market based control with agents possessing bargaining behaviours has been conducted since then(see[28,31,30,23,25,32,24,26,27]for details).In essence,this work considers using GAs to evolve certain parameter sets which are supposed to encode strategy sets(bidding behaviour sets)for traders.These parameter sets are evolved and the resulting strategies tested within a CDA5environment using Smith’sαmeasure(also called‘coe?cient of convergence’)[90]as the evaluation function.The work also focuses on evolving both,the trading strategies as well as the auction type(mainly encoded such that variants of CDA emerge).

Another body of work uses a co-evolutionary approach to evolving trading strategies [73].A separate population for each agent is considered with the view that agent strategies would be constructed from scratch as opposed to Cli?’s work[25]where the parameters for a certain type of bidding behaviour(a single type of strategy in the sense of the work done by Phelps et al.[73])are tuned.Also,the?tness of any inpidual or agent depends on that of the others and so a co-evolutionary approach would lead to spiralling of an arms race wherein each population will goad on the others to advance in their trading capabilities.Moreover, co-evolution models adaptive behaviour letting agents adapt to the environment which is de?ned by other agents and the way they interact.As previously mentioned,bargaining is an adaptive behaviour and so evolution attempts to model this at multiple levels(i.e.it has the ability to generate strategies with bargaining behaviours and tune these behaviours therein as well).This work is further extended in[72]where apart from the trading strategies,the mechanism(restricting the space of possible mechanisms to those represented by transaction pricing rules which can actually be parameterised)is also evolved.Genetic programming forms the main evolutionary computation paradigm used in that the strategies as well as mechanisms are represented by trees using function sets as given in[72,73].Phelps et al.

[74]further consider evolving pricing rules under a multi-objective setup,evaluating these rules by simulating agents equipped with a simple reinforcement learning algorithm(also called Roth Erev learning algorithm[81])giving values for the objectives and taking a linear sum of these objectives.Both,co-evolutionary[72,73]and multi-objective[74]approaches were found promising.

2.3Bargaining:What?and Why?

2.3.1What is Bargaining?

Bargaining,as the name suggests,is a socioeconomic problem in which two inpiduals meet and cooperate towards the creation of a commonly desired surplus,where the actual distribution of the surplus puts them in con?ict[87].

Suppose an inpidual wants to sell a commodity and another wants to buy it.Both have a common interest in the transfer of this commodity.Since there can be many prices at which the commodity can be sold,where the seller prefers a higher price and the buyer lower, con?ict may arise.Note here that the monetary di?erence in what the seller wants and what 5Continuous Double Auction

10

the buyer wants to pay is termed as surplus.So,if the two do not cooperate,there would be no commodity transfer and hence,no surplus.In order to resolve the potential deadlock so as not to lose out on the joint gains(commodity at a lower price for the buyer and a good price for the commodity for the seller)that come about as a result of the commodity transfer,they have to?nd some way to reach an agreement on the price.Other social, economic and political scenarios which?t the de?nition are:a couple deciding on how to split intra-household chores,negotiations on a labour contract between a?rm and a union, two unfriendly nations trying to reach a lasting peace agreement[87].On a more interesting note and of more relevance to this research is a parallel scenario,that of computational agents deciding on how much of a certain resource they need.More on this theme will be elaborated on shortly.

According to Serrano[87],a bargaining problem essentially has three basic elements: an arrangement that is expected to take place when the involved parties are unable to reach an agreement(also called the disagreement point or the status-quo situation);the existence of mutual gains from cooperation;and the multitude of possible cooperative arrangements in case of an agreement leading to a split of the surplus.

2.3.2Issues or Objectives

Traditionally,bargaining problems have involved parties trying to reach an agreement over a single issue e.g.the pision of a surplus(price of a commodity for instance).Such bargaining situations are termed as distributive[5]in that a gain for one player always creates a loss for the other.

The concept of utility is generally used to quantify the preferences of inpiduals[9,95]. It is a function mapping the preferences that inpiduals have over possible outcomes into real numbers(or as Nash[65]states,the utility function maps each anticipation of an inpidual into real numbers,where anticipation is essentially the outcome of an event).This function is generally subjective in nature and the utility of one inpidual cannot be compared with another[44].However,in many real-world applications,a monetary value is associated with the utility making comparisons possible.This utility can represent the anticipation of an inpidual over an issue such that we are able to quantify the space of possible outcomes that the issue represents.For instance,if price of a commodity is the issue at hand,a utility function will simply depict the value an inpidual associates with a certain price within the range of prices available.

From the previous discussion,it can be seen that an issue can essentially be treated as an objective which a party tries to optimise on,thus maximising their utility(a buyer would minimise the cost while a seller would try to maximise it,however,both are maximising their utility by doing so).It is easy to view the concept of utility when dealing with just one issue.Often though,inpiduals have several goals which they want to achieve.This in economic literature is termed as inpiduals having multiple issues to deal with.

A very important point with multi-issue bargaining situations is that parties have preference relations between issues such that when dealing with another party,a mutually bene?cial outcome can be agreed upon by exploring the trade-o?s that the issues present. As opposed to distributive bargaining,it becomes possible for one side to gain without the

11

other side getting less.Such negotiation scenarios are termed integrative[5].Moreover,due to the existence of trade-o?agreements,multi-issue bargaining is less competitive as opposed to single issue scenarios which are often also termed as competitive negotiations[46].

The usual way to represent utility in multi-issue cases is to use a multi-attribute utility function[76,44].A multi-attribute utility function de?nes utility over multiple weighted attributes(or issues)where the weights indicate preferences or the relative importance one gives to the issues.The utility is thus calculated by multiplying each attribute(outcome on any particular issue)with the weight associated with it and adding across all the issues.

A multi-attribute utility function can be used to represent preferences over several goals, however,when the goals are inter-dependent,it may not be the best way to achieve trade-o?agreements.Issues with this utility function will be considered shortly and through these stem the main research questions which I am proposing to tackle.

2.3.3Why Bargaining?

As mentioned previously,MBC systems are essentially built to exhibit certain properties of free-market economies viz.decentralisation,robustness and a capacity for self organisation [31].

With the advent of the internet,enabling computational resources to be accessed remotely,not only has sharing resources become easy,it has allowed users to access these resources on a more cost e?ective basis,achieving a closer match between demand and supply. Networked resources like specialised laboratory equipment and processing power may now be shared by users at multiple sites.Few examples of such networked resource initiatives are:GRID networks for scienti?c communities,utility and on-demand computing initiatives by HP and IBM respectively.

Such systems are said to involve multiple participants,each acting in an autonomous and self-interested manner(i.e.they may well have di?erent objectives or interests as the inpidual components or software representing participating entities may well have been designed by independent development teams which may not share the same interests as the others).Also,such systems are open in that participants may enter or leave at will.

Centralised control methods of software engineering have well known limitations in managing such systems.Decentralised control o?ers capabilities to recover from failure of inpidual components with no explicit need for reprogramming.Moreover,the absence of a central coordinating component leads to systems showing a progressive loss in performance (graceful degradation)rather than a sudden collapse occurring from failure of inpidual subsystems[31].Sometimes control problems are too large in order for the system to be controlled centrally essentially due to resource limitations.To come up with an optimum control plan in case of a resource allocation problem in open systems(where plans would need to be updated)can be very computationally expensive too for a centralised controller. Decentralised control helps remedy these problems.

Considering distributed systems as multi-agent systems comprised of self-interested interacting software entities(or agents)is a logical step forward.Free-market economies are similar large scale systems with interacting self interested and autonomous entities

12

(humans).Herein lies the essence of MBC systems i.e.of creating arti?cial systems having free-market economy properties.Actions of groups of inpiduals in free-market economies, engaging in simple trading interactions,essentially driven by self interest,results in the optimum allocation of resources(the market equilibrates).As mentioned in[31]allocation coming about in a decentralised manner is the crucial point to consider.Such allocations emerge from the interactions between traders and there is no central control process involved. Decentralisation and self organisation are strong motivational factors for MBC systems[31].

Markets can have various structures too,some involving an auctioneer without whom the market cannot operate whereas some have no central authority which is the case with real markets.CDA is one such decentralised and asynchronous market structure and is considered to be very e?cient.Much of the literature and applications of MBC,as noted by Cli?and Bruten[31],is?awed and lacks in some respect.These applications either involve a centralised auction process or rely on human intervention.Cli?[31]identi?es that in order to have an automatic decentralised system,there is a need for participants to possess bargaining behaviours which allow them to decide on what price to agree on for a transaction.According to our knowledge,little work has been done[44](having its own share of limitations which will be elaborated on later)into looking at the bargaining problem directly and not within an already speci?ed mechanism(CDA by Cli?[28,31,30,23,25,32,24,26,27]).Thus, studying bargaining and modeling it looks like an attractive proposition.This will enable us to design agents which know how to trade.Moreover,every mechanism involves the need for agents to bargain and considering simple bargaining problems would possibly enable us to extend the solutions developed to complex real world market settings.

2.4Bargaining Before Game Theory

Bargaining problems were initially called bilateral monopolies and were deemed indetermi-nate[39]in orthodox economics[47].Von Neumann and Morgenstern[95]argued that the most one can say about the problem was that the result(or solution)will lie in the bargaining set.

If a solution is in the bargaining set then it is said to be both inpidually and collectively rational.Inpidual rationality means that neither party should end up with pay-o?s worse than what they get if an agreement is not reached(i.e.worse than the status-quo).Collective rationality refers to the concept of pareto e?ciency.An outcome is pareto e?cient if no outcome exists which is strictly preferred by one player and not less preferred by any other.Deviating from this outcome would make one player better o?at the expense of the other.Pareto e?cient outcomes are also referred to as outcomes lying on the contract curve[39]or the pareto e?cient frontier.

According to Harsanyi[47],Zeuthen attempted to provide a more determinate pre-diction(antedating the theory of games[95])where the solution to the bargaining problem could be dictated by the parties’attitudes towards risk of a breakdown.If a party’s readiness to risk a con?ict(also termed as determination)is greater than the other,then the other party tends to make a concession.The process involves each party making a concession until no more concessions are possible due to the inpisibility of the monetary unit setting a lower limit on the size of the admissible concessions.In the symmetric case(where the

13

determination of each party is equal),an equal pision of the bargaining surplus is obtained

(a determinate solution!).

2.5Assumptions in Game Theory and the Need for Computa-

tional Approaches

To make mathematical analysis possible,game theory often makes simplifying assumptions. Two of the most common assumptions made being players having complete information and them being perfectly rational.

Complete information refers to the fact that every player of the game knows the strategies and payo?s available to the other players.The players may not however have knowledge inside the game.If they do then the game is thought of being that of perfect information.For instance,in the case of prisoner’s dilemma,complete information about the strategies and the payo?s available to the players may be known but the players do not necessarily know the moves of the opponent in any particular round of the game(or in the ?rst round,depending on it being a one shot or a repeated game).This lack of information or uncertainty inside the game makes it a game of imperfect information.On the contrary, a sequential game where,at each decision point,the choices that have previously been made are known is said to be that of perfect information.

A game is that of incomplete information if something about the circumstances in which the game is being played is not known to the players[44].For instance,the utility functions(or preferences)of the players(in the bargaining context)may not be known.In such game however,the players may be forced to consider an in?nite hierarchy of beliefs6 [8,48].Incomplete information of other player’s preferences and beliefs is modeled by game theorists by specifying a limited number of player types[44,48].Types essentially determine the preferences and beliefs uniquely.Players are not certain about the opponent’s type but the probability that the opponent is of a certain type is common knowledge7.The game is then said to be transformed into that of imperfect information.Imperfect information games are useful for studying phenomenon like reputation building[84].

The assumption of perfect rationality stems from the fact that there is a need for common knowledge on how players reason.

These assumptions limit the practical applicability of game theoretic solutions.More-over,real life trading situations do not assume complete information and perfect rationality. Both humans and computational agents have limited information and forward looking ca-pabilities.As stated in[44],many tasks are learnt through a process of trial and error(i.e. through experience).Additionally,as mentioned previously,agents may be programmed by di?erent parties implying them having di?erent capabilities.Assuming perfect rationality about their behaviours will thus be erroneous.

If an agent has little a-priori knowledge about the environment and it gradually adapts in the quest for an optimal solution(by interacting with its environment through a process 6Beliefs are probabilities of events happening about which the player is uncertain.

7Knowing information about players and the players knowing that the information is known and knowing that the players know that the information is known ad in?nitum,describes the notion of common knowledge.

14

of trial and error),it can be said to be boundedly rational[85].

In a computational scenario with intelligent agents,the limiting game theoretic assump-tions of full rationality and complete information are unnecessary and not required.This is because the behaviour of agents can be modeled directly[44].Agents can use machine learning techniques to improve on their strategies(after being put into the environment with a certain strategy).After a period of learning,the agents may be able to exhibit behaviours resembling rational and fully informed agents.This has been shown in[44] where evolutionary algorithms were used to model bargaining behaviours for agents.As mentioned in[44],rational behaviour often emerges.

Considering the above and since there is a need for agents possessing bargaining behaviours(a case for which was presented in Section2.3.3),computational techniques are thus a positive way forward and may have two fold advantages:the ability to be used in environments requiring the need for decentralised control and the ability to analyse game theoretic settings which are too complex to be analysed analytically.

2.6Game Theoretic Approaches to Bargaining

There are essentially two branches of bargaining theory and game theory in general:coop-erative or axiomatic and non-cooperative or strategic.

In cooperative game theory,pre-play communication is allowed and the players can co-operate into reaching binding agreements.Given a set of feasible outcomes(the bargaining set)which are the outcomes that can be jointly achieved by players concerned,?nding a solution from within this set without calling into question how the players reach it,is the main idea behind the cooperative approach.A number of properties(or axioms or assumptions)that the solution to the bargaining problem should have are proposed and the outcome which best agrees with these properties is chosen as the solution.This theory was born with Nash[65].The theory answers the question of how bargaining should be resolved between rational parties in accordance with some desirable principles[87].

The non-cooperative approach,initiated by Nash[67,68]relies on the exact speci?ca-tion of the situation under study(bargaining or negotiation in the current context)as games and the identi?cation of the behaviours occurring in these games.Here,exact speci?cation signi?es speci?cation of the protocol using which the players interact,the strategies the players can use,the payo?s they can get,the information available etc.This approach to game theoretic problems and bargaining in particular describes how bargaining may evolve in the presence of common knowledge of rationality[87].

An overview of the bargaining literature from these two branches of game theory is considered next.

2.6.1Cooperative/Axiomatic Approach

Nash[65]was the?rst to contribute to the axiomatic theory of bargaining.A bargaining problem(more speci?cally,a Nash bargaining problem)is represented by a pair(S,d)in

15

utility space where S is a closed,bounded above and convex subset of R2and is the set of feasible8utility9pairs and d is the point in S representing the payo?s that players get in case of a disagreement.Accordingly,a solution(also referred to as a solution concept)to the bargaining problem is a function that maps a feasible set of utility pairs to one of its feasible points.Note that the nature of S dictates the validity of a solution concept.

An example of a solution concept is the disagreement solution[87]which assigns the point d to the bargaining problem and is rather pessimistic(as evident).This solution is not pareto e?cient as it does not exploit the gains from reaching a cooperative agreement. The dictatorial solution[55],where one bargainer acts as a dictator,assigns the point in the bargaining set where the other bargainer receives zero utility as the solution to the problem. As can be seen,this solution is obviously unfair.

Nash[65]proposed four desirable properties that a bargaining solution should possess:

1.The?nal outcome does not depend on how the utility scales are calibrated as di?erent

utility functions can be used to model the same preferences.

2.The agreed payo?pair should be in the bargaining set(inpidually and collectively

rational).

3.Certain utility pairs are irrelevant in that if players agree on a pair s when t is also

feasible,then t is never agreed on when s is feasible.The decision on choosing s does not depend on t whether or not t is present in the feasible region.This axiom is also called independence of irrelevant alternatives(IIA).

4.Both players get the same in symmetric situations(if the player’s labels are reversed,

each will still receive the same payo?).

The solution proposed by Nash[65],satisfying the four properties above,is the payo?pair s=(u1,u2)which maximises the Nash product(u1?d1)(u2?d2),where d=(d1,d2) and this solution is unique[65].If one relaxes the symmetry axiom,the bargaining powers (does not mean bargaining skills due to the assumption of perfect rationality,but possibly, strength due to market positions for instance)of players dictates the solution(also called the generalised Nash bargaining solution[9]in this case).It is the payo?pair s=(u1,u2) which maximises the product(u1?d1)α(u2?d2)β,whereαandβare the bargaining powers of the players.

Attitudes of bargainers towards the risk of a breakdown or disagreement also e?ects the payo?that they receive if one follows Nash’s solution concept.Consider the utility functions in Figure1.These are of the form u(s)=sα,where0<α≤1and the smaller theα,the more risk averse the bargainer is(risk averse player1is more risk averse than risk averse player2in the?gure).In the symmetric case(when the bargaining powers are equal),a more risk averse player gets a lower portion of the surplus,as shown in[9,87].A risk neutral bargainer(α=1)will obviously get a higher share of the surplus when confronting either of the other two players shown in the?gure.

8A point is feasible if it is possible to select it.

9Utility can also be seen as the payo?a party gets.

16

amount of surplus (s)u t i l i t y (u )Figure 1:Possible utility functions for risk averse and risk neutral bargainers.

Nash’s IIA axiom was the subject of severe criticism [61].As mentioned in [87],Nash’s solution does not consider global issues (in terms of the bargaining situation at hand)such as the highest utility each bargainer can obtain.Kalai and Smorodinsky [53]proposed an alternative solution to the Nash’s bargaining solution.They de?ne the notion of having a utopia point .If a i (S )be the highest utility player i can achieve (also called aspiration level)then the utopia point is the point in R 2(typically not feasible)denoting the aspiration levels of each player (a 1(S ),a 2(S )).The Kalai-Smorodinsky solution selects the maximum element on the line that joins the disagreement point (d 1,d 2)with the utopia point (a 1(S ),a 2(S ))and in the bargaining set.Other solution concepts replacing the IIA axiom can be found in

[78,91].

Another solution concept is that of egalitarianism [64].An egalitarian solution is a point on the pareto e?cient frontier where utilities for the players are equal.This solution is more tied to ethical behaviour than to principles governing bargaining between rational inpiduals.Also,if a bargaining solution maximises the sum of the utilities,it is said to be a utilitarian solution[64].Due to interpersonal comparisons of utilities being possible in both egalitarian and utilitarian solution concepts,Nash’s axiom of solutions being independent of utility calibrations does not hold.Cooperative theories of bargaining are discussed in greater detail in [79].

The above mentioned solution concepts were initially developed for single issue (see Section 2.3.2)bargaining problems.In the case where multiple issues are involved,these concepts can still be applied provided we can transform the combined utility space (each issue occupying one dimension in the space)into a single issue utility space.This can be done by utilising a multi-attribute utility function (as de?ned in Section 2.3.2).The relationships or trade-o?s between issues are assumed given in this case and are the weights assigned to each issue.The weighted utilities for each issue are then added together.This utility mapping is appropriate only if the issues are independent (i.e.contribution of one issue is independent of the values of the others).Once the preferences of the bargainers are mapped onto the multi-attribute utility function,the problem reduces to that of a single

17

issue.Examples of such problems can be found in[76].More on inter-dependence between issues will be considered in Section3.

2.6.2Non-cooperative/Strategic Approach

As mentioned previously,details of the negotiation process are speci?ed in this case.There are various protocols(which may be called games)that researchers have used to specify the bargaining process.The idea is that bargainers will use these protocols while deciding on the pision of the surplus and some behaviour will emerge.The goal is to study procedures that are reasonable and identify rational behaviour within them.To determine rational outcomes of a game,the concept of equilibrium is used in this approach.The two most widely used equilibrium concepts are Nash equilibrium[66]and Subgame perfect equilibrium[86]:

Nash equilibrium.If no player can bene?t by unilaterally changing his strategy,then the strategies chosen by all players are said to be in a Nash equilibrium.Every?nite game has at least one such equilibrium point[66,67].

A game is said to be one of extensive-form if it can be represented by a tree and the players can make decisions sequentially at various stages of the game(at nodes of the tree which are also called decision points).

Subgame perfect equilibrium.Considering an extensive form game,the strategies are said to be in subgame perfect equilibrium if they constitute a Nash equilibrium at every decision point.

We now consider some games studied in strategic bargaining literature.

The?rst bargaining model to be considered as a non-cooperative game was introduced by Nash[68].This is called the Nash demand game.The idea is that two players simulta-neously demand a utility level or a part of the surplus.There is no knowledge of the other player’s demand.If the sum of the demands exceeds the surplus,the players receive their disagreement payo?otherwise they get their respective parts.This game admits an in?nite number of Nash equilibria.The outcomes on the pareto-e?cient frontier(i.e.collectively rational payo?pairs)are all equilibrium points.Moreover,the disagreement point is also a Nash equilibrium.Suppose the players chose a point on the pareto-frontier.It can be easily noted that any player would only decrease their payo?by moving away from the frontier. Also,if both demand more than the entire surplus,they will get the disagreement payo?even if they unilaterally want change their strategy.

Another game,considered less competitive and more generous from the point of view of throwing away of the surplus(which may easily happen in the Nash demand game),is the Ultimatum game[9].It provides a model for the simplest of all bargaining procedures and is realistic in the sense,it is the bargaining procedure normally employed by us in stores[9]. This is a two stage game in which the second stage is similar to the Nash demand game. One player proposes a split of the surplus and the other player has to decide on whether to accept or reject the o?er.In case the o?er is rejected,both players either get nothing or the Nash demand game is played.The usual de?nition says that the game ends in case the responder rejects the o?er[9].This game has a subgame(?rst stage on)and has a subgame perfect equilibrium in the event where one player demands the whole surplus and the second

18

accepts this deal.The game again admits a continuum of Nash equilibrium outcomes(the pareto e?cient outcomes).

One of the most elegant protocols[44]extends the aforementioned ultimatum game to multiple stages.It models negotiations as they take place over time and is referred to as the alternating o?ers game.As mentioned in[44]and[87],work on this bargaining model has been pioneered by Stahl.The game can be characterised as being that of either a?nite horizon(or?nite number of alternatives/rounds/periods[44])or an in?nite horizon(studied by Rubinstein[83]).Assuming the surplus is of size1,in period0,Player1starts by making a proposal(a pision of the surplus,proposing x for himself).Player2can either accept or reject the proposal.If he accepts,Player1gets x and Player2,1?x.If he rejects,they meet again at the negotiation table after a period of time and then(period1)the roles of the players switch.Now Player2makes a counter o?er,which Player1then accepts or rejects (sending the game to the next round).In the?nite horizon game,there is a?nite number of rounds i.e.there is a deadline.If an agreement has not been reached until the deadline, both players receive zero payo?.In the in?nite horizon game there is always a new proposal in the follow up periods until an agreement is reached.Starting at the last stage of the game and working backwards inductively leads to the identi?cation of optimal strategies for rational players with perfect information.This approach is similar to dynamic programming and is also the idea behind?nding equilibrium strategies which are optimal responses at every point in the game and not just at the beginning of the game for all players concerned. The strategies thus obtained are in subgame perfect equilibrium(SPE).To predict optimal outcomes in dynamic games such as the alternating o?ers game,one often comes across the term discount factor.A discount factor models the utility of a future outcome as present valuations as expectations.In other words,a player values future events but not as much as present events,but not because he does not appreciate the future event happening,instead, because he may be unsure of them happening.In e?ect,a discount factor models how impatient a player is.

To give an idea of how this backward induction mechanism and discount factors work, lets consider a two stage alternating o?ers game.Suppose the surplus size is1and the utility functions are linear.We know that SPE in an ultimatum game is(1,0),where the ?rst player gets the whole of the surplus leaving the second with nothing.Suppose,in the two stage game,a node is reached where Player2has to make an o?er.This stage on is a subgame of the actual game and can be seen as an ultimatum game by Player2.In this case, Player2will play his SPE strategy and o?er(0,1),i.e.,full surplus for himself.The expected payo?at this node is dictated by the discount factors of both players(sayδ1andδ2).The expected utilities will thus be(0.δ1,1.δ2)=(0,δ2).Now,if we traverse backwards into the previous stage of the game,Player1’s optimal move,with what is expected in the next stage known,would be to make an o?er which makes Player2indi?erent between whether to accept or reject.This o?er is(1?δ2,δ2).In a way,Player1convinces Player2not to take the entire surplus(in case the negotiations go to the second stage),by o?ering him the present discounted value of the entire surplus.Note that if Player1o?ers anything lower thanδ2to Player2,he will end up with nothing as Player2will reject the o?er and play his equilibrium strategy with the counter o?er of(0,δ2).Also,rationality will make Player 1not to o?er Player2anything more thanδ2.The o?er(1?δ2,δ2),as it will result inδ2 for Player2,will be accepted by him,as by doing so,he will be playing his rational move. The strategies resulting in the outcome(1?δ2,δ2)are in subgame perfect equilibrium as no

19

player will bene?t from unilaterally changing his strategy at any stage of the game i.e.,the strategy is a Nash equilibrium of every subgame of the two stage game.We can also see that if Player2is very impatient,δ2will be nearly zero and in this case,Player1will get nearly all the surplus.On the other hand,ifδ2is nearly one,Player2will get the major share. Also,the agreement takes place immediately.Such a backward induction process is usually used to determine the outcome of rational play in dynamic games such as this.Similar logic applies and continues to any number of rounds.

Rubinstein[83]demonstrated that Nash equilibrium does not restrict the outcomes i.e.,a unique solution to the alternating o?ers problem cannot be found using the Nash equilibrium concept.In fact,any agreement(x,1?x),in any period t of the game where 0≤t<∞,can be supported as a Nash equilibrium outcome[83].However,the concept of SPE was applied by Rubinstein[83]showing that there exists a unique SPE in the alternating o?ers bargaining model speci?ed by the partitioning where Player1gets(1?δ2)/(1?δ1δ2) (δ1andδ2being?xed discounting factors of the players)and Player2gets the remainder of the surplus immediately(in the?rst round of the game).The time period between two stages of the game is usually not taken to be unity.Ifτis the period between two stages, thenδi is replaced byδτi everywhere.An important point to make here is that for small time intervals between rounds i.e.,ifτ→0?δ→1,this unique SPE outcome approximates the generalised Nash bargaining solution.This important result was convincingly proven by Binmore et al.[10].The limiting case ofτ→0is signi?cant in that,in real world situations, the optimal thing to do for a negotiator,once he rejects an o?er,is to make a counter o?er as soon as possible without sticking to a strict timetable[9].This result essentially established a link between cooperative and non-cooperative game theory and is called the Nash Program [9].

The Rubinstein-Stahl alternating o?ers game,as this alternating o?ers protocol is called,has had many applications in real world situations e.g.bargaining problems concern-ing international trade,industrial organisation,political economy etc.[87].More on in?nite horizon games and its applications to decentralised trading situations is given in[71].This game is what I am considering exploring as part of my PhD work.

Yet another protocol studied in the non-cooperative bargaining literature is the mono-tonic concession protocol[77].Two players simultaneously announce their proposals and an agreement is reached if both o?ers match or the o?er exceeds the other player’s demand. Tossing a coin decides on which one of the two o?ers to chose from,in case they are dissimilar. In case of a disagreement,the players make new o?ers which need to be higher in utility for the other player than the utility he(the other player)would get from the previous o?er, implying monotonic concession.The player can either make the same o?er in which case he is said to be standing?rm,or could concede.If both players stand?rm,negotiations end and both receive a disagreement payo?.More on this protocol can be found in[77].

According to[75],in the multi-issue case,the protocols under which bargaining may take place,can be pided into global or simultaneous,separate and sequential types. As the name suggests,global bargaining calls for having to bargain over all the issues simultaneously while separate bargaining allows for negotiations to take place over issues separately and independently.The third case involves negotiations taking place over issues in a sequential manner.Sequential bargaining can further be pided into independent and simultaneous,depending on whether or not players can bene?t from an agreement over

20

an issue.In the independent implementation,an agreement over an issue excludes them from being considered in future bargaining rounds and the agreed upon issues are no longer discounted.On the contrary,the simultaneous version makes players wait until all issues have been considered before bene?ts from them can be enjoyed.As may be evident,if one negotiates over issues in a sequential manner,the question of which issues one should consider bargaining over?rst arises.There are two main bodies of work dealing with this issue.The order in which di?erent issues are brought on to the negotiation table(also called agenda) can be set either exogenously or endogenously by the players.In the former case,since each issue is bargained over one at a time,Rubinstein’s results of uniqueness of the equilibrium and that of it being pareto e?cient hold[87].Fershtman[40,41]studies exogenous agenda games in the case where the players attach di?erent importance to di?erent 2b23cb6548d7c1c708a1454fually however,either the importance associated with issues is equal(for instance,[4])or the players have identical preferences[14](i.e.a player attaches the same importance to an issue as the other).More realistic games with players having to endogenously decide on the agenda have been studied in[52,51].In[51],o?ers can be made in any subset of remaining issues and a unique subgame perfect equilibrium agreement is shown to have been obtained.A case in point to make here is that the reason having a unique subgame perfect equilibrium for a game is important is that such games become easier to have as validation benchmarks i.e., it becomes easier to test new approaches(from outside game theory)to solutions to these games by comparing the results with game theoretic ones(i.e with equilibrium outcomes).

2.7Computational Approaches to Bargaining

Modeling human economic behaviour using computational approaches helps get rid of the notion of having to make simplifying assumptions of rationality and common knowledge used frequently in game theoretic analysis.Techniques from AI have been shown to model such behaviours to good e?ect and rational behaviours have been shown to emerge.

The idea is to make the agents interact with each other according to some protocol and, in time,learn from these interactions.An agent can learn the bargaining strategy specifying the actions it should take during the course of play while adapting the strategies during play. In other words,an agent can learn and see how to modify its strategy based on previously played bargaining games and try to amend previous mistakes as necessary while at the same time modify its strategy on the?y(while in a game).The latter situation may arise if the agent is unsure of the opponents preferences or type and tries to form beliefs letting it?ne tune its behaviour.This point also suggests that,while forming beliefs,it may help for an agent to have a strategy set,allowing it to chose from a range of moves instead of having to modify one strategy for the next move(which may result in loss of information as to how to tackle opponents which may be associated with previously formed beliefs for instance).An ensemble of strategies is an idea not explored with bargaining problems yet and forms one of the ideas to be explored in this research.

Nevertheless,Oliver[70]used a genetic algorithm to learn negotiation strategies in both single and multiple issue bargaining settings.Agent strategies were represented as binary coded strings which encode the threshold(determining whether an o?er should be accepted or not)and a counter o?er(in case of rejection of the o?er by the opponent and if the deadline has still not been reached).Gerding[44]studies a similar model but encodes

21

本文来源:https://www.bwwdw.com/article/uc7l.html

Top