Distributed Database Concepts, B. Anil Nain 9416077273
Distributed database (DDB) - A distributed database is a collection of multiple logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user. A collection of files stored at different nodes of a network, with the interrelationships among them maintained via hyperlinks, has become a common organisation on the internet, with files of web pages.
Advantages of Distributed Databases
Distributed database management has been proposed for various reasons, ranging from organisational decentralisation and economical processing to greater autonomy. We highlight some of these advantages here.
1. Management of distributed data with different levels of transparency: Ideally, a DBMS should be distribution transparent in the sense of hiding the details of where each file (table, relation) is physically stored within the system. Consider the company database that is spread over the network as shown in Figure 6.4.
Figure 6.4: sites connected by a communication network, and the data stored at each site.
Chicago (Headquarters): Employees - All; Projects - All; Works_On - All
New York: Employees - New York; Projects - All; Works_On - New York employees
Atlanta: Employees - Atlanta; Projects - Atlanta; Works_On - Atlanta employees
San Francisco: Employees - San Francisco and Los Angeles; Projects - San Francisco; Works_On - San Francisco employees
Los Angeles: Employees - Los Angeles; Projects - Los Angeles and San Francisco; Works_On - Los Angeles employees
Here Employees, Projects and Works_On are relations in the Company database, with the following attributes:
EMPLOYEE (FNAME, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY)
PROJECT (PNAME, PNUMBER, PLOCATION, DNUM)
WORKS_ON (ESSN, PNO, HOURS)
· Distribution or network transparency: This refers to freedom for the user from the operational details of the network. It may be divided into location transparency and naming transparency.
Location transparency: The command used to perform a task is independent of the location of the data and of the location of the system where the command was issued.
Naming transparency: Once a name is specified, the named object can be accessed unambiguously without additional specification.
· Replication transparency: As the company example shows, copies of data may be stored at multiple sites for better availability, performance, and reliability. Replication transparency makes the user unaware of the existence of copies.
· Fragmentation transparency: Two types of fragmentation are possible. Horizontal fragmentation distributes a relation into sets of tuples (rows). Vertical fragmentation distributes a relation into subrelations, where each subrelation is defined by a subset of the columns of the original relation. A global query by the user must be transformed into several fragment queries. Fragmentation transparency makes the user unaware of the existence of fragments.
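As a minimal sketch of the two fragmentation styles, the Python below splits a small EMPLOYEE relation horizontally and vertically, then answers a global query by running the same fragment query on every horizontal fragment and merging the results. The SITE attribute is not part of the schema above; it is assumed here only to decide where a row is placed. The sample rows and all helper names are likewise illustrative assumptions.

```python
# Illustrative rows; the SITE attribute and all values are assumptions.
EMPLOYEE = [
    {"SSN": "111", "FNAME": "Asha", "SALARY": 52000, "SITE": "Atlanta"},
    {"SSN": "222", "FNAME": "Ben", "SALARY": 61000, "SITE": "New York"},
    {"SSN": "333", "FNAME": "Chen", "SALARY": 48000, "SITE": "Atlanta"},
]


def horizontal_fragments(relation, attribute):
    """Group whole tuples (rows) by the value of one attribute."""
    fragments = {}
    for row in relation:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def vertical_fragment(relation, columns, key="SSN"):
    """Keep a subset of columns; the key is retained so rows can be rejoined."""
    kept = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in kept} for row in relation]


def rejoin(left, right, key="SSN"):
    """Rebuild rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


def global_query(fragments, predicate):
    """Run the same fragment query at every fragment and union the results."""
    result = []
    for site in sorted(fragments):
        result.extend(row for row in fragments[site] if predicate(row))
    return result


by_site = horizontal_fragments(EMPLOYEE, "SITE")
names = vertical_fragment(EMPLOYEE, ["FNAME"])
pay = vertical_fragment(EMPLOYEE, ["SALARY", "SITE"])
well_paid = global_query(by_site, lambda row: row["SALARY"] > 50000)
```

The caller of global_query never names a fragment or a site, which is the point of fragmentation transparency. The vertical fragments each keep SSN because the original rows cannot be rebuilt without a shared key.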
2. Increased reliability and availability: Reliability is broadly defined as the probability that a system is running (not down) at a certain point in time, whereas availability is the probability that the system is continuously available during a time interval. When the data and DBMS software are distributed over several sites, one site may fail while the other sites continue to operate. Only the data and software that exist at the failed site cannot be accessed. In a centralised system, failure at a single site makes the whole system unavailable to all users.
3. Improved performance: A DDBMS fragments the database so that data is kept closer to where it is needed most. Data localisation reduces the contention for CPU and I/O services and at the same time reduces the access delays involved in wide area networks. When a large database is distributed over multiple sites, a smaller database exists at each site. As a result, local queries and transactions that access data at a single site perform better because the local databases are smaller.
4. Easier expansion: In a distributed environment, expanding the system by adding more data, increasing the database size, or adding more processors is much easier.
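The availability argument in the reliability point above can be made concrete with a little arithmetic. Assume, purely for illustration, that sites fail independently and that each site is up with the same probability p. A data item replicated at k sites is then reachable unless all k copies are down at once, which gives a probability of 1 - (1 - p)^k. The function name and the value p = 0.95 are assumptions, not figures from the text.

```python
def replicated_availability(p_up: float, copies: int) -> float:
    """Probability that at least one of `copies` independent sites is up."""
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies are down with probability (1 - p_up) ** copies.
    return 1.0 - (1.0 - p_up) ** copies


for copies in (1, 2, 3):
    print(copies, round(replicated_availability(0.95, copies), 6))
```

Under these assumptions a single copy is reachable 95% of the time, while two copies raise that to 99.75%. Real sites rarely fail independently (a network partition can cut off several at once), so the formula is an upper bound on what replication alone buys.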
Additional Functions of Distributed Databases
Distribution leads to increased complexity in system design and implementation. To achieve the potential advantages listed previously, the DDBMS software must be able to provide the following functions in addition to those of a centralised DBMS.
Keeping track of data: The ability to keep track of the data distribution, fragmentation, and replication by expanding the DDBMS catalog.
Distributed query processing: The ability to access remote sites and transmit queries and data among the various sites via a communication network.
Distributed transaction management: The ability to devise execution strategies for queries and transactions that access data from more than one site and to synchronise the access to distributed data and maintain integrity of the overall database.
Replicated data management: The ability to decide which copy of a replicated data item to access and to maintain the consistency of copies of replicated data items.
Distributed database recovery: The ability to recover from individual site crashes and from new types of failures, such as the failure of communication links.
Security: Distributed transactions must be executed with the proper management of the security of the data and the authorisation/access privileges of users.
Distributed directory (catalog) management: A directory contains information (metadata) about data in the database. The directory may be global for the entire DDB or local for each site.
At the physical hardware level, the following main factors distinguish a DDBMS from a centralised system.
There are multiple computers, called sites or nodes.
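The bookkeeping functions listed earlier (keeping track of data, replicated data management, and the global directory) can be pictured with a small in-memory catalog. This is a sketch under stated assumptions, not how any particular DDBMS stores its catalog: the Catalog class, its methods, the fragment names, and the rule of preferring a local copy are all hypothetical. The placements follow the company example, where Chicago and New York both hold all projects and Chicago also holds every employee.

```python
class Catalog:
    """Toy global directory: which sites hold a copy of each fragment."""

    def __init__(self):
        self._copies = {}  # fragment name -> set of site names

    def register(self, fragment, site):
        """Record that `site` stores a copy of `fragment`."""
        self._copies.setdefault(fragment, set()).add(site)

    def sites_for(self, fragment):
        """Return every site holding the fragment, in alphabetical order."""
        return sorted(self._copies.get(fragment, ()))

    def choose_copy(self, fragment, local_site, down=()):
        """Pick a reachable copy, preferring the site that issued the request."""
        candidates = [s for s in self.sites_for(fragment) if s not in down]
        if not candidates:
            raise LookupError(f"no reachable copy of {fragment}")
        return local_site if local_site in candidates else candidates[0]


catalog = Catalog()
for site in ("Chicago", "New York"):
    catalog.register("PROJECTS_ALL", site)
# Atlanta's employees are stored locally and inside Chicago's full copy.
catalog.register("EMPLOYEES_ATLANTA", "Atlanta")
catalog.register("EMPLOYEES_ATLANTA", "Chicago")
```

A request issued at New York reads its own copy of the projects. If Atlanta is down, a request for Atlanta's employees falls back to the copy at Chicago, which is the behaviour that replication transparency hides from the user.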