Q. Is copying a single 1 MB file from one server to another the same as copying multiple files whose sizes add up to 1 MB?
This is a tricky question I was asked in one of the interviews I attended recently. Initially I thought both copies would run at the same speed, but that was a blunder. Under the same network conditions, copying a single 1 MB file is far faster than copying hundreds of files whose combined size is 1 MB.
Let's analyze where I went wrong.
The transport layer packages the data and sends it to the remote server. It knows nothing about file types, the number of files and so on, and it carries the same number of payload bytes in both cases. So network bandwidth is not what causes the delay in our file transfer. Then where does the delay come from when transferring smaller files?
Here is where I made my mistake: I considered only the networking part. Copying data depends on two things, the network and the OS. Although, by the reasoning above, the copy does not depend on how the network carries the bytes, it does depend on how the data is written to disk. Let's take the example of copying a single file from the OS perspective.
Once the data arrives from the network, the OS reassembles the packets into a byte stream and hands it to the receiving program (scp, for example), which extracts the file from that stream. The receiver creates the destination file, which allocates a unique inode number for it, and gets back a file descriptor. It then writes the data chunks to disk through that descriptor as they arrive, and closes the file handle once the last chunk has been written. So a single file costs one create, one open and one close, plus the reads and writes of the data itself.
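To make the single-file steps concrete, here is a minimal Python sketch of what a receiving program does with one file. The function name `receive_file` and the in-memory list of chunks are assumptions made for illustration; a real tool such as scp reads its chunks from a network connection instead.

```python
import os
import tempfile


def receive_file(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path; return (inode, size)."""
    # Creating the file allocates a new inode and returns a file descriptor.
    fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            # os.write may write fewer bytes than requested, so loop until done.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
        info = os.fstat(fd)  # inode number and size of the file just written
    finally:
        # One close per file, paired with the one open above.
        os.close(fd)
    return info.st_ino, info.st_size


# Hypothetical usage: four 256 KB chunks standing in for data from the network.
with tempfile.TemporaryDirectory() as tmp:
    inode, size = receive_file([b"x" * 262144] * 4, os.path.join(tmp, "single.bin"))
    print("inode:", inode, "size:", size)
```

The create, write and close calls happen once here. With many small files the same sequence runs once per file.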
Now let us come to our example of copying a large number of files. Lots of small files means you keep opening and closing file handles, possibly seeking on the disk, and creating a unique inode for each new file. The copy also handles one file at a time, so this fixed per-file cost is paid again for every file. Tools such as scp additionally exchange control messages for each file, which adds to that cost. This is the culprit behind the slower transfer of small files even though we have enough bandwidth.
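The per-file cost can be observed locally without any network. The following Python sketch writes 1 MB as one file and then as 1024 files of 1 KB each, and times both. The helper names `write_files` and `compare` are hypothetical, and the timings depend heavily on the filesystem, the disk and OS caching, so treat the numbers as an illustration rather than a benchmark.

```python
import os
import tempfile
import time

TOTAL_BYTES = 1024 * 1024  # 1 MB in total, as in the question


def write_files(directory, count, size_each):
    """Create `count` files of `size_each` bytes; return elapsed seconds."""
    payload = b"x" * size_each
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file_{i:05d}.bin")
        # Every iteration pays for a create (new inode), an open and a close.
        with open(path, "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def dir_size(directory):
    """Sum the sizes of all files directly inside `directory`."""
    return sum(os.path.getsize(os.path.join(directory, n)) for n in os.listdir(directory))


def compare(small_size=1024):
    """Write the same total volume as one file and as many small files."""
    count = TOTAL_BYTES // small_size
    with tempfile.TemporaryDirectory() as one, tempfile.TemporaryDirectory() as many:
        t_single = write_files(one, 1, TOTAL_BYTES)
        t_many = write_files(many, count, small_size)
        return {
            "files_many": count,
            "bytes_single": dir_size(one),
            "bytes_many": dir_size(many),
            "seconds_single": t_single,
            "seconds_many": t_many,
        }


result = compare()
print(f"1 file:     {result['seconds_single']:.4f} s")
print(f"{result['files_many']} files: {result['seconds_many']:.4f} s")
```

Both runs write exactly the same number of bytes; only the number of create, open and close operations differs.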
We tested this today in our lab by using SCP to copy multiple small files and a single file of the same total size. The 1 KB files transferred very slowly (speeds in bytes per second), whereas the single 1 MB file transferred very fast (speeds in MB per second).
Lesson learnt: Understand concepts first before configuring servers.