Your first PLINK tutorial

Learning outcomes: At the end of this chapter, you will be able to change genotype data formats with PLINK.

In the previous posts, you read almost all of the general suggestions for the work environment, and you downloaded the PLINK software and genotype data for a surprisingly large number of animals. But the program has not been executed yet in any meaningful way... Now everything will change and you will finally run the PLINK program. Your goal will be to transform the binary file format saved as bim, bed, and fam files to a text-based genotype format saved as ped and map files. This exercise will also give you a detailed explanation of the use of PLINK. You will see that it is not difficult at all. Frankly, I look at the program as some kind of cake-building game. You need to know what you want to achieve, and all you need to do is add the correct elements to it. The first thing you need to write down is some sort of base structure. In fact, you can start with this very same line all the time and add elements to it.
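If you like to see the "base structure plus elements" idea spelled out, here is a minimal sketch in Python (my choice of language for illustration; PLINK itself does not need it). The flags are the ones from the command used in this chapter. Where exactly the base ends and the added elements begin is an assumption on my part, and the function name build_plink_command is hypothetical. The sketch only assembles and prints the command, it does not run PLINK.

```python
# Sketch of the "base structure plus elements" idea.
# ASSUMPTION: the split between "base" and "elements" is illustrative only.
# The function and variable names are hypothetical.

def build_plink_command(bfile, extra_elements):
    """Return a PLINK command line as a list of arguments."""
    # Base structure: the part that stays the same between runs.
    base = ["plink", "--bfile", bfile, "--cow", "--nonfounders", "--allow-no-sex"]
    # Elements: whatever this particular run adds on top of the base.
    return base + list(extra_elements)


command = build_plink_command(
    "ADAPTmap_genotypeTOP_20160222_full",
    ["--recode", "--out", "ADAPTmap_TOP"],
)
# Print the assembled line so it can be checked before it is ever run.
print(" ".join(command))
```

A different analysis would keep the same base and only swap the list of extra elements.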

But before you begin...

...I know, I know, I am annoying with these recommendations all the time, and you are eager to jump in... But hear me out... You can type the PLINK commands directly into the command line, but don't do that. You will see that there will be many mistakes and re-runs, and this way you would need to re-type everything each time. This is a huge loss of time. You don't want that. Open a new text file instead and write your program script there. Ideally, this text file is saved in a cloud storage directory, so it is automatically backed up upon saving. Remember: the script files are very small in size, but extremely valuable given the amount of time you invested in writing them.

...One more piece of advice, in case you are new to scripting and programming. You might be surprised, but the scripts you write should be readable by the computer, and perhaps even more importantly by people, including future you. Let me explain... You write some script today and you sort of know what it does. I guarantee you that if you come back to it even after a week, you will have to spend quite some time figuring out what it does. Not to mention if you wrote some stuff like two years ago... Or imagine that you have to send this script to your colleague, who was not involved in the writing at all! If it is not clear how to change even basic things like input file names or locations, you are just asking for trouble. So just document your code, using plain words to say what the crucial lines or sections do. I use the # hash sign at the beginning of each comment line to mark it as such. Also, lines starting with a # are ignored in many programs, so they do not cause errors.

Long story short: Document your code!
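To make the # convention concrete, here is a small sketch in Python (again just an illustration language, nothing PLINK requires). It takes the text of a script file and keeps only the lines that are not comments, which is exactly the part you would paste into the command prompt. The helper name strip_comment_lines is hypothetical.

```python
def strip_comment_lines(script_text):
    """Return the non-empty lines of a script that do not start with '#'."""
    kept = []
    for line in script_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines; keep everything else.
        if stripped and not stripped.startswith("#"):
            kept.append(stripped)
    return kept


# The documented script as it could be stored in the text file.
script_text = """\
# Change binary genotype to ped+map format
plink --bfile ADAPTmap_genotypeTOP_20160222_full --cow --nonfounders --allow-no-sex --recode --out ADAPTmap_TOP
"""

runnable_lines = strip_comment_lines(script_text)
print(runnable_lines[0])
```

The comment stays in the file for the human reader, and only the plink line is left for the command prompt.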

So now you will run PLINK. For real this time... Open the command prompt in a folder where you have the plink executable file and the genotype data, as described before in the PLINK - Software for genomic analyses chapter. Open a new text file and copy the following lines in there:

    # Change binary genotype to ped+map format
    plink --bfile ADAPTmap_genotypeTOP_20160222_full --cow --nonfounders --allow-no-sex --recode --out ADAPTmap_TOP

Save the text file. From now on, any change you implement will be written to the text file first, so you can adapt easily in case of need. Copy the whole plink line to the command prompt (without the comment line) and press Enter. You need about 1 GB of free space for the recoded file. If everything went well, you will see this:
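If you wonder where a size of around 1 GB comes from, here is a rough back-of-the-envelope sketch in Python. It assumes that every genotype in the text file takes about 4 bytes (two single-character alleles, each with a separator) and ignores the few leading columns per animal. The animal and SNP counts below are illustrative numbers only, not the real counts of the ADAPTmap data, and the function name estimate_ped_bytes is hypothetical. The free-space check uses shutil.disk_usage from the Python standard library.

```python
import shutil


def estimate_ped_bytes(n_individuals, n_snps):
    """Rough size of a text genotype file in bytes.

    ASSUMPTION: 4 bytes per genotype (separator + allele, twice);
    the leading sample columns and line endings are ignored.
    """
    return n_individuals * n_snps * 4


# Illustrative numbers only, NOT the real ADAPTmap counts.
needed = estimate_ped_bytes(n_individuals=4000, n_snps=50000)

# Free space on the drive that holds the current folder.
free = shutil.disk_usage(".").free

print(f"Estimated file size: {needed / 1e9:.1f} GB")
print(f"Free space here:     {free / 1e9:.1f} GB")
print("Enough space" if free > needed else "Not enough space")
```

Plug in your own number of animals and SNPs to get a feeling for how quickly text-based genotype files grow.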

Figure 7.1: A successful PLINK run

In the following section I will explain what you just did in two parts:

First, let's start with the PLINK options. I will list them, simultaneously providing a link to them on the PLINK website. I will also tell you how you can (easily?) find answers for any PLINK option.

Second, I will talk about the resulting ped and map files, including their structure.

The ped and map file format

In this section, we will take a brief look at the newly created files and say something about their structure.

You might have noticed that a few new files were created in the same directory where you ran the program. Of these files, the ones with the file extensions .ped and .map are the most important.

The .map file is very similar to the previously described .bim file, only without the last two columns that hold the alleles.
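As a small illustration, the sketch below reads two .map lines in Python. A .map line has four whitespace-separated columns: chromosome, SNP identifier, genetic distance, and base-pair position. The two lines are made-up toy data, not taken from the ADAPTmap files, and the names MapRecord and parse_map_line are hypothetical.

```python
from collections import namedtuple

# One record per line of a .map file (four columns).
MapRecord = namedtuple(
    "MapRecord", ["chromosome", "snp_id", "genetic_distance", "bp_position"]
)


def parse_map_line(line):
    """Split one .map line into its four named columns."""
    chromosome, snp_id, genetic_distance, bp_position = line.split()
    return MapRecord(chromosome, snp_id, float(genetic_distance), int(bp_position))


# Made-up toy lines, NOT taken from the ADAPTmap data.
toy_map = """\
1 snp_a 0 10500
1 snp_b 0 20750
"""

records = [parse_map_line(line) for line in toy_map.splitlines() if line.strip()]
for record in records:
    print(record.chromosome, record.snp_id, record.bp_position)
```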

The .ped file is essentially the .fam file (one line per individual) with human-readable genotypes in text format appended to each line. Every two columns represent one SNP in a space-delimited format.
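The sketch below shows this layout on a single made-up .ped line in Python: the first six columns are the sample information known from the .fam file, and every following pair of columns is one SNP. The toy line is not taken from the ADAPTmap data, and the function name parse_ped_line is hypothetical.

```python
def parse_ped_line(line):
    """Split one .ped line into sample information and per-SNP genotypes."""
    fields = line.split()
    # First six columns: family ID, individual ID, father, mother, sex, phenotype.
    sample_info = fields[:6]
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele columns")
    # Every two allele columns together form the genotype of one SNP.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return sample_info, genotypes


# Made-up toy line with three SNPs; "0 0" stands for a missing genotype.
toy_ped_line = "breedA animal1 0 0 0 -9 A G C C 0 0"

sample_info, genotypes = parse_ped_line(toy_ped_line)
print(sample_info)
print(genotypes)
```

The order of the genotypes follows the order of the SNPs in the .map file, so the first pair belongs to the first .map line, and so on.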

Opening the .map file should be no problem. The .ped file, however, is nearly 1 GB in size! I got the "File too large to be opened" error message with Notepad++, but TextPad opened it without problems (after a bit of waiting time, which might be machine-dependent; you will see a small progress bar at the bottom left).
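If no text editor on your machine wants to open the file, you do not have to load it in one piece to learn something about it. Here is a minimal sketch in Python that walks through a .ped file one line at a time and reports the number of individuals and SNPs. It writes a tiny made-up toy file first so that it runs anywhere; the function name summarize_ped is hypothetical, and the file name ADAPTmap_TOP.ped in the comment assumes the --out name used in this chapter.

```python
import os
import tempfile


def summarize_ped(path):
    """Return (number of individuals, number of SNPs) of a .ped file."""
    n_individuals = 0
    n_snps = None
    with open(path) as handle:
        # Iterating over the file keeps only one line in memory at a time.
        for line in handle:
            if not line.strip():
                continue
            if n_snps is None:
                # Six leading sample columns, then two columns per SNP.
                n_snps = (len(line.split()) - 6) // 2
            n_individuals += 1
    return n_individuals, n_snps


# Toy file so the sketch runs anywhere; for real data, pass the path of
# your own file instead, e.g. "ADAPTmap_TOP.ped".
with tempfile.TemporaryDirectory() as folder:
    toy_path = os.path.join(folder, "toy.ped")
    with open(toy_path, "w") as handle:
        handle.write("breedA animal1 0 0 0 -9 A G C C\n")
        handle.write("breedA animal2 0 0 0 -9 G G C T\n")
    summary = summarize_ped(toy_path)

print(summary)
```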

Exercise

Phewww... You made it to the end of this unexpectedly long description! Congratulations! But to really drive home the message and lock the knowledge in your memory, I have a small task for you.

You see, the PLINK file formats are really popular, but there are many others out there. The good news is that you can use PLINK to transform files to other popular formats. One of them is undoubtedly the so-called variant call format (VCF), which is the standard output file of whole-genome sequencing pipelines and a possible input to some other programs. So your task is to change the ADAPTmap file to the vcf file format.

Hint: if I were you, I would explore the various options of the --recode flag on the website. wink-wink

As always, you can compare your solution to the one on YouTube. The video also contains some bonus information on related problems you might face during analyses, so make sure to check it out regardless.

If the embedded video does not start, click it again to "Watch on YouTube". Direct link: https://www.youtube.com/watch?v=c1LSFiv9CxY