Monday, December 9, 2013

Test the content of two variables from different files in Shell

If you work with large datasets in the Linux shell, you may sometimes need to check whether the values in a column of file1 also appear in a particular column of file2. For example, file1 has one column listing 6,000 SNPs, and file2 has five columns with 250,000 SNPs in the second column. You want to check whether the 6,000 SNPs are among the 250,000. awk, available in any Linux shell, offers an easy way to do that:

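awk 'FNR==NR {a[$2]; next} $1 in a' file2 file1 | wc -l

While awk reads file2 (the only time FNR equals NR), the first block records each SNP from the second column as an array key. For file1, the pattern $1 in a prints only the lines whose SNP was recorded, so wc -l reports how many of the file1 SNPs are present in file2. If the count equals the number of lines in file1, all of them were found.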
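To see the two-file lookup on something small, here is a minimal sketch on toy data. The SNP IDs, the temporary directory, and the variable names (seen, found, missing) are made up for illustration. It assumes whitespace-separated columns and a non-empty file2, with the SNP IDs in column 2 of file2 as in the example above. It counts the matches inside awk and also lists the file1 entries that are absent from file2, which is often the more useful answer.

```shell
# Hypothetical sample data standing in for the real files.
tmpdir=$(mktemp -d)
printf 'rs1\nrs2\nrs9\n' > "$tmpdir/file1"
printf 'a rs1 x y z\nb rs2 x y z\nc rs3 x y z\n' > "$tmpdir/file2"

# Pass 1 (file2, where FNR==NR): remember every value seen in column 2.
# Pass 2 (file1): count the rows whose column 1 was remembered.
found=$(awk 'FNR==NR {seen[$2]; next} $1 in seen {n++} END {print n+0}' \
    "$tmpdir/file2" "$tmpdir/file1")

# Same lookup, negated: print the file1 entries that file2 does not contain.
missing=$(awk 'FNR==NR {seen[$2]; next} !($1 in seen)' \
    "$tmpdir/file2" "$tmpdir/file1")

echo "found: $found"
echo "missing: $missing"

rm -r "$tmpdir"
```

Counting with n++ inside awk avoids the extra wc -l process. If file2 is tab-separated and fields may contain spaces, adding -F'\t' to both awk calls would be the likely adjustment.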

